rmueller58
Valued Contributor

file with duplication ignor anything where there is a duplicate.

I have a flat file with "names" in it.

See below:

aanderson
abergman
abergman
aboell
aboell
abone
abridwell
abridwell
aburks
achowdhury

For names that have duplicates, I want to ignore them altogether and get only the names that appear exactly once.
The file covers a-z, so I can't just do a simple grep to ignore them.
Any insight appreciated.

Rex Mueller - Unix System ESU#3
Peter Nikitka
Honored Contributor

Re: file with duplication ignor anything where there is a duplicate.

Hi,

Since the file seems to be sorted, you can use 'uniq' (see man page).
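
A minimal sketch of the 'uniq' approach, assuming one name per line with no trailing whitespace. The `-u` option prints only lines that are not repeated. The `sample_names` helper is hypothetical and just reproduces the list from the question.

```shell
# Hypothetical helper: prints the sample list from the question.
sample_names() {
    printf '%s\n' aanderson abergman abergman aboell aboell abone \
        abridwell abridwell aburks achowdhury
}

# uniq only compares adjacent lines, so sort first to be safe.
# -u keeps the lines that occur exactly once and drops every
# line that has a duplicate.
singles=$(sample_names | sort | uniq -u)
printf '%s\n' "$singles"
# prints: aanderson, abone, aburks, achowdhury (one per line)
```

Plain 'uniq' without `-u` would keep one copy of each duplicated name, which is not what the question asks for.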

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
James R. Ferguson
Acclaimed Contributor

Re: file with duplication ignor anything where there is a duplicate.

Hi Rex:

# cat ./report
#!/usr/bin/perl
use strict;
use warnings;
my %names;
my $key;
while (<>) {
$names{$_}++;
}
for $key (sort keys %names) {
print $key if $names{$key} == 1;
}
1;

...run as:

# ./report filename

Regards!

...JRF...
Sandman!
Honored Contributor
Solution

Re: file with duplication ignor anything where there is a duplicate.

So the requirement is to ignore names that appear more than once in the input file and print only those that occur exactly once? If that's the case, try the awk construct below (assuming the file has single-column records only):

# awk '{x[$1]++}END{for(i in x) if(x[i]==1) print i}' file
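
The same idea as a small shell function, offered as a sketch rather than a replacement. The function name `only_singles` and the longer variable names are made up for readability. Since awk does not promise any particular order for `for (... in ...)`, the output is piped through sort.

```shell
# Hypothetical wrapper around the awk one-liner above.
only_singles() {
    # Count each first field, then print the keys seen exactly once.
    # The for-in traversal order is unspecified, hence the trailing sort.
    awk '{ count[$1]++ }
         END { for (name in count) if (count[name] == 1) print name }' "$@" | sort
}

# The input does not need to be sorted for this approach.
printf '%s\n' abone aboell aburks aboell | only_singles
# prints: abone, aburks (one per line)
```

Because only the first field is used as the key, trailing whitespace or extra columns on a line do not affect the count.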
James R. Ferguson
Acclaimed Contributor

Re: file with duplication ignor anything where there is a duplicate.

Hi (again) Rex:

If you prefer, the Perl script I offered can be reduced to a command-line one-liner:

# perl -ne '$names{$_}++;END{for $key (sort keys %names) {print $key if $names{$key}==1}}' filename

Regards!

...JRF...
OldSchool
Honored Contributor

Re: file with duplication ignor anything where there is a duplicate.

perhaps something like:

sort filename | uniq -u > outfilename

would work for you?
rmueller58
Valued Contributor

Re: file with duplication ignor anything where there is a duplicate.

Jim,

I tried the script, but the names with duplicates remain. Any ideas?

Sandman!
Honored Contributor

Re: file with duplication ignor anything where there is a duplicate.

Did you try the awk script I posted? Does the file contain mixed-case names or does it have all lowercase names?

rmueller58
Valued Contributor

Re: file with duplication ignor anything where there is a duplicate.

Sandman, you DA MAN!!! I will run it past the recipient to see if this is the data they are looking for.

THANKS!! Kudos to all
Sandman!
Honored Contributor

Re: file with duplication ignor anything where there is a duplicate.

If the file has mixed-case names and you want names that differ only in case to count as different, then the awk script I posted earlier will suffice. If you want to ignore the case of the names, modify the awk construct as follows:

# awk '{x[tolower($1)]++}END{for(i in x) if(x[i]==1) print i}' file
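
Note that the construct above prints the lowercased key. If the original spelling should be kept in the output, one possible variation (a sketch; the function name `only_singles_nocase` and the `spelling` array are made up here) remembers the first spelling seen for each key:

```shell
# Hypothetical variant: compare names case-insensitively,
# but print them as they were first written in the file.
only_singles_nocase() {
    awk '{
        key = tolower($1)
        count[key]++
        # Remember the first spelling seen for this key.
        if (!(key in spelling)) spelling[key] = $1
    }
    END {
        for (key in count) if (count[key] == 1) print spelling[key]
    }' "$@" | sort
}

printf '%s\n' ABone abone Aburks | only_singles_nocase
# prints: Aburks
```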

~cheers
rmueller58
Valued Contributor

Re: file with duplication ignor anything where there is a duplicate.

It's in the awk vault. Thanks, Sandman. I can see the others are useful too, and I can find places for them as well.

Merry Christmas all.
spex
Honored Contributor

Re: file with duplication ignor anything where there is a duplicate.

Hi Rex,

This can be accomplished by commands alone:

$ sort file | uniq -c | grep '1 ' | cut -c6-
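
A possibly more robust take on the same pipeline, as a sketch. The pattern `'1 '` also matches counts such as 11 or 21, and the column passed to `cut` depends on how wide the local 'uniq -c' pads its count, which can differ between implementations. Comparing the count field numerically avoids both. The function name `only_singles_uniq` is made up, and names are assumed to contain no whitespace.

```shell
# Hypothetical helper: keep names whose 'uniq -c' count is exactly 1.
only_singles_uniq() {
    # awk splits on whitespace, so the padding width of the count
    # does not matter; $1 is the count and $2 is the name.
    sort "$@" | uniq -c | awk '$1 == 1 { print $2 }'
}

printf '%s\n' aboell abone aboell aburks | only_singles_uniq
# prints: abone, aburks (one per line)
```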

Merry Christmas!

PCS
rmueller58
Valued Contributor

Re: file with duplication ignor anything where there is a duplicate.

Spex, I tried that, but it leaves the dups in place. I need to drop every record that has a duplicate.

James R. Ferguson
Acclaimed Contributor

Re: file with duplication ignor anything where there is a duplicate.

Hi Rex:

OK, silly me, I thought that your file contained only records with the listed fields.

Consider this file:

aanderson line-1
abergman line-2
abergman line-3
aboell line-4
aboell line-5
abone line-6
abridwell line-7
abridwell line-8
aburks line-9
achowdhury line-10

Now use:

# cat ./report
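#!/usr/bin/perl
use strict;
use warnings;
my $file = shift;
my %names;
my @fields;
open (FH, "<", $file) or die "Can't open '$file': $!\n";
# First pass: count how often each first field occurs.
while (<FH>) {
    @fields = split;
    next unless @fields;    # skip blank lines
    $names{$fields[0]}++;
}
# Rewind, then print only the lines whose first field occurred once.
seek( FH, 0, 0);
while (<FH>) {
    @fields = split;
    next unless @fields;
    print if $names{$fields[0]} == 1;
}
close FH;
1;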

...thus:

# ./report file
aanderson line-1
abone line-6
aburks line-9
achowdhury line-10

...Perl array indexes start at zero, so $fields[0] is the first field, whereas 'awk' would call it $1.
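
For comparison, the same two-pass idea can be sketched in awk by naming the file twice on the command line. The function name `singles_by_first_field` is made up, and the input has to be a regular file because it is read twice.

```shell
# Hypothetical helper: print whole lines whose first field is unique.
singles_by_first_field() {
    # While reading the first copy (NR == FNR), only count first fields.
    # While reading the second copy, print lines whose count is 1.
    awk 'NR == FNR { count[$1]++; next } count[$1] == 1' "$1" "$1"
}

tmp=$(mktemp)
printf '%s\n' 'aanderson line-1' 'abergman line-2' \
    'abergman line-3' 'abone line-6' > "$tmp"
singles_by_first_field "$tmp"
# prints: "aanderson line-1" and "abone line-6"
rm -f "$tmp"
```

The lines come out in their original order, since the second pass simply filters the file.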

Regards!

...JRF...
rmueller58
Valued Contributor

Re: file with duplication ignor anything where there is a duplicate.

That's the deal, Jim! Thanks AGAIN!