
File with duplicates: ignore anything where there is a duplicate.

SOLVED
Valued Contributor

I have a flat file with "names" in it.

See below:

aanderson
abergman
abergman
aboell
aboell
abone
abridwell
abridwell
aburks
achowdhury

For records that have duplicates, I want to ignore them altogether and keep only the records that occur exactly once.
The file runs a-z, so I can't just filter them out with a simple grep.
Any insight appreciated.

Rex Mueller - Unix System ESU#3
Honored Contributor

Hi,

since it seems that the file is sorted, you can use 'uniq' (see the man page).
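A minimal sketch of that suggestion, assuming the names live in a file called names.txt (a hypothetical name) and are already sorted. The -u option tells uniq to print only the lines that are not repeated; because uniq compares adjacent lines only, unsorted input should go through sort first.

```shell
#!/bin/sh
# Recreate the sample list from the question in a hypothetical file, names.txt.
cat > names.txt <<'EOF'
aanderson
abergman
abergman
aboell
aboell
abone
abridwell
abridwell
aburks
achowdhury
EOF

# -u: print only lines that are not repeated.
# This relies on the file already being sorted (uniq only compares neighbours).
uniq -u names.txt
# prints aanderson, abone, aburks and achowdhury, one per line

# If the input might not be sorted, sort it first:
sort names.txt | uniq -u
```

Because this compares whole lines, it only does what the question asks when each record consists of the name alone.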

Regards, Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Acclaimed Contributor

Hi Rex:

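# cat ./report
#!/usr/bin/perl
use strict;
use warnings;
my %names;
# Count how many times each complete line (newline included) occurs.
while (<>) {
    $names{$_}++;
}
# Print, in sorted order, only the lines that occurred exactly once.
for my $key (sort keys %names) {
    print $key if $names{$key} == 1;
}
1;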

...run as:

# ./report filename

Regards!

...JRF...
Honored Contributor

The requirement is to ignore the names that appear more than once in the input file and print only those that occur once? If that's the case, try the awk construct below (assuming the file has only one column per record):

# awk '{x[$1]++}END{for(i in x) if(x[i]==1) print i}' file
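awk's for (i in x) loop visits array keys in an unspecified order, so the names may not come out a-z like the input. A small sketch, assuming the same one-column data in a hypothetical file names.txt, that pipes the result through sort to restore alphabetical order:

```shell
#!/bin/sh
# Sample one-column data from the question (names.txt is a hypothetical name).
cat > names.txt <<'EOF'
aanderson
abergman
abergman
aboell
aboell
abone
abridwell
abridwell
aburks
achowdhury
EOF

# Count each first field, print those seen exactly once,
# then sort because "for (i in x)" has no guaranteed order.
awk '{ x[$1]++ } END { for (i in x) if (x[i] == 1) print i }' names.txt | sort
```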
Acclaimed Contributor

Hi (again) Rex:

If you prefer, the Perl script I offered can be reduced to a command-line one-liner:

# perl -ne '$names{$_}++;END{for $key (sort keys %names) {print $key if $names{$key}==1}}' filename

Regards!

...JRF...
Honored Contributor

Perhaps something like:

sort filename | uniq > outfilename

would work for you?
Valued Contributor

Jim,

I tried the script, but the names and duplicates remain. Any ideas?

Honored Contributor

Did you try the awk script I posted? Does the file contain mixed-case names or does it have all lowercase names?

Valued Contributor

Sandman, you DA MAN!!! I will run it past the recipient to see if this is the data they are looking for.

THANKS!! Kudos to all
Honored Contributor

If the file has mixed-case names and you want to keep them that way, the awk script I posted earlier will suffice. If you want to ignore the case of the names, modify the awk construct as follows:

# awk '{x[tolower($1)]++}END{for(i in x) if(x[i]==1) print i}' file
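Without awk, a roughly equivalent pipeline folds everything to lower case with tr before sorting, so that sort and uniq -u treat "ABergman" and "abergman" as the same line. A sketch with a small made-up mixed-case file (mixed.txt and its contents are hypothetical); like the tolower() version, it prints the surviving names in lower case.

```shell
#!/bin/sh
# Hypothetical mixed-case sample, one name per line.
printf '%s\n' AAnderson abergman ABergman aboell Abone > mixed.txt

# Lower-case every line, sort so duplicates become adjacent,
# then keep only the lines that are not repeated.
tr '[:upper:]' '[:lower:]' < mixed.txt | sort | uniq -u
# prints aanderson, aboell and abone, one per line
```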

~cheers
Valued Contributor

It's in the awk vault. Thanks, Sandman. I can see the others are useful too, and I can find places for them as well.

Merry Christmas all.
Honored Contributor

Hi Rex,

This can be accomplished by commands alone:

$ sort file | uniq -c | sed -n 's/^ *1 //p'

Merry Christmas!

PCS
Valued Contributor

Spex, I tried that, but it leaves the dups in place. I need none of the records that have duplicates.

Acclaimed Contributor

Hi Rex:

OK, silly me: I assumed your file contained only the names, one per line, with no other fields.

Consider this file:

aanderson line-1
abergman line-2
abergman line-3
aboell line-4
aboell line-5
abone line-6
abridwell line-7
abridwell line-8
aburks line-9
achowdhury line-10

Now use:

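# cat ./report
#!/usr/bin/perl
use strict;
use warnings;
my $file = shift or die "Usage: $0 file\n";
my %names;
my @fields;
open (FH, "<", $file) or die "Can't open '$file': $!\n";
# First pass: count how often each name (the first field) appears.
while (<FH>) {
    @fields = split;
    next unless @fields;    # skip blank lines
    $names{$fields[0]}++;
}
# Rewind the file and print the whole lines whose name appeared once.
seek( FH, 0, 0 );
while (<FH>) {
    @fields = split;
    next unless @fields;
    print if $names{$fields[0]} == 1;
}
close FH;
1;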

...thus:

# ./report file
aanderson line-1
abone line-6
aburks line-9
achowdhury line-10

...Perl arrays are zero-based, so the first field is $fields[0], whereas in 'awk' it would be $1.
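For comparison, the same two-pass idea can be sketched in awk by naming the file twice on the command line. While NR == FNR (true only while reading the first copy) it counts field $1; on the second copy it prints each whole line whose name was counted exactly once, so the lines keep their original order and extra fields. records.txt is a hypothetical file name holding the sample above.

```shell
#!/bin/sh
# The multi-field sample from above, in a hypothetical file records.txt.
cat > records.txt <<'EOF'
aanderson line-1
abergman line-2
abergman line-3
aboell line-4
aboell line-5
abone line-6
abridwell line-7
abridwell line-8
aburks line-9
achowdhury line-10
EOF

# Pass 1 (first copy): count each name in field 1, then skip to the next line.
# Pass 2 (second copy): the bare pattern prints lines whose name occurred once.
awk 'NR == FNR { count[$1]++; next } count[$1] == 1' records.txt records.txt
```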

Regards!

...JRF...
Valued Contributor

That's the deal, Jim! Thanks AGAIN!