Re: file with duplication: ignore anything where there is a duplicate

rmueller58 · ‎12-20-2006

I have a flat file with "names" in it.

See below:

aanderson
abergman
abergman
aboell
aboell
abone
abridwell
abridwell
aburks
achowdhury

For records containing duplicates, I want to ignore them altogether and get only the names that appear exactly once.
The file runs a-z, so I can't just do a simple grep to ignore them.
Any insight appreciated.

Rex Mueller - Unix System ESU#3

Peter Nikitka · ‎12-20-2006

Hi,

Since it seems that the file is sorted, you can use 'uniq' (see the man page).
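
A minimal sketch of that suggestion, assuming the POSIX -u option (which the post itself does not name) and a throwaway temporary file holding the sample names from the question. uniq -u prints only lines that have no adjacent duplicate, so it relies on the input already being sorted:

```shell
# Sample data copied from the question; the temp file is only for illustration.
names=$(mktemp)
printf '%s\n' aanderson abergman abergman aboell aboell abone \
    abridwell abridwell aburks achowdhury > "$names"

# -u: print only the lines that are not repeated.
# uniq compares adjacent lines only, hence the sorted-input requirement.
singles=$(uniq -u "$names")
printf '%s\n' "$singles"

rm -f "$names"
```

If the file were not sorted, running it through sort first would be needed before uniq sees it.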

mfG Peter

The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"

James R. Ferguson · ‎12-20-2006

Hi Rex:

# cat ./report
#!/usr/bin/perl
use strict;
use warnings;
my %names;
my $key;
while (<>) {
    $names{$_}++;
}
for $key (sort keys %names) {
    print $key if $names{$key} == 1;
}
1;

...run as:

# ./report filename

Regards!

...JRF...

Sandman! · ‎12-20-2006

The requirement is to ignore names that appear more than once in the input file and print only those that occur once? If that's the case, try the awk construct below (assuming the file has single-column records only):

# awk '{x[$1]++}END{for(i in x) if(x[i]==1) print i}' file
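
One caveat worth noting: awk visits array keys in an unspecified order with for (i in x), so the names may not come out alphabetically. A hedged sketch that pipes the result through sort to restore the order (the temporary file and the variable names are assumptions for illustration):

```shell
# Sample data copied from the question; the temp file is only for illustration.
names=$(mktemp)
printf '%s\n' aanderson abergman abergman aboell aboell abone \
    abridwell abridwell aburks achowdhury > "$names"

# Count each first field, print the keys seen exactly once,
# then sort because awk's "for (i in x)" order is not defined.
singles=$(awk '{ x[$1]++ } END { for (i in x) if (x[i] == 1) print i }' "$names" | sort)
printf '%s\n' "$singles"

rm -f "$names"
```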

James R. Ferguson · ‎12-20-2006

Hi (again) Rex:

If you prefer, the Perl script I offered can be reduced to a commandline script:

# perl -ne '$names{$_}++;END{for $key (sort keys %names) {print $key if $names{$key}==1}}' filename

Regards!

...JRF...

OldSchool · ‎12-20-2006

perhaps something like:

sort filename | uniq -u > outfilename

would work for you?

rmueller58 · ‎12-20-2006

Jim,

I tried the script, but the names and their duplicates remain. Any ideas?

Sandman! · ‎12-20-2006

Did you try the awk script I posted? Does the file contain mixed-case names or does it have all lowercase names?

rmueller58 · ‎12-20-2006

Sandman You DA MAN!!! I will run it past the recipient to see if this is the data they are looking for.

THANKS!! Kudos to all

Sandman! · ‎12-20-2006

If the file has mixed-case names and you want to keep it that way, then the awk script I posted earlier will suffice. In case you want to ignore the case of the names, modify the awk construct as:

# awk '{x[tolower($1)]++}END{for(i in x) if(x[i]==1) print i}' file

~cheers

rmueller58 · ‎12-20-2006

It's in the awk vault.. Thanks Sandman, I can see the others are useful, I can find places for them as well.

Merry Christmas all.

spex · ‎12-20-2006

Hi Rex,

This can be accomplished by commands alone:

$ sort file | uniq -c | grep '1 ' | cut -c6-
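
A note on the pipeline above: grep '1 ' matches any line containing "1 " anywhere, so counts such as 11 or 21 would slip through, and cut -c6- assumes a count column of one particular width, which differs between uniq implementations. A minimal sketch that anchors on the count field instead (the sed expression and the temporary file are assumptions, not part of the original post):

```shell
# A few unsorted sample names; the temp file is only for illustration.
names=$(mktemp)
printf '%s\n' aboell aanderson abergman aboell abone abergman > "$names"

# uniq -c prefixes each distinct line with its count.
# The sed expression keeps only lines whose count is exactly 1
# and strips the leading blanks, the count, and the separating space.
singles=$(sort "$names" | uniq -c | sed -n 's/^ *1 //p')
printf '%s\n' "$singles"

rm -f "$names"
```

This still compares whole lines, so records that share a name but differ in later fields are treated as distinct.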

Merry Christmas!

PCS

rmueller58 · ‎12-20-2006

Spex, I tried that, but it leaves the dups in place. I need to drop every record that has a duplicate.

James R. Ferguson · ‎12-20-2006

Hi Rex:

OK, silly me, I thought that your file contained only records with the listed fields.

Consider this file:

aanderson line-1
abergman line-2
abergman line-3
aboell line-4
aboell line-5
abone line-6
abridwell line-7
abridwell line-8
aburks line-9
achowdhury line-10

Now use:

# cat ./report
#!/usr/bin/perl
use strict;
use warnings;
my $file = shift;
my %names;
my @fields;
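open (FH, "<", $file) or die "Can't open '$file': $!\n";
# First pass: count how often each first field occurs.
while (<FH>) {
    @fields = split;
    $names{$fields[0]}++;
}
# Rewind, then print only the records whose first field occurred once.
seek( FH, 0, 0);
while (<FH>) {
    @fields = split;
    print if $names{$fields[0]} == 1;
}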
1;

...thus:

# ./report file
aanderson line-1
abone line-6
aburks line-9
achowdhury line-10

...Perl counts the first field as zero whereas 'awk' would count it as one.
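
For comparison, the same two-pass idea can be sketched in awk, where the first field is $1. This is only an illustration under the assumption that the input is a regular file that can be read twice (it would not work on a pipe); the temporary file and variable names are hypothetical:

```shell
# Sample records copied from the example above; the temp file is only for illustration.
data=$(mktemp)
cat > "$data" <<'EOF'
aanderson line-1
abergman line-2
abergman line-3
aboell line-4
aboell line-5
abone line-6
abridwell line-7
abridwell line-8
aburks line-9
achowdhury line-10
EOF

# The file is named twice. NR == FNR holds only while reading the first copy,
# so that pass just counts first fields; the second pass prints whole records
# whose first field was seen exactly once, in their original order.
singles=$(awk 'NR == FNR { count[$1]++; next } count[$1] == 1' "$data" "$data")
printf '%s\n' "$singles"

rm -f "$data"
```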

Regards!

...JRF...

rmueller58 · ‎12-20-2006

That's the deal, Jim! Thanks AGAIN!
