<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Eliminating Dups in Operating System - Linux</title>
    <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074488#M92140</link>
    <description>Have a look at the man page for the uniq command. &lt;BR /&gt;&lt;BR /&gt;uniq myfile should do what you want.&lt;BR /&gt;</description>
    <pubDate>Tue, 16 Oct 2007 11:55:50 GMT</pubDate>
    <dc:creator>A. Clay Stephenson</dc:creator>
    <dc:date>2007-10-16T11:55:50Z</dc:date>
    <item>
      <title>Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074487#M92139</link>
      <description>Hi all,&lt;BR /&gt;HP-UX 11.x PA-RISC system.&lt;BR /&gt;&lt;BR /&gt;I have a file that looks like this:&lt;BR /&gt;John,Doe&lt;BR /&gt;John,Doe&lt;BR /&gt;John,Doe&lt;BR /&gt;Mary,Poppin&lt;BR /&gt;Mary,Poppin&lt;BR /&gt;Mary,Poppin&lt;BR /&gt;&lt;BR /&gt;I'm writing a script in Perl and I would like to know if anyone knows a method to get rid of duplicate lines.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for any help/suggestions.</description>
      <pubDate>Tue, 16 Oct 2007 11:40:25 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074487#M92139</guid>
      <dc:creator>David Bellamy</dc:creator>
      <dc:date>2007-10-16T11:40:25Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074488#M92140</link>
      <description>Have a look at the man page for the uniq command. &lt;BR /&gt;&lt;BR /&gt;uniq myfile should do what you want.&lt;BR /&gt;</description>
      <pubDate>Tue, 16 Oct 2007 11:55:50 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074488#M92140</guid>
      <dc:creator>A. Clay Stephenson</dc:creator>
      <dc:date>2007-10-16T11:55:50Z</dc:date>
    </item>
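    <!-- Editor's sketch of the uniq suggestion above, using the sample data from the original post (file name "myfile" is taken from the reply). Note that uniq only collapses *adjacent* duplicate lines, which is why later replies pair it with sort. -->

```shell
# Recreate the sample file from the original post.
printf 'John,Doe\nJohn,Doe\nJohn,Doe\nMary,Poppin\nMary,Poppin\nMary,Poppin\n' > myfile

# uniq collapses runs of identical adjacent lines.
uniq myfile

# If duplicates were scattered rather than grouped, sort first:
sort myfile | uniq    # equivalent to: sort -u myfile
```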
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074489#M92141</link>
      <description>David,&lt;BR /&gt;&lt;BR /&gt;Does it need to be Perl? 'Cos the sort and uniq commands will take care of this very easily:&lt;BR /&gt;&lt;BR /&gt;sort &lt;FILE&gt; | uniq&lt;BR /&gt;&lt;BR /&gt;HTH&lt;BR /&gt;&lt;BR /&gt;Duncan</description>
      <pubDate>Tue, 16 Oct 2007 11:57:27 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074489#M92141</guid>
      <dc:creator>Duncan Edmonstone</dc:creator>
      <dc:date>2007-10-16T11:57:27Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074490#M92142</link>
      <description>Hi David:&lt;BR /&gt;&lt;BR /&gt;In Perl, use a hash to collect the unique items:&lt;BR /&gt;&lt;BR /&gt;#!/usr/bin/perl&lt;BR /&gt;use strict;&lt;BR /&gt;use warnings;&lt;BR /&gt;my %things;&lt;BR /&gt;while (&amp;lt;&amp;gt;) {&lt;BR /&gt;    chomp;&lt;BR /&gt;    $things{$_}++;&lt;BR /&gt;}&lt;BR /&gt;for my $key (sort keys %things) {&lt;BR /&gt;    print "$key\n";&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt;&lt;BR /&gt;...JRF...</description>
      <pubDate>Tue, 16 Oct 2007 12:05:11 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074490#M92142</guid>
      <dc:creator>James R. Ferguson</dc:creator>
      <dc:date>2007-10-16T12:05:11Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074491#M92143</link>
      <description>Shalom,&lt;BR /&gt;&lt;BR /&gt;sort -u&lt;BR /&gt;&lt;BR /&gt;grep -v '^$' (to get rid of blank lines).&lt;BR /&gt;&lt;BR /&gt;SEP</description>
      <pubDate>Tue, 16 Oct 2007 12:05:37 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074491#M92143</guid>
      <dc:creator>Steven E. Protter</dc:creator>
      <dc:date>2007-10-16T12:05:37Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074492#M92144</link>
      <description>Simply populate a hash.&lt;BR /&gt;One possible way:&lt;BR /&gt;&lt;BR /&gt;map {chomp; $seen{$_}++} &lt;DATA&gt;;&lt;BR /&gt;@singles = keys %seen;&lt;BR /&gt;__DATA__&lt;BR /&gt;John,Doe&lt;BR /&gt;John,Doe&lt;BR /&gt;John,Doe&lt;BR /&gt;Mary,Poppin&lt;BR /&gt;Mary,Poppin&lt;BR /&gt;Mary,Poppin</description>
      <pubDate>Tue, 16 Oct 2007 12:13:39 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074492#M92144</guid>
      <dc:creator>Ralph Grothe</dc:creator>
      <dc:date>2007-10-16T12:13:39Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074493#M92145</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Generally it's easy when your input is sorted.&lt;BR /&gt;The command&lt;BR /&gt;&lt;BR /&gt;uniq filename&lt;BR /&gt;&lt;BR /&gt;will output only the distinct lines.&lt;BR /&gt;&lt;BR /&gt;In Perl, remember the last line seen and skip identical ones:&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;my $last = '';&lt;BR /&gt;while (&amp;lt;&amp;gt;) {&lt;BR /&gt;    if ($last eq $_) { next; }&lt;BR /&gt;    $last = $_;&lt;BR /&gt;    print $_;&lt;BR /&gt;}&lt;BR /&gt;...&lt;BR /&gt;&lt;BR /&gt;Replace the print statement with your own code.&lt;BR /&gt;&lt;BR /&gt;mfG Peter</description>
      <pubDate>Tue, 16 Oct 2007 12:14:56 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074493#M92145</guid>
      <dc:creator>Peter Nikitka</dc:creator>
      <dc:date>2007-10-16T12:14:56Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074494#M92146</link>
      <description>Oops, sorry for this dup.&lt;BR /&gt;Use James' solution...</description>
      <pubDate>Tue, 16 Oct 2007 12:15:03 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074494#M92146</guid>
      <dc:creator>Ralph Grothe</dc:creator>
      <dc:date>2007-10-16T12:15:03Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074495#M92147</link>
      <description>And printing them as they come, skipping dups, could look like:&lt;BR /&gt;&lt;BR /&gt;$ perl -ne 'print unless $x{$_}++' &lt;FILE&gt;&lt;BR /&gt;&lt;BR /&gt;Hein.</description>
      <pubDate>Tue, 16 Oct 2007 12:24:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074495#M92147</guid>
      <dc:creator>Hein van den Heuvel</dc:creator>
      <dc:date>2007-10-16T12:24:36Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074496#M92148</link>
      <description>... and here's a rather fully fleshed-out version of an all-Perl solution, although it would make as much sense to run your file through the uniq command inside Perl.&lt;BR /&gt;&lt;BR /&gt;------------------------------------&lt;BR /&gt;#!/usr/bin/perl -w&lt;BR /&gt;&lt;BR /&gt;use strict;&lt;BR /&gt;use English;&lt;BR /&gt;use constant TRUE =&amp;gt; 1;&lt;BR /&gt;&lt;BR /&gt;my %exists = ();&lt;BR /&gt;my @uniqs = ();&lt;BR /&gt;my $stat = 0;&lt;BR /&gt;my $fname = "myfile";&lt;BR /&gt;&lt;BR /&gt;my $cc = open(FH, $fname);&lt;BR /&gt;if (defined($cc)) {&lt;BR /&gt;    my $s = '';&lt;BR /&gt;    while (defined($s = &lt;FH&gt;)) {&lt;BR /&gt;        chomp($s);&lt;BR /&gt;        unless ($exists{$s}) {&lt;BR /&gt;            $exists{$s} = TRUE;&lt;BR /&gt;            push(@uniqs, $s);&lt;BR /&gt;        }&lt;BR /&gt;    }&lt;BR /&gt;    close(FH);&lt;BR /&gt;    my $i = 0;&lt;BR /&gt;    while ($i &amp;lt;= $#uniqs) {&lt;BR /&gt;        print $uniqs[$i], "\n";&lt;BR /&gt;        ++$i;&lt;BR /&gt;    }&lt;BR /&gt;} else {&lt;BR /&gt;    $stat = $ERRNO;&lt;BR /&gt;    printf("Can't open %s status %d\n", $fname, $stat);&lt;BR /&gt;}&lt;BR /&gt;exit($stat);&lt;BR /&gt;---------------------------------------&lt;BR /&gt;&lt;BR /&gt;As the previous examples do, it uses a hash to keep track of the data that has already been read; only if an entry is encountered for the first time is it added to the output array.</description>
      <pubDate>Tue, 16 Oct 2007 12:28:54 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074496#M92148</guid>
      <dc:creator>A. Clay Stephenson</dc:creator>
      <dc:date>2007-10-16T12:28:54Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074497#M92149</link>
      <description>Thanks to all, James your solution was perfect.</description>
      <pubDate>Tue, 16 Oct 2007 12:35:10 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074497#M92149</guid>
      <dc:creator>David Bellamy</dc:creator>
      <dc:date>2007-10-16T12:35:10Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating Dups</title>
      <link>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074498#M92150</link>
      <description>Once again thanks to all</description>
      <pubDate>Tue, 16 Oct 2007 12:36:10 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/eliminating-dups/m-p/5074498#M92150</guid>
      <dc:creator>David Bellamy</dc:creator>
      <dc:date>2007-10-16T12:36:10Z</dc:date>
    </item>
  </channel>
</rss>

