
Eliminating Dups

 
SOLVED
David Bellamy
Respected Contributor

Eliminating Dups

Hi all
HP-UX 11.x PA-RISC system.

I have a file that looks like this
John,Doe
John,Doe
John,Doe
Mary,Poppin
Mary,Poppin
Mary,Poppin

I'm writing a script in Perl and would like to know if anyone has a method to get rid of duplicate lines.

thanks in advance for any help/suggestions.
A. Clay Stephenson
Acclaimed Contributor

Re: Eliminating Dups

Have a look at the man page for the uniq command.

uniq myfile should do what you want.
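For example, with the sample file from the question saved as myfile (a quick sketch; note that uniq only collapses *adjacent* identical lines, so unsorted input should go through sort first, or just use sort -u):

```shell
# Recreate the sample file from the original post
printf 'John,Doe\nJohn,Doe\nJohn,Doe\nMary,Poppin\nMary,Poppin\nMary,Poppin\n' > myfile

# The sample is already grouped, so uniq alone is enough here
uniq myfile
# Output:
#   John,Doe
#   Mary,Poppin
```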
If it ain't broke, I can fix that.
Solution

Re: Eliminating Dups

David,

Does it need to be Perl? 'Cos the sort and uniq commands will take care of this very easily:

sort myfile | uniq

HTH

Duncan

I am an HPE Employee
James R. Ferguson
Acclaimed Contributor

Re: Eliminating Dups

Hi David:

In Perl, use a hash to collect the unique items.

#!/usr/bin/perl
use strict;
use warnings;

my %things;
while (<>) {
    chomp;
    $things{$_}++;    # count each line; the hash keys are the unique lines
}
for my $key (sort keys %things) {
    print "$key\n";
}

Regards!

...JRF...
Steven E. Protter
Exalted Contributor

Re: Eliminating Dups

Shalom,

sort -u myfile

grep -v '^$' (to get rid of blank lines).

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Ralph Grothe
Honored Contributor

Re: Eliminating Dups

Simply populate a hash.
One possible way:

map {chomp; $seen{$_}++ unless exists $seen{$_}} <DATA>;
@singles = keys %seen;
__DATA__
John,Doe
John,Doe
John,Doe
Mary,Poppin
Mary,Poppin
Mary,Poppin
Madness, thy name is system administration
Peter Nikitka
Honored Contributor

Re: Eliminating Dups

Hi,

generally it's easy when your input is sorted.
The command
uniq filename

will output only different lines.

In perl, remember the last valid line and skip identical ones:

...
my $last = '';
while (<>) {
    if ($last eq $_) { next; }    # skip lines identical to the previous one
    $last = $_;
    print $_;
}
...

Replace the print statement with your own code.

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Ralph Grothe
Honored Contributor

Re: Eliminating Dups

Oops, sorry for this dup.
Use James' solution...
Madness, thy name is system administration
Hein van den Heuvel
Honored Contributor

Re: Eliminating Dups



And printing them as they come, skipping dups could look like:

$ perl -ne 'print unless $x{$_}++'

Hein.

A. Clay Stephenson
Acclaimed Contributor

Re: Eliminating Dups

... and here's a rather fully fleshed-out version of an all-Perl solution, although it would make just as much sense to run your file through the uniq command inside Perl.

------------------------------------
#!/usr/bin/perl -w

use strict;
use English;
use constant TRUE => 1;

my %exists = ();
my @uniqs = ();
my $stat = 0;
my $fname = "myfile";

my $cc = open(FH, $fname);
if (defined($cc))
{
    my $s = '';
    while (defined($s = <FH>))
    {
        chomp($s);
        unless ($exists{$s})
        {
            $exists{$s} = TRUE;
            push(@uniqs, $s);
        }
    }
    close(FH);
    my $i = 0;
    while ($i <= $#uniqs)
    {
        print $uniqs[$i], "\n";
        ++$i;
    }
}
else
{
    $stat = $ERRNO;
    printf("Can't open %s status %d\n", $fname, $stat);
}
exit($stat);
---------------------------------------

As the previous examples do, it uses a hash to keep track of the data that has already been read, and only when an entry is encountered for the first time is it added to the output array.



If it ain't broke, I can fix that.
David Bellamy
Respected Contributor

Re: Eliminating Dups

Thanks to all, James your solution was perfect.
David Bellamy
Respected Contributor

Re: Eliminating Dups

Once again thanks to all