
Eliminating Dups

 
SOLVED
David Bellamy
Respected Contributor

Eliminating Dups

Hi all
HP-UX 11.x PA-RISC system.

I have a file that looks like this
John,Doe
John,Doe
John,Doe
Mary,Poppin
Mary,Poppin
Mary,Poppin

I'm writing a script in Perl and would like to know if anyone has a method to get rid of duplicate lines.

thanks in advance for any help/suggestions.
A. Clay Stephenson
Acclaimed Contributor

Re: Eliminating Dups

Have a look at the man page for the uniq command.

uniq myfile should do what you want.
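For example, with the sample file from the question saved as myfile (a quick sketch; note that uniq only collapses *adjacent* identical lines, so unsorted input should go through sort first, or just use sort -u):

```shell
# Recreate the sample file from the original post
printf 'John,Doe\nJohn,Doe\nJohn,Doe\nMary,Poppin\nMary,Poppin\nMary,Poppin\n' > myfile

# The sample is already grouped, so uniq alone is enough here
uniq myfile
# Output:
#   John,Doe
#   Mary,Poppin
```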
If it ain't broke, I can fix that.
Solution

Re: Eliminating Dups

David,

Does it need to be Perl? 'Cos the sort and uniq commands will take care of this very easily:

sort myfile | uniq

HTH

Duncan

I am an HPE Employee
James R. Ferguson
Acclaimed Contributor

Re: Eliminating Dups

Hi David:

In Perl, use a hash to collect the unique items.

#!/usr/bin/perl
use strict;
use warnings;

my %things;
while (<>) {
    chomp;
    $things{$_}++;    # count each line; the hash keys are the unique lines
}
for my $key (sort keys %things) {
    print "$key\n";
}

Regards!

...JRF...
Steven E. Protter
Exalted Contributor

Re: Eliminating Dups

Shalom,

sort -u myfile

grep -v '^$' (to get rid of blank lines).

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Ralph Grothe
Honored Contributor

Re: Eliminating Dups

Simply populate a hash.
One possible way:

map {chomp; $seen{$_}++ unless exists $seen{$_}} <DATA>;
@singles = keys %seen;
__DATA__
John,Doe
John,Doe
John,Doe
Mary,Poppin
Mary,Poppin
Mary,Poppin
Madness, thy name is system administration
Peter Nikitka
Honored Contributor

Re: Eliminating Dups

Hi,

generally it's easy when your input is sorted.
The command
uniq filename

will output only different lines.

In perl, remember the last valid line and skip identical ones:

...
my $last = '';
while (<>) {
    if ($last eq $_) { next; }    # skip lines identical to the previous one
    $last = $_;
    print $_;
}
...

Replace the print statement with your own code.

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Ralph Grothe
Honored Contributor

Re: Eliminating Dups

Oops, sorry for this dup.
Use James' solution...
Madness, thy name is system administration
Hein van den Heuvel
Honored Contributor

Re: Eliminating Dups



And printing them as they come, skipping dups could look like:

$ perl -ne 'print unless $x{$_}++'

Hein.

A. Clay Stephenson
Acclaimed Contributor

Re: Eliminating Dups

... and here's a rather fully fleshed-out version of an all-Perl solution, although it would make just as much sense to run your file through the uniq command inside Perl.

------------------------------------
#!/usr/bin/perl -w

use strict;
use English;
use constant TRUE => 1;

my %exists = ();
my @uniqs = ();
my $stat = 0;
my $fname = "myfile";

my $cc = open(FH, $fname);
if (defined($cc))
{
    my $s = '';
    while (defined($s = <FH>))
    {
        chomp($s);
        unless ($exists{$s})
        {
            $exists{$s} = TRUE;
            push(@uniqs, $s);
        }
    }
    close(FH);
    my $i = 0;
    while ($i <= $#uniqs)
    {
        print $uniqs[$i], "\n";
        ++$i;
    }
}
else
{
    $stat = $ERRNO;
    printf("Can't open %s status %d\n", $fname, $stat);
}
exit($stat);
---------------------------------------

As the previous examples do, it uses a hash to keep track of the data that has already been read, and only when an entry is encountered for the first time is it added to the output array.



If it ain't broke, I can fix that.
David Bellamy
Respected Contributor

Re: Eliminating Dups

Thanks to all, James your solution was perfect.
David Bellamy
Respected Contributor

Re: Eliminating Dups

Once again thanks to all