
scripting question - 2

 
Soji George_1
New Member

Hi experts,
I'm trying my best to write an awk program that reads a file and displays only the lines that are unique. The input file has at least 2000 lines, and some of the lines are the same. Could somebody help me with how to approach this?
Dietmar Konermann
Honored Contributor

Re: scripting question - 2

Hi!

I assume that you really want to do this with awk only? That could be a little bit hard, I think.

Couldn't you just pipe through uniq or sort -u before going to awk?

Regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Simon Abbott
Frequent Advisor

Re: scripting question - 2

Hello,

Although I agree that this is probably best done with sort -u, if you want to confine it to awk you could use an array to store all the lines with $0 as the name of each element. That way, when awk goes to assign a value to the array element and that line had already been assigned, it will just be overwritten. You'll end up with unique lines.

{
myarray[$0] = $0
}

END {
for ( line in myarray ) { print line )
}

The difference is that awk will return the lines in no particular order (it will not necessarily return them in the order it read them in).
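
If first-seen order matters, one common awk idiom prints each line the first time it is encountered. It is not from the posts in this thread and is shown only as a sketch; the function name dedupe_keep_order and the sample data are made up for illustration.

```shell
# Hypothetical helper: print each distinct line once, in first-seen order.
# seen[$0]++ evaluates to 0 (false) the first time a line appears, so the
# negated expression is true only for the first occurrence.
dedupe_keep_order() {
    awk '!seen[$0]++' "$@"
}

# Made-up sample input; the second "pear" and "apple" are dropped.
printf '%s\n' pear apple pear fig apple | dedupe_keep_order
```

Like the array version above, this keeps one copy of every line, including lines that were repeated in the input.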

Simon.
I'm still working on that one
Simon Abbott
Frequent Advisor

Re: scripting question - 2

Oops! that should have been a } after print line...

{
myarray[$0] = $0
}

END {
for ( line in myarray ) { print line }
}

I'm still working on that one
Leif Halvarsson_2
Honored Contributor

Re: scripting question - 2

Hi
Perhaps you should look at the command "uniq" (man uniq).
Ceesjan van Hattum
Esteemed Contributor

Re: scripting question - 2

Why not try the standard command 'uniq -u'? It prints those lines that are NOT repeated in the original file.

Of course it only compares adjacent lines...
You might wanna compare this with the other solutions like sort -u.
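
The two commands answer different questions, so it may help to run both on a small sample. A sketch, assuming a throwaway temp file with made-up contents (the variable name sample is hypothetical):

```shell
# Made-up sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' pear apple pear fig apple > "$sample"

# sort -u keeps one copy of every distinct line.
sort -u "$sample"            # apple, fig, pear

# uniq -u keeps only lines that are never repeated; sorting first makes
# the duplicates adjacent so uniq can detect them.
sort "$sample" | uniq -u     # fig
```

Which one is wanted depends on whether "unique" means "one copy of each line" or "lines that occur exactly once".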

Regards,
Ceesjan
H.Merijn Brand (procura
Honored Contributor

Re: scripting question - 2

uniq only works on sorted files. If the uniqueness is to be dealt with across the whole file, this perl command would do it:

# perl -ne '$x{$_}++;END{for$x(keys%x){$x{$x}==1&&print$x}}' infile
Enjoy, Have FUN! H.Merijn
John Wright_1
Advisor

Re: scripting question - 2

Hi,

If you want to retain the original order, try
perl -ne 'push@a,$_;$h{$_}++;END{foreach(@a){print unless$h{$_}>1}}' infile

or in awkese

awk '{a[NR]=$0;h[$0]++;}END{for(i=1;i<=NR;i++){if(h[a[i]]==1)print a[i]}}' infile

Cheers,
JW.
Robin Wakefield
Honored Contributor

Re: scripting question - 2

or even...

awk '{a[$0]++}END{for(l in a){if(a[l]==1){print l}}}' filename
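
A variation on the same counting idea, sketched here as an assumption rather than anything proposed in the thread: read the file twice, counting on the first pass and printing on the second. This keeps the original order without storing every line in a second array, but it needs a real file, not a pipe. The function name print_singletons and the sample data are hypothetical.

```shell
# Hypothetical helper: print lines that occur exactly once, in input order.
# The file is named twice so awk reads it twice: NR==FNR is true only while
# reading the first copy, where occurrences are counted; on the second pass
# the pattern count[$0]==1 prints the non-repeated lines.
print_singletons() {
    awk 'NR==FNR { count[$0]++; next } count[$0]==1' "$1" "$1"
}

# Made-up sample data in a temporary file.
sample=$(mktemp)
printf '%s\n' pear apple pear fig kiwi apple > "$sample"
print_singletons "$sample"   # fig, kiwi
```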

Rgds, Robin