Operating System - HP-UX

Re: identify duplicates in a file

Anand_30
Regular Advisor

identify duplicates in a file

Hi,

I have a file with around 500 numbers, some of which are duplicates. Is there any way to find out which numbers have duplicate entries in the file?

Thanks,
Anand.
RAC_1
Honored Contributor

Re: identify duplicates in a file

Check the man page of uniq.

cat your_file | uniq -d

will print the entries that are repeated (assuming the file is plain ASCII text).
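As a quick sketch of the suggestion above (with a made-up numbers.txt; note that uniq -d only reports duplicates that sit on adjacent lines, so the sample data here is already sorted):

```shell
# Hypothetical sample file -- duplicates are adjacent, as uniq requires.
printf '10\n10\n20\n30\n30\n' > numbers.txt

# Print each line that appears more than once in a row.
uniq -d numbers.txt
# prints:
# 10
# 30
```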
There is no substitute to HARDWORK
Hein van den Heuvel
Honored Contributor

Re: identify duplicates in a file


Ayup uniq will do the trick.

Now if you want to do something more than just print the numbers, you might go with perl:

perl -e 'while (<>){ if (defined($x{$_})) { print } else { $x{$_}=1 }}' < yourfile

Replace the 'print' with something weird or wonderful at your whim.

Hein.
Graham Cameron_1
Honored Contributor

Re: identify duplicates in a file

uniq will only work for adjacent lines.
I.e., it will not find the duplicated "line 3" in

line 1
line 2
line 3
line 4
line 3

I would use sort and sort -u to create 2 files, and diff to compare them.

sort file > f1
sort -u file > f2
diff f1 f2

This will show all duplicate lines, prefixed with "<".
If you want to take out the noise, use

diff f1 f2|grep "^<"|cut -c 3-
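Running the full pipeline above on the five-line sample from this post shows the non-adjacent duplicate being caught:

```shell
# The sample file from the post: "line 3" repeats, but not adjacently.
printf 'line 1\nline 2\nline 3\nline 4\nline 3\n' > file

sort file > f1        # all lines, duplicates now adjacent
sort -u file > f2     # unique lines only

# Lines present in f1 but not f2 are the duplicates; strip the "< " prefix.
diff f1 f2 | grep "^<" | cut -c 3-
# prints:
# line 3
```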

-- Graham
Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done.
Mark Grant
Honored Contributor
Solution

Re: identify duplicates in a file

Maybe we could just do it the simple way, by combining several of the options above.

cat file | sort -n | uniq -d
Never preceed any demonstration with anything more predictive than "watch this"
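The accepted answer in action on a small hypothetical file: sort -n puts the numbers in numeric order (making duplicates adjacent), and uniq -d then reports each repeated value once:

```shell
# Hypothetical unsorted numbers with non-adjacent duplicates.
printf '12\n7\n12\n100\n7\n12\n' > numbers.txt

# Numeric sort groups the repeats; uniq -d prints each repeated value once.
sort -n numbers.txt | uniq -d
# prints:
# 7
# 12
```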