
check duplicate string

 
nash11
Frequent Advisor

I have a file with the following content:
aaa
bbb
ccc
ddd
eee
fff
ggg
aaa


The string "aaa" is duplicated. How can I check for any other duplicate strings in the file? Can anyone help? Thanks.

A. Clay Stephenson
Acclaimed Contributor

Re: check duplicate string

sort < myfile | uniq -d

This will output one copy of each line that occurs more than once. See man uniq for details.
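As a quick check, here is a minimal sketch that recreates the sample file from the question and runs the pipeline on it. The file name myfile is just the placeholder used in the command above.

```shell
# Recreate the sample file from the question (myfile is a placeholder name).
printf '%s\n' aaa bbb ccc ddd eee fff ggg aaa > myfile

# sort puts identical lines next to each other;
# uniq -d then prints one copy of each line that repeats.
sort < myfile | uniq -d
# prints: aaa
```

The sort step matters because uniq only compares adjacent lines, so running uniq -d on the unsorted file would miss the two "aaa" lines.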
If it ain't broke, I can fix that.
Hein van den Heuvel
Honored Contributor

Re: check duplicate string

sort and uniq are the right tools, as explained.

For small files I like using a perl (or awk) associative array. For example:

# perl -ne 'print if $x{$_}++' x.txt

Or

# perl -ne 'print if 1==$x{$_}++' x.txt

Or

# perl -ne 'if (1==$x{$_}++) { print "duplicate: $_" }' x.txt

Cheers,
Hein.
Peter Nikitka
Honored Contributor

Re: check duplicate string

Hi,

you can even get the number of times (double, triple, ...) a duplicate string occurs:

sort FILE | uniq -dc
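A small sketch of what this gives, assuming a made-up FILE where "aaa" appears three times and "bbb" twice. The width of the leading padding before the count varies between uniq implementations, so the second command normalizes it with awk.

```shell
# Hypothetical sample input: aaa x3, bbb x2, ccc x1.
printf '%s\n' aaa bbb aaa ccc bbb aaa > FILE

# Each duplicated line is printed once, prefixed by its count.
sort FILE | uniq -dc

# Same result with the padding stripped: "count line".
sort FILE | uniq -dc | awk '{print $1, $2}'
# prints:
# 3 aaa
# 2 bbb
```

Lines that occur only once ("ccc" here) are left out because of -d.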

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Ninad_1
Honored Contributor

Re: check duplicate string

You can use
sort filename | uniq -d
to list only the entries which are duplicated (this will not list entries which occur only once).
sort filename | uniq, or simply sort -u filename, gives the unique entries (each duplicated entry is shown only once).
sort filename | uniq -dc
displays the duplicated entries with the count of their occurrences. With -d every count is at least 2, so no extra grep filter is needed.

Regards,
Ninad
Hein van den Heuvel
Honored Contributor

Re: check duplicate string

Hah... perl/awk can do that also:

awk '{a[$0]++} END {for (k in a){v=a[k]; if (v>1) print k, v}}' x

awk '{array[$0]++} END {for (key in array){val=array[key]; if (val>1) print key "=" val}}' x
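One thing to keep in mind: awk does not guarantee any particular order for "for (key in array)", so the duplicates may come out in a different order on different systems. A minimal sketch that pipes the result through sort to make it stable, using a made-up input file x and hypothetical variable names:

```shell
# Hypothetical sample input: aaa x3, bbb x2, ccc x1.
printf '%s\n' aaa bbb aaa ccc bbb aaa > x

# Count every line, then report the ones seen more than once.
# The trailing sort only fixes the output order.
awk '{count[$0]++} END {for (line in count) if (count[line] > 1) print line, count[line]}' x | sort
# prints:
# aaa 3
# bbb 2
```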

:-)

Hein.
f. halili
Trusted Contributor

Re: check duplicate string

# sort filename | uniq -d


cheers,
f. halili
derekh