
sort & uniq

 
Ravinder Singh Gill
Regular Advisor


In a script I have the following:

while read fakeuser
do
grep $fakeuser passgvts >> queryusers
done < testnotokusers

However, I get a lot of repeats in the file queryusers. This happens because testnotokusers has two identical entries (e.g. gillar) and passgvts has two very similar entries (e.g. gillars and gillarst). Each matching line is therefore written to queryusers twice, because the same name is grepped twice. The queryusers file may look like:

gillars
gillarst
gillars
gillarst

As there are hundreds of entries, I cannot go through the whole file deleting repetitions by hand. It has been suggested that I add something like the following at the bottom of the script:

cat queryusers | sort | uniq > newfile

This would sort the queryusers file and send the unique entries to newfile. Can someone please tell me what options, if any, I need for "sort" and "uniq"? Thanks.
Bill Hassell
Honored Contributor

Re: sort & uniq

If the queryusers file has the name as the first entry on each line, then | sort | uniq will reduce all occurrences of the same name to just one (uniq compares whole lines, so the rest of the line must match too). If the problem is that you are looking for an exact word (i.e. gillar) and do not want anything else (like gillars and gillarst), use the -w option in grep. This option is only available with the latest patch for grep.
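As a rough sketch of how the two suggestions fit together, the loop from the question could be rewritten along these lines. The sample contents of testnotokusers and passgvts are made up for illustration (the real file layouts are not shown in the thread), and it assumes a grep that supports -w.

```shell
#!/bin/sh
# Sample data (hypothetical): stand-ins for the files in the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'gillar' 'gillar' 'smithj' > testnotokusers
printf '%s\n' 'gillar:x:101' 'gillars:x:102' 'gillarst:x:103' 'smithj:x:104' > passgvts

: > queryusers
while read -r fakeuser
do
    # -w matches the name only as a whole word, so gillar does not match gillars.
    grep -w -- "$fakeuser" passgvts >> queryusers
done < testnotokusers

# Collapse identical lines; sort -u does the same job as sort | uniq.
sort -u queryusers > newfile
cat newfile
```

The duplicate gillar entry in testnotokusers still produces two identical lines in queryusers, and the final sort -u is what removes them.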


Bill Hassell, sysadmin
Hein van den Heuvel
Honored Contributor

Re: sort & uniq


ah... a continuation/parallel thread to:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=980856

As Bill says, you may want to grep explicitly for a word, or use an egrep pattern that anchors your words at the beginning of the line and follows them with whitespace, or anything else that makes it match exactly what you want.
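A minimal sketch of the anchored pattern, assuming the name is the first thing on each line of passgvts and is followed by whitespace or the end of the line. That layout is a guess, and the sample data is hypothetical. Names containing regular-expression metacharacters would need escaping first.

```shell
#!/bin/sh
# Sample data (hypothetical): name first, then whitespace and other fields.
passgvts=$(mktemp)
printf '%s\n' 'gillar 101' 'gillars 102' 'gillarst 103' > "$passgvts"

fakeuser=gillar
# ^ anchors the name at the start of the line; the group requires
# whitespace or end of line directly after it.
matches=$(grep -E "^${fakeuser}([[:space:]]|\$)" "$passgvts")
echo "$matches"
rm -f "$passgvts"
```

If the fields are separated by something else, such as a colon, that character would go inside the group in place of the whitespace class.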

Now, if you do want to post-process that queryusers file, rather than simply avoid the dups as per the other topic, then you can just tell uniq to look only at the first N characters:

sort queryusers | uniq -w 6

But then why not go one step further and just tell sort exactly what you want:

Something like:

sort -k 1.1,1.6 -u queryusers
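To see what the two commands do, here is a small sketch run against made-up data shaped like the example in the question. Note that -w is a GNU uniq option and is not part of POSIX uniq, so it may not exist on every system. Both commands treat gillars and gillarst as the same entry, since they share their first six characters, so this only helps if that is really what you want.

```shell
#!/bin/sh
# Sample data (hypothetical), mirroring the repeats described in the question.
queryusers=$(mktemp)
printf '%s\n' gillars gillarst gillars gillarst smithj > "$queryusers"

# uniq -w 6 compares only the first 6 characters of adjacent lines (GNU uniq).
by_uniq=$(sort "$queryusers" | uniq -w 6)

# sort -k 1.1,1.6 -u keeps one line per distinct 6-character key.
by_sort=$(sort -k 1.1,1.6 -u "$queryusers")

printf 'uniq -w 6:\n%s\n' "$by_uniq"
printf 'sort -k -u:\n%s\n' "$by_sort"
rm -f "$queryusers"
```

Which of the lines sharing a key survives under sort -u is not something to rely on, so check the output if the rest of the line matters.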


Caveat: I did not have access to an HP-UX box just now, so I only tested on my CD player, which runs Red Hat Linux 2.4.

Hein.


Arturo Galbiati
Esteemed Contributor

Re: sort & uniq

Hi Ravinder,
sort -uo newfile queryusers
should be sufficient to remove duplicates from queryusers if this file contains only the names of the users. Otherwise you need to give sort the key to use, with the -k option.
You can type man sort to look at this.
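A short sketch of both cases follows. The sample data and the colon-separated layout in the second case are assumptions for illustration only, since the thread does not show the real contents of queryusers.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Case 1: names only. -u drops duplicate lines, -o names the output file.
printf '%s\n' gillars gillarst gillars > queryusers
sort -u -o newfile queryusers
names_only=$(cat newfile)

# Case 2: extra fields (hypothetical colon-separated layout).
# -t sets the field separator; -k 1,1 limits the comparison to field 1.
printf '%s\n' 'gillars:102' 'gillarst:103' 'gillars:999' > queryusers
sort -t : -k 1,1 -u -o newfile queryusers
keyed=$(cat newfile)

printf '%s\n' "$names_only" "$keyed"
```

In the second case only one of the two gillars lines is kept, even though their other fields differ, because sort compares the key alone.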
HTH,
Art