Operating System - HP-UX

Re: remove duplicate entries from a file or output

 
SOLVED
vvsha
Frequent Advisor

remove duplicate entries from a file or output

Hi,

Can anyone help me with the following query?

I have a file with entries like the following:

Mar 24 09 abcd
Mar 24 09 abcd
Mar 24 11 pqrs
Mar 24 13 wxyz
Mar 24 13 wxyz
Mar 24 22 abcd
Mar 25 01 abcd
Mar 25 02 abcd
Mar 25 03 wxyz
Mar 25 06 wxyz
Mar 25 06 wxyz
Mar 25 07 pqrs
Mar 25 07 pqrs
Mar 25 07 pqrs
Mar 25 08 pqrs

My requirement is to find identical entries, keep one copy of each, and remove the other duplicates from this file.

For example, the entries below are identical:

Mar 24 09 abcd
Mar 24 09 abcd

I want to remove one of these entries and keep the other.

Another example:

Mar 25 07 pqrs
Mar 25 07 pqrs
Mar 25 07 pqrs

I want to remove two of these entries and keep one.

So the requirement is to remove duplicate entries from a file or from command output.
Is there an HP-UX command to achieve this?

Pete Randall
Outstanding Contributor
Solution

Re: remove duplicate entries from a file or output

You could try the sort command with the -u (unique) option.
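A minimal sketch of this suggestion, assuming a few of the sample lines from the question are saved in a file called entries.txt (the file names here are hypothetical). Note that sort -u reorders the lines as well as dropping duplicates, so it suits cases where the output order does not have to match the input order.

```shell
# Build a small sample file (entries.txt is an assumed name)
cat > entries.txt <<'EOF'
Mar 24 09 abcd
Mar 24 09 abcd
Mar 24 11 pqrs
Mar 25 07 pqrs
Mar 25 07 pqrs
Mar 25 07 pqrs
EOF

# -u keeps one copy of each distinct line; the result is in sorted order
sort -u entries.txt > entries.unique
cat entries.unique
```

The same option works on command output through a pipe, for example: some_command | sort -u (some_command being a placeholder).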


Pete
Paul Sperry
Honored Contributor

Re: remove duplicate entries from a file or output

Even easier is the uniq command:

# uniq file

or, to write the result to a new file:

# uniq file > file.unique

vvsha
Frequent Advisor

Re: remove duplicate entries from a file or output

Hi,

Can anyone help me with the following query?

I have a file with entries like the following:

Mar 23 09 abcd
Mar 24 09 abcd
Mar 24 11 pqrs
Mar 25 13 wxyz
Mar 25 13 wxyz
Mar 26 22 abcd
Mar 26 01 abcd
Mar 26 02 abcd
Mar 27 03 wxyz
Mar 27 06 wxyz
Mar 28 06 wxyz
Mar 29 07 pqrs
Mar 29 07 pqrs

I want to find the entries dated "Mar 25" or later.

Can I print lines on the basis of the value in the second column?

The output should be all the entries whose second column is greater than or equal to "25".

How can comparison operators be used in this example?

Please help me with this query.
Jonathan Fife
Honored Contributor

Re: remove duplicate entries from a file or output

For your second question, just run:
awk '$2>=25' yourfile

Just know that if/when you start putting data for other months in there, it gets a bit more complicated.
Decay is inherent in all compounded things. Strive on with diligence
Dennis Handly
Acclaimed Contributor

Re: remove duplicate entries from a file or output

>Paul: even easier is the uniq command

This only works if the input is sorted or the duplicates are already adjacent.
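A short sketch of the adjacency point, using a hypothetical file mixed.txt in which the duplicate lines are separated by another line. The last command is an alternative that is not from the thread: an awk one-liner that prints each line only the first time it appears, which keeps the original order.

```shell
# Sample with a non-adjacent duplicate (mixed.txt is an assumed name)
printf '%s\n' 'Mar 24 09 abcd' 'Mar 24 11 pqrs' 'Mar 24 09 abcd' > mixed.txt

# uniq compares only neighbouring lines, so all three lines survive
uniq mixed.txt > uniq_only.txt

# Sorting first makes the duplicates adjacent, so two lines remain
sort mixed.txt | uniq > sorted_uniq.txt

# awk alternative: seen[$0]++ is 0 (false) only on the first occurrence,
# so each distinct line is printed once, in input order
awk '!seen[$0]++' mixed.txt > first_seen.txt

cat uniq_only.txt sorted_uniq.txt first_seen.txt
```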

>Jonathan: Just know that if/when you start putting data for other months in there it gets a bit more complicated.

Right. You would have to map month names to numbers and compare those.

awk -v mm=Mar -v dd=25 '
BEGIN {
    MON["Jan"]=1   # initialize month mapping array
    MON["Feb"]=2
    MON["Mar"]=3
    MON["Apr"]=4
    MON["May"]=5
    MON["Jun"]=6
    MON["Jul"]=7
    MON["Aug"]=8
    MON["Sep"]=9
    MON["Oct"]=10
    MON["Nov"]=11
    MON["Dec"]=12
    mon=MON[mm]    # number of the cutoff month
}
{
    mon_in = MON[$1]
    # print entries in a later month, or in the same month on or after day dd
    if (mon_in > mon || (mon_in == mon && $2 >= dd))
        print $0
} ' yourfile
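Building on the month-mapping idea, here is a sketch that handles both questions from the thread in one awk pass: it keeps entries dated on or after the cutoff and prints each distinct line only once. The sample file name, the split-based way of filling the map, and the seen array are assumptions for illustration, not part of the script above.

```shell
# Sample spanning several months (sample.txt is an assumed name)
printf '%s\n' 'Feb 28 10 abcd' 'Mar 24 09 abcd' 'Mar 25 13 wxyz' \
              'Mar 25 13 wxyz' 'Apr 01 02 pqrs' > sample.txt

awk -v mm=Mar -v dd=25 '
BEGIN {
    # Fill the month-name-to-number map from a single string
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
    for (i = 1; i <= n; i++) MON[names[i]] = i
    mon = MON[mm]
}
{
    m = MON[$1]
    # On or after the cutoff date, and not printed before
    if ((m > mon || (m == mon && $2 + 0 >= dd + 0)) && !seen[$0]++)
        print
}' sample.txt > filtered.txt

cat filtered.txt
```

Unlike a sort-based pipeline, this keeps the lines in their original order, at the cost of holding every printed line in memory.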
Arturo Galbiati
Esteemed Contributor

Re: remove duplicate entries from a file or output

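Hi,
this command line handles both requirements: it sorts the file, removes the duplicates, and lists the entries from Mar 25 to the end:
sort -t" " -k1,1M -k2,2n tt.txt | uniq | sed -n '/^Mar 25/,$p'

Explanation:
1.
sort -t" " -k1,1M -k2,2n tt.txt
sorts the file, using " " (a space) as the field separator, by month (first key) and then numerically by day (second key).
Because of the M modifier this also works for months other than Mar.

2.
| uniq
removes the duplicates, which are adjacent after sorting.

3.
sed -n '/^Mar 25/,$p'
prints the file from the first Mar 25 line (inclusive) to the end of the file. The end address must be $ (last line); a pattern such as /$/ matches every line and would close the range after one extra line. If the file has no Mar 25 entry, nothing is printed.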
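A small run of the sort, uniq, sed approach on data that spans several months. This is a sketch under two assumptions: that the local sort accepts the M (month) key modifier, and that LC_ALL=C is set so the English month abbreviations are recognised. The sample contents are invented for illustration; here the keys are written per field (-k1,1M -k2,2n) and sed uses $ as the last-line address.

```shell
# Unsorted sample with one duplicate pair (contents are illustrative)
cat > tt.txt <<'EOF'
Apr 02 01 pqrs
Mar 25 13 wxyz
Feb 27 09 abcd
Mar 25 13 wxyz
Mar 24 11 pqrs
Mar 26 02 abcd
EOF

# Sort by month name, then by day number; drop adjacent duplicates;
# print from the first "Mar 25" line through the last line
LC_ALL=C sort -t" " -k1,1M -k2,2n tt.txt | uniq | sed -n '/^Mar 25/,$p' > from_mar25.txt

cat from_mar25.txt
```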

HTH,
Art
vvsha
Frequent Advisor

Re: remove duplicate entries from a file or output

Thank you very much, everyone.