Re: remove duplicate entries from a file or output

vvsha · 03-25-2008

Hi

Can anyone help me with the following query?

I have a file with entries like the following:

Mar 24 09 abcd
Mar 24 09 abcd
Mar 24 11 pqrs
Mar 24 13 wxyz
Mar 24 13 wxyz
Mar 24 22 abcd
Mar 25 01 abcd
Mar 25 02 abcd
Mar 25 03 wxyz
Mar 25 06 wxyz
Mar 25 06 wxyz
Mar 25 07 pqrs
Mar 25 07 pqrs
Mar 25 07 pqrs
Mar 25 08 pqrs

My requirement is to find identical entries, keep only one of each, and remove the other duplicates from the file.

For example, these two lines are identical:

Mar 24 09 abcd
Mar 24 09 abcd

I want to remove one of them and keep the other.

Another example:

Mar 25 07 pqrs
Mar 25 07 pqrs
Mar 25 07 pqrs

Here I want to remove two of the entries and keep one.

So the requirement is to remove duplicate entries from a file or from command output.
Is there an HP-UX command that does this?

Pete Randall · 03-25-2008

You could try the sort command with the -u (unique) option.
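A minimal sketch of the sort -u approach, assuming a POSIX shell; entries.txt and entries.unique are hypothetical file names, and the data is a subset of the sample above. Keep in mind that sort -u writes its output in sorted order, not the original order; here the two happen to match because the file is already sorted.

```sh
#!/bin/sh
# Write part of the sample data from the question to a hypothetical file.
printf '%s\n' \
    'Mar 24 09 abcd' \
    'Mar 24 09 abcd' \
    'Mar 24 11 pqrs' \
    'Mar 25 07 pqrs' \
    'Mar 25 07 pqrs' \
    'Mar 25 07 pqrs' > entries.txt

# -u keeps only one copy of each set of identical lines.
sort -u entries.txt > entries.unique
cat entries.unique
```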

Pete

Paul Sperry · 03-25-2008

Even easier is the uniq command:

# uniq file

or, to write the result to a new file:

# uniq file > file.unique
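A short sketch of uniq on part of the sample data, assuming a POSIX shell; file, file.unique and file.dups are hypothetical names. uniq compares each line only with the line before it, which is enough here because the duplicates in the question sit next to each other. The -d option, which prints one copy of each repeated line, is included as an optional way to see what was collapsed.

```sh
#!/bin/sh
# Part of the sample data; the duplicates are adjacent, as in the question.
printf '%s\n' \
    'Mar 24 13 wxyz' \
    'Mar 24 13 wxyz' \
    'Mar 24 22 abcd' \
    'Mar 25 06 wxyz' \
    'Mar 25 06 wxyz' > file

# uniq collapses each run of adjacent identical lines into a single line.
uniq file > file.unique

# Optional: -d prints one copy of each line that was repeated.
uniq -d file > file.dups

cat file.unique
```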

vvsha · 03-25-2008

Hi

Can anyone help me with the following query?

I have a file with entries like the following:

Mar 23 09 abcd
Mar 24 09 abcd
Mar 24 11 pqrs
Mar 25 13 wxyz
Mar 25 13 wxyz
Mar 26 22 abcd
Mar 26 01 abcd
Mar 26 02 abcd
Mar 27 03 wxyz
Mar 27 06 wxyz
Mar 28 06 wxyz
Mar 29 07 pqrs
Mar 29 07 pqrs

I just want to find the entries from "Mar 25" onward.

I want to select them based on the value in the second column (the day).

The output should be all the entries whose day is greater than or equal to 25.

How can I use comparison operators in this example?

Please help me with this query.

Jonathan Fife · 03-25-2008

For your second question, just run:
awk '$2>=25' yourfile

Just know that if/when you start putting data for other months in there, it gets a bit more complicated.
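A quick sketch of the day filter, assuming a POSIX awk; yourfile and yourfile.out are placeholder names. It uses >= because the question asks for days greater than or equal to 25. A field such as 25 or 02 looks numeric, so awk compares it as a number rather than as a string. The Apr line illustrates the complication mentioned above: only the day is checked, so that line is dropped.

```sh
#!/bin/sh
printf '%s\n' \
    'Mar 24 11 pqrs' \
    'Mar 25 13 wxyz' \
    'Mar 26 02 abcd' \
    'Mar 29 07 pqrs' \
    'Apr 02 06 pqrs' > yourfile

# $2 (the day) looks like a number, so awk compares it numerically with 25.
# >= keeps day 25 itself. Caveat: only the day is checked, so the Apr 02
# line is dropped even though it is later than Mar 25.
awk '$2 >= 25' yourfile > yourfile.out
cat yourfile.out
```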

Decay is inherent in all compounded things. Strive on with diligence

Dennis Handly · 03-25-2008

>Paul: even easier is the uniq command

This only works if the file is sorted or the duplicates are adjacent.
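Two common ways to handle duplicates that are not adjacent, sketched under the assumption of a POSIX shell and awk; file, sorted.unique and ordered.unique are hypothetical names. The awk idiom `!seen[$0]++` is a swapped-in technique, not one proposed in this thread: it prints a line only the first time it appears, so the original order is preserved, whereas sort | uniq reorders the output.

```sh
#!/bin/sh
# The duplicate "Mar 25 07 pqrs" lines are not adjacent, so plain
# "uniq file" would print all three lines unchanged.
printf '%s\n' \
    'Mar 25 07 pqrs' \
    'Mar 24 09 abcd' \
    'Mar 25 07 pqrs' > file

# Option 1: sort first so identical lines become adjacent.
# The result comes out in sorted order.
sort file | uniq > sorted.unique

# Option 2: seen[$0]++ evaluates to 0 the first time a line appears,
# so !seen[$0]++ is true only for the first copy; original order is kept.
awk '!seen[$0]++' file > ordered.unique

cat sorted.unique ordered.unique
```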

>Jonathan: Just know that if/when you start putting data for other months in there it gets a bit more complicated.

Right. You would have to map month names to numbers and compare those.

awk -v mm=Mar -v dd=25 '
BEGIN {
MON["Jan"]=1 # initialize month mapping array
MON["Feb"]=2
MON["Mar"]=3
MON["Apr"]=4
MON["May"]=5
MON["Jun"]=6
MON["Jul"]=7
MON["Aug"]=8
MON["Sep"]=9
MON["Oct"]=10
MON["Nov"]=11
MON["Dec"]=12
mon=MON[mm]
}
{
mon_in = MON[$1]   # month number of the current line
# keep later months, or the same month on or after day dd
if (mon_in > mon || (mon_in == mon && $2 >= dd))
print $0
} ' yourfile
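A more compact variant of the same month-then-day comparison, offered as an alternative sketch rather than as Dennis's code. Instead of twelve assignments, it derives the month number with index() on a string of month abbreviations. monum is a hypothetical helper name, yourfile and months.out are placeholders, and the sketch assumes field 1 always holds a valid three-letter English month abbreviation.

```sh
#!/bin/sh
printf '%s\n' \
    'Feb 28 10 abcd' \
    'Mar 24 09 abcd' \
    'Mar 25 13 wxyz' \
    'Apr 02 06 pqrs' > yourfile

# "Jan" starts at position 1, "Feb" at 4, "Mar" at 7, ... so
# (position + 2) / 3 maps the abbreviations to 1..12.
# Assumes $1 is a valid abbreviation; anything else gives a bogus value.
awk -v mm=Mar -v dd=25 '
function monum(m) { return (index("JanFebMarAprMayJunJulAugSepOctNovDec", m) + 2) / 3 }
BEGIN { mon = monum(mm) }
{
    mon_in = monum($1)
    if (mon_in > mon || (mon_in == mon && $2 >= dd))
        print $0
}' yourfile > months.out
cat months.out
```

Like the array version, this ignores the year, so data that wraps from Dec to Jan would need a year column to compare correctly.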

Arturo Galbiati · 03-26-2008

Hi,
this one-line command solves both requirements: it sorts the file, removes the duplicates, and lists the entries from Mar 25 to the end:
sort -t" " -k1,1M -k2,2n tt.txt | uniq | sed -n '/Mar 25/,$p'

Explanation:
1.
sort -t" " -k1,1M -k2,2n tt.txt
sorts the file using a single space as the field separator, by month name (first key) and then numerically by day (second key).
Thanks to the M (month) modifier, this also works for months other than Mar.

2.
| uniq
removes the duplicates, which are adjacent after sorting.

3.
sed -n '/Mar 25/,$p'
prints from the first line containing "Mar 25" (inclusive) to the last line of the file ($).

HTH,
Art
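A small test of the sort | uniq | sed pipeline on out-of-order data, assuming a sort that supports the M (month name) key modifier (GNU sort does, and Art's post suggests HP-UX sort does as well); tt.txt and tt.out are placeholder names. The sed range ends at $, the last-line address; an end address written as the regex /$/ would instead close the range on the very next line, since every line matches that regex.

```sh
#!/bin/sh
# Part of the second sample, shuffled, with one hypothetical Apr line added.
printf '%s\n' \
    'Apr 01 06 pqrs' \
    'Mar 26 02 abcd' \
    'Mar 24 09 abcd' \
    'Mar 29 07 pqrs' \
    'Mar 25 13 wxyz' \
    'Mar 29 07 pqrs' \
    'Mar 25 13 wxyz' > tt.txt

# Sort by month name, then numerically by day; identical lines end up
# adjacent, uniq drops the repeats, and sed prints from the first
# "Mar 25" line through the last line ($).
sort -t" " -k1,1M -k2,2n tt.txt | uniq | sed -n '/Mar 25/,$p' > tt.out
cat tt.out
```

uniq is used instead of sort -u here because, when keys are given, sort -u treats lines with equal keys as duplicates and would merge different entries from the same day.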

vvsha · 03-26-2008

Thank you very much to all
