
Kris_5
Occasional Advisor

sort issue

Hi folks,

I am trying to eliminate rows that are partly duplicated. For example, see the three rows below (each row may wrap onto two lines). I need to eliminate the second row, since the third row is the updated version of that information.

coppel save_run_stats 2002-06-28:20:15:59 00:00:02.08 00:00:03.28 success
coppel save_run_stats 2002-06-28:20:19:06 xxxxxxxxxxx---------> started
coppel save_run_stats 2002-06-28:20:19:06 00:00:01.78 00:00:02.79 success

I tried the following commands, and neither of them works.

sort -r -k 4 -k 2,3|sort -u
sort -k 4 -k 2,3|sort -u

Thanks in adv.

Kris
2 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: sort issue

Rather than using sort for this, I would use awk or perl. You apparently need to preserve the original order of the file, so that only the last version of each record, keyed on the first three fields, wins.
The problem with doing it as a sort is that sorting on a subset of keys does not preserve the order in which the lines appeared in the file.

I've attached a script that builds the awk program in a 'here document' and invokes awk. I made no attempt to make the script robust; e.g., if fewer than three fields are found, there is trouble. This should be very close to what you need.

Use it like this:

filter.sh < oldfile > newfile
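
The attachment itself isn't visible in the thread. A minimal sketch of such a filter, assuming the intent is to keep only the last record for each combination of the first three fields while preserving the order in which those keys first appear, might look like this:

#!/bin/sh
# filter.sh -- keep only the last record per key (fields 1-3),
# printed in the order the keys first appeared in the input.
awk '
{
    key = $1 SUBSEP $2 SUBSEP $3
    if (!(key in line))
        order[++n] = key   # remember first-seen order of each key
    line[key] = $0         # a later record overwrites an earlier one
}
END {
    for (i = 1; i <= n; i++)
        print line[order[i]]
}
'

Run against the three sample rows above, this keeps the first and third rows and drops the "started" row, since the third row shares its first three fields with the second and comes later in the file.
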
If it ain't broke, I can fix that.
Tom Maloy
Respected Contributor

Re: sort issue

Given the time stamps, the "sort -u" may not find any repetitions, since the lines are not exact duplicates.

Does the extra line have a pattern that would work to delete the intermediate lines?

For example,
grep -v "started" data

Tom
Carpe diem!