
sort command

 
Babak_4
Advisor

sort command

Hi,

Here is my file's content:
$ cat gg
9|4|1|1118|-7.79|607|K7K|6B3|3.309
9|4|1|1117|7.49|724|N1K|1R7|4.981

And here is my sort:
$ sort -n -t'|' -k 1,5 -k 9 gg
9|4|1|1118|-7.79|607|K7K|6B3|3.309
9|4|1|1117|7.49|724|N1K|1R7|4.981

As you can see, the second line should come first, since its fourth field (1117) is less than the first record's (1118).

But when I remove -k 9 it works fine:
$ sort -n -t'|' -k 1,5 gg
9|4|1|1117|7.49|724|N1K|1R7|4.981
9|4|1|1118|-7.79|607|K7K|6B3|3.309

Am I making a silly mistake here, or ...?

I appreciate your help.

Babak

25 replies
Bharat Katkar
Honored Contributor

Re: sort command

Babak,
From the sort man page:
" When there are multiple sort keys, later keys are compared only after all earlier keys compare equal.
Lines that otherwise compare equal are ordered with all bytes significant.
If all the specified keys compare equal, the entire record is used as the final key. "
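A minimal sketch of that last-resort rule in action, assuming a POSIX shell and GNU coreutils sort. The "-s" (stable) flag, which switches off the whole-line tie-break, is a GNU/BSD extension and may not exist in the HP-UX sort. The script recreates the two-line "gg" file from the question; the temporary directory, the variable names and LC_ALL=C (so the tie-break compares plain bytes and '.' is the decimal point) are illustrative choices. With -n, the key "-k 1,5" is read only up to its first non-numeric character, i.e. as the number 9 on both lines, so the keys tie and the whole line decides:

```shell
#!/bin/sh
# Byte-order collation and '.' as the decimal point, for reproducible output.
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

# The two records from the original question.
printf '%s\n' '9|4|1|1118|-7.79|607|K7K|6B3|3.309' \
              '9|4|1|1117|7.49|724|N1K|1R7|4.981' > gg

# Key 1,5 is read by -n as "9" on both lines, so the whole line breaks the tie.
tiebreak=$(sort -n -t'|' -k 1,5 gg)
# With -s there is no last-resort comparison, so the input order is kept.
stable=$(sort -s -n -t'|' -k 1,5 gg)

printf '%s\n' "$tiebreak" --- "$stable"
```

Here "$tiebreak" lists the 1117 record first, while "$stable" keeps the 1118 record first, which suggests the 1117-first result without -k 9 comes from the whole-line tie-break rather than from field 4 inside key 1,5.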

Hope that addresses your issue.
Regards,

You need to know a lot to actually know how little you know
Babak_4
Advisor

Re: sort command

Hi Bharat,

Thanks for that, but I'm still confused. The first key, -k 1,5, says to sort on the first five fields, so the second record should come first (as it does when -k 9 is removed). Or is my understanding wrong?

Babak
Bharat Katkar
Honored Contributor

Re: sort command

Hi Babak,
Compare the following two outputs:

1. $ sort -n -t'|' -k 1,9 -k 9 gg
2. $ sort -n -t'|' -k 1,9 gg

I expect both to be the same.

Regards,

P.S. I have counted the pipes too.
Babak_4
Advisor

Re: sort command

Sorry, I don't get your point. Anyway, I tried your commands, and they are not the same!
Something I found: if I remove -n from "sort -n -t'|' -k 1,5 -k 9 gg", it works!

Regards,

Babak
Babak_4
Advisor

Re: sort command

I guess sort has a problem with multiple -k options. Is this a known issue, and is there a patch for it?

Babak
Muthukumar_5
Honored Contributor

Re: sort command

Hi,

Bharat gave the key point from the sort man page. If we use multiple keys on the sort command, it processes them in order, from key 1 to key n.

>>>>>>>>>>>>>>>>>>>>>>>>>>>
When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that otherwise compare equal are ordered with all bytes significant.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

From your example,
sort -n -t'|' -k 1,5 -k 9 gg

It will be processed like this:

sort -n -t'|' -k 1,5 gg | sort -n -t'|' -k 9

The output also shows that:
9|4|1|1118|-7.79|607|K7K|6B3|3.309
9|4|1|1117|7.49|724|N1K|1R7|4.981

Look at the ninth field: 3.309 and 4.981 are sorted as you requested.

The lines in your example are sorted first on key 1,5 and then on key 9.

The keys we use to sort must be significant.

If you test the example like this:

sort -n -t'|' -k 9 -k 1,5 gg

you might expect the output to change to:
9|4|1|1117|7.49|724|N1K|1R7|4.981
9|4|1|1118|-7.79|607|K7K|6B3|3.309

It does not. The sort call is working as stated in the man page.

Regards,
Muthukumar.

Easy to suggest when don't know about the problem!
Babak_4
Advisor

Re: sort command

Hi,

Please bear with me for a few more questions:

But what about the fourth field? The second record has a smaller value (1117) than the first record (1118). The fourth field should be considered before the ninth one.

Doesn't "sort -n -t'|' -k 1,5 -k 9 gg" mean that the first 5 fields should be considered first, and the ninth field only if they are equal?

"sort -n -t'|' -k 1,5 gg | sort -n -t'|' -k 9" is not, and shouldn't be, the same as "sort -n -t'|' -k 1,5 -k 9 gg".

My intention with "sort -n -t'|' -k 1,5 -k 9 gg" was to sort the file on the combination of six fields: 1 to 5 and 9.

Please correct me if I'm misunderstanding the man page.

Thanks,

Babak
Rodney Hills
Honored Contributor

Re: sort command

If you sort on -k 4,5 -k 9 then it works.

If you change your data line #2 position 1 to an "8", then the sort works.

If you separate the keys out to -k 1,1 -k 2,2 -k 3,3 -k 4,4 -k 5,5 -k 9,9, then it works.
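Rod's per-field workaround, sketched as a runnable script (assuming a POSIX shell and a POSIX-style sort; the temporary directory, the variable name and LC_ALL=C are illustrative choices). Each field becomes its own numeric key, so field 4 is compared as 1117 versus 1118 instead of being folded into a single leading number:

```shell
#!/bin/sh
# Byte-order collation and '.' as the decimal point, for reproducible output.
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

printf '%s\n' '9|4|1|1118|-7.79|607|K7K|6B3|3.309' \
              '9|4|1|1117|7.49|724|N1K|1R7|4.981' > gg

# One numeric key per field: fields 1-3 tie, field 4 (1117 < 1118) decides.
per_field=$(sort -n -t'|' -k 1,1 -k 2,2 -k 3,3 -k 4,4 -k 5,5 -k 9,9 gg)
printf '%s\n' "$per_field"
```

This uses six -k options, which stays within the HP-UX limit of 9 that Babak quotes later in the thread.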

It is acting as if the key range -k 1,5 were treated as a duplicate key when only the first field is duplicated.

It looks like a bug to me...

-- Rod Hills
There be dragons...
Babak_4
Advisor

Re: sort command

It would be very helpful if someone from HP could confirm this.

Thanks,

Babak
Fred Ruffet
Honored Contributor

Re: sort command

Here I have syncsort as a replacement for sort on an HP server (faster, better, more expensive :).

I get exactly the same sort with this second program. I can't imagine the same bug in two such widely used programs.

What I understand from the man page is that, when you specify multiple keys spanning multiple fields, the second fields of the keys are compared only if the first fields of the keys are equal. And so on for the second fields, third fields...

An example of what I mean, using your data:
add a -k 4 to your sort, and it will work.

regards,

Fred

--

"Reality is just a point of view." (P. K. D.)
John Kittel
Trusted Contributor

Re: sort command

I also tried it on a Linux system and a Tru64 system, and it behaves exactly the same way on both. So maybe it is not a bug. I too would like to hear from an authoritative source whether it IS a bug or not.
Fred Ruffet
Honored Contributor

Re: sort command

On SunOS 5.6:

with sort :
$ /usr/bin/sort -n -t'|' -k 1,5 -k 9 sort_file
9|4|1|1118|-7.79|607|K7K|6B3|3.309
9|4|1|1117|7.49|724|N1K|1R7|4.981

with syncsort :
$ sort -n -t'|' -k 1,5 -k 9 sort_file
9|4|1|1118|-7.79|607|K7K|6B3|3.309
9|4|1|1117|7.49|724|N1K|1R7|4.981

It acts the same again. That confirms my opinion: it's not a bug, it's a feature :)

Regards,

Fred
Tim D Fulford
Honored Contributor

Re: sort command

It looks weird, but you will need to EXPLICITLY state the order one by one. It looks like the implied 1,5 does not work!!

Try it, and you will see that
sort -n -t'|' -k 1,5 -k 9 gg
gives the same result as
sort -n -t'|' -k 9 -k 1,5 gg

What you want is
sort -n -t'|' -k1 -k2 -k3 -k4 -k5 -k9 gg
which will give a different result from
sort -n -t'|' -k9 -k5 -k4 -k3 -k2 -k1 gg

regards

Tim
Babak_4
Advisor

Re: sort command

Fred,

Thanks for the response. But I tried it with syncsort, and it works fine. Can you post your syntax? I want to be sure that syncsort does not behave like Unix sort.

Regarding adding -k 4: as Rod said, it works if you specify each field separately, but that's not what the man page says:

...
Restricted Sort Key
-k keydef The keydef argument defines a restricted sort key. The format of this definition is
field_start[type][,field_end[type]]
which defines a key field beginning at field_start and ending at field_end.
...
Multiple -k options are permitted and are significant in command line order. A maximum of 9 -k options can be given.
...

Besides, there is a limit on the number of -k options (max. 9).
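Per the keydef format quoted above, field_start[type][,field_end[type]], a type letter such as "n" can be attached to one key instead of being applied globally with -n. A minimal sketch, assuming a POSIX shell and LC_ALL=C (the temporary directory and variable names are illustrative), that needs only two -k options: key 1,5 stays a plain text key and only field 9 is numeric. Text comparison puts 1117 before 1118 here only because those fields have the same width; the second part shows how a text key misorders numbers of different widths:

```shell
#!/bin/sh
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

printf '%s\n' '9|4|1|1118|-7.79|607|K7K|6B3|3.309' \
              '9|4|1|1117|7.49|724|N1K|1R7|4.981' > gg

# No global -n: key 1,5 is compared as text; the trailing "n" makes only key 9 numeric.
mixed=$(sort -t'|' -k 1,5 -k 9,9n gg)
printf '%s\n' "$mixed"

# Caveat: as text, "10" sorts before "9" because '1' < '9' byte-wise.
as_text=$(printf '%s\n' '9|a' '10|b' | sort -t'|' -k 1,1)
as_number=$(printf '%s\n' '9|a' '10|b' | sort -t'|' -k 1,1n)
printf '%s\n' "$as_text" --- "$as_number"
```

So the mixed form only suits fixed-width fields; for variable-width numbers, one numeric key per field is the safer choice.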

Thanks,

Babak


Fred Ruffet
Honored Contributor

Re: sort command

I insist:
The problem is not that 1,5 is not working. The sort is done on the first field of each key (1 and 9), then on the second field of each key (only 2, because the second key has no other field), and so on. So the sort you asked for is equivalent to
-k 1 -k 9 -k 2 -k 3 -k 4 -k 5
whereas what you want is
-k 1,5 -k 4 -k 9

Regards,

Fred
Fred Ruffet
Honored Contributor

Re: sort command

Babak,

I didn't see your answer before posting...
OK, but note what someone quoted from the man page earlier:
"When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that otherwise compare equal are ordered with all bytes significant"

The syncsort syntax is in my previous post; it's the same as sort. I am using the sort shell script in the syncsort bin directory, not the syncsort program directly.

Regards,

Fred
Babak_4
Advisor

Re: sort command

Fred,

Let me emphasize that "-k 1,5" means "-k 1 -k 2 -k 3 -k 4 and -k 5". As I mentioned in my earlier reply, if you remove -k 9 from my command it works, meaning the value of the fourth field is considered, whereas with -k 9 it isn't.

Thanks,

Babak


Rodney Hills
Honored Contributor

Re: sort command

Babak,

From "man sort", I read that

sort -k 1,5
and
sort -k 1,1 -k 2,2 -k 3,3 -k 4,4 -k 5,5

should be equivalent, but in my previous tests it was not...

Even though it behaves the same on a Linux system, I would say that HP-UX and Linux might share the same source code for a number of utilities, and thus share some of the same bugs.

Whether you call this a bug or an undocumented feature, it still is frustrating when a command doesn't do what you expect it to do.

My 2 cents.

-- Rod Hills
Fred Ruffet
Honored Contributor

Re: sort command

IMHO "-k 1,5" is not equivalent to "-k 1 -k 2 -k 3 -k 4 -k 5".

The first one sorts on a key which is a concatenation of the first 5 fields. In the second, the first 5 fields have the same importance.

That's why I say that in your original sort, 1 and 9 are sorted first, then 2, then 3...

...and that's why removing 9 works the way you expect (1, 2, 3, 4 and 5 are already distinct).

Regards,

Fred
John Kittel
Trusted Contributor

Re: sort command

I (finally) have to agree with you, Fred. It took quite a bit of playing with various sets of sample data, thinking about what you said, etc., to get over my false preconception, but it makes perfect sense now. The man page clearly says -k 1,5 is treated as a concatenation of fields, and understanding the effect of that is the key point. In addition, I think the conclusion is supported by the fact that all the different platforms and sort programs show the same behavior: it would be unlikely, first, that they would all have the same bug, and second, that a bug present in all those different versions would not have been reported and/or fixed by now.

My final set of test data is what helped me see it. Here it is:

# cat mydata
1 2 1 1
1 2 2 2
1 2 1 2
1 2 2 1
1 1 2 2
1 1 2 1
1 1 3 1
1 1 3 2
1 2 3 2
1 2 3 1

# sort -n -k 1,3 -k 4 mydata
1 1 2 1
1 1 3 1
1 2 1 1
1 2 2 1
1 2 3 1
1 1 2 2
1 1 3 2
1 2 1 2
1 2 2 2
1 2 3 2

In the sorted file all records where field 4 = 1 come before all records where field 4 = 2. Then within the set of all records where field 4 = 1, fields 2 and 3 are sorted properly. And within the set of all records where field 4 = 2, fields 2 and 3 are sorted properly.
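John's test can be rerun as a short script, a sketch assuming a POSIX shell, sort's default blank-separated fields, and LC_ALL=C so that ties are broken byte-wise (the temporary directory and variable names are illustrative). The comparison with "-k 4" alone is an addition for illustration: under -n the key "-k 1,3" is read as the leading number 1 on every line, so it never separates anything, and the output matches a sort on field 4 alone:

```shell
#!/bin/sh
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

# John's sample file: four blank-separated fields per line.
printf '%s\n' '1 2 1 1' '1 2 2 2' '1 2 1 2' '1 2 2 1' '1 1 2 2' \
              '1 1 2 1' '1 1 3 1' '1 1 3 2' '1 2 3 2' '1 2 3 1' > mydata

# Key 1,3 ("1 2 1", ...) is read by -n as just the number 1, so it always ties.
with_range=$(sort -n -k 1,3 -k 4 mydata)
# Sorting on field 4 alone, with the same whole-line tie-break.
field4_only=$(sort -n -k 4 mydata)

printf '%s\n' "$with_range"
if [ "$with_range" = "$field4_only" ]; then
    echo "same order as -k 4 alone"
fi
```

The printed order is the one John shows above, and the final check reports that it is identical to the field-4-only sort.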
Fred Ruffet
Honored Contributor

Re: sort command

Thanks John! I finally feel less lonely on my island :)

You managed to find a really helpful example (better than the two lines we have been dealing with since the start).

Regards,

Fred
Babak_4
Advisor

Re: sort command

John,
Thanks for the reply. I have to say that your finding is exactly what I ran into at the beginning, when I posted my question in this forum. You explained how sort works, but the question is: is that exactly what the man page says?

Please clarify this for me from the sort man page (note the uppercase parts):

Multiple -k options are permitted and ARE SIGNIFICANT IN COMMAND LINE ORDER. A maximum of 9 -k options can be given. If no -k option is specified, a default sort key of the entire line is used. WHEN THERE ARE MULTIPLE SORT KEYS, LATER KEYS ARE COMPARED ONLY AFTER ALL EARLIER KEYS COMPARE EQUAL. Lines that otherwise compare equal are ordered with all bytes significant. If all the specified keys compare equal, the entire record is used as the final key.

Let me ask you a question:
Using myfile, these two produce the same result.
1) -k1,2
2) -k1,1 -k2,2

Now suppose you want to add another field, let's say the fourth one, as a key. Now these two do not produce the same result. In fact, case 3 produces the wrong order, which surprisingly is the same as -k4,4! (I would like to remind you of my highlight from the man page: LATER KEYS ARE COMPARED ONLY AFTER ALL EARLIER KEYS COMPARE EQUAL.)

3) -k1,2 -k4,4
4) -k1,1 -k2,2 -k4,4
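Cases 3 and 4 can be compared side by side. A sketch assuming a POSIX shell, LC_ALL=C, John's mydata file, and the global -n from John's command (the options listed above do not show it); the temporary directory and variable names are illustrative. In case 3 the numeric key "-k1,2" reads only the leading 1 of "1 2", so field 2 never takes part; in case 4 field 2 is compared as a key of its own:

```shell
#!/bin/sh
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

printf '%s\n' '1 2 1 1' '1 2 2 2' '1 2 1 2' '1 2 2 1' '1 1 2 2' \
              '1 1 2 1' '1 1 3 1' '1 1 3 2' '1 2 3 2' '1 2 3 1' > mydata

# Case 3: the range key ties on every line, so field 4 and then the
# whole line decide (the same order as -k4,4 alone).
case3=$(sort -n -k1,2 -k4,4 mydata)
# Case 4: field 2 is a separate numeric key, so all "1 1 ..." lines come first.
case4=$(sort -n -k1,1 -k2,2 -k4,4 mydata)

printf '%s\n' "$case3" --- "$case4"
```

Case 3 reproduces John's output, while case 4 groups the lines by field 2 before field 4.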

So you have to specify all your keys explicitly, which again doesn't match the man page.

In the end, I want to say that I don't have any problem with the sort behavior, but, as I said, in my opinion there is a difference between the man page and the actual behavior of sort. What I was trying to get from you guys was a correct understanding of the man page; unfortunately, nobody from HP has responded yet. And I agree with you that it's odd to see the same behavior on all platforms.

I thank all of you guys for your time and help.

Regards,

Babak
Babak_4
Advisor

Re: sort command

I have to correct "myfile" to "mydata", which is John's file.


Babak

John Kittel
Trusted Contributor

Re: sort command

Wait! Oh no! Now my head is hurting. I'm sorry to have to say it, but I've changed my mind again. I think the example I showed really does prove it is a bug.

If " -k 1,3 -k 4 " is treated as 2 keys, where the first key is the concatenation of fields 1, 2 and 3 and the second key is field 4, then sort should put all records with field 2 = 1 ahead of all records where field 2 = 2. And it doesn't. To prove that, take my original data file and just remove all the field separators between the first 3 fields, yielding a file with 2 fields. Then sort on the 2 fields. That should be equivalent to treating -k 1,3 as a single key made by concatenating the first 3 fields, but it doesn't give the same result. Compare these 2 sets of data and 2 sorts. The first is exactly what I put in my previous post, just for reference. Note that in the second example below, I've just removed the field separators between the first 3 fields.

# cat mydata
1 2 1 1
1 2 2 2
1 2 1 2
1 2 2 1
1 1 2 2
1 1 2 1
1 1 3 1
1 1 3 2
1 2 3 2
1 2 3 1

# sort -n -k 1,3 -k 4 mydata
1 1 2 1
1 1 3 1
1 2 1 1
1 2 2 1
1 2 3 1
1 1 2 2
1 1 3 2
1 2 1 2
1 2 2 2
1 2 3 2

# cat mydata2
121 1
122 2
121 2
122 1
112 2
112 1
113 1
113 2
123 2
123 1

# sort -n -k 1 -k 2 mydata2
112 1
112 2
113 1
113 2
121 1
121 2
122 1
122 2
123 1
123 2
#
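One way to get the mydata2 order from the original mydata file is to drop the global -n, so that key 1,3 is compared as text and only field 4 is numeric. This is a sketch assuming a POSIX shell and LC_ALL=C (the temporary directory and variable name are illustrative); it works here only because every field is a single digit, and a text key would misorder fields of different widths. For background, POSIX describes an -n key as an initial numeric string (optional blanks, an optional minus sign, digits and an optional radix character), which is why "121" in mydata2 is read whole while "1 2 1" in mydata is read as just 1:

```shell
#!/bin/sh
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

printf '%s\n' '1 2 1 1' '1 2 2 2' '1 2 1 2' '1 2 2 1' '1 1 2 2' \
              '1 1 2 1' '1 1 3 1' '1 1 3 2' '1 2 3 2' '1 2 3 1' > mydata

# No global -n: key 1,3 is the text "1 1 2", "1 2 1", ...; only key 4 is numeric.
text_range=$(sort -k 1,3 -k 4,4n mydata)
printf '%s\n' "$text_range"
```

The resulting order (1 1 2 1, 1 1 2 2, 1 1 3 1, ...) lines up with the mydata2 sort above.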