Operating System - HP-UX

sort on HPUX and LINUX differs

 
Bernd Rieke
Occasional Advisor


Hi,

I have the following file and want to sort it numerically by the first 4 bytes:

$cat file
200abc
1004711

HPUX 10.20: sort -k1.1n,1.4 file

200abc
1004711

LINUX: sort -k1.1n,1.4 file

1004711
200abc

Which machine gives wrong result?

Greetings

Bernd Rieke
19 REPLIES
CHRIS_ANORUO
Honored Contributor

Re: sort on HPUX and LINUX differs

The LINUX machine gave the wrong sort order, since you are sorting the file using the 1st field as the sort key.
The -k option is intended to replace the obsolete [+pos1 [+pos2]] notation, using field_start and field_end respectively.
When We Seek To Discover The Best In Others, We Somehow Bring Out The Best In Ourselves.
Bernd Rieke
Occasional Advisor

Re: sort on HPUX and LINUX differs

Stop, there is an error in how the
contents of the file are displayed! Each line
starts with a blank. It was
lost when I copied it with cut and paste.
So the contents of the file are

200abcd
1004711
^
+--- here is the blank

Sorry
Bernd Rieke
Occasional Advisor

Re: sort on HPUX and LINUX differs

No, the blank was lost because all
lines are shifted to the left in this
forum. As a test I'll try a little
program:

if(i == 1) {
x=9
}
Bernd Rieke
Occasional Advisor

Re: sort on HPUX and LINUX differs

Yes, the program looks strange. Nothing
is indented. HP, you can't do it. Please display everything as it is typed!
Alan Riggs
Honored Contributor

Re: sort on HPUX and LINUX differs

To amplify slightly on Chris's answer:

The difference in sort results that you see is caused by the shorter length of the " 200a..." line. HP's sort command truncates the "a..." as non-numeric (when you have used the -n flag). LINUX is failing to do this. You can see the difference quite clearly with something like:
# sort file
1000
2000
300a
4000
5000

# sort -n file
300a
1000
2000
4000
5000
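
The truncation described above can be modelled in a few lines of Python. This is only a sketch, not the actual implementation of any sort(1): the helper name numeric_prefix is made up for illustration, and it assumes that a numeric sort reads the leading number of each line and ignores everything from the first non-numeric character on.

```python
import re

def numeric_prefix(text):
    """Return the value of the leading number in text, or 0.0 if there is none.

    Hypothetical helper: models the assumption that a numeric sort stops
    reading at the first character that cannot belong to a number.
    """
    match = re.match(r"\s*(-?\d*\.?\d*)", text)
    try:
        return float(match.group(1))
    except ValueError:
        # Nothing numeric at the start of the line (e.g. "abc" or "").
        return 0.0

lines = ["1000", "2000", "300a", "4000", "5000"]

# Plain sort: compares the lines character by character.
by_text = sorted(lines)

# Numeric sort: "300a" counts as 300 and therefore moves to the front.
by_number = sorted(lines, key=numeric_prefix)

print(by_text)
print(by_number)
```

Under that assumption the second list starts with 300a, which is the same order as the sort -n output shown above.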
Bernd Rieke
Occasional Advisor

Re: sort on HPUX and LINUX differs

But why is sort looking at the "a"?
I restrict the sort key to column 1 to 4
(remember the leading blank on each line)
And within the bytes 1 to 4 there is no
"a". Only the numbers 100 and 200....
Paul Hite
Trusted Contributor

Re: sort on HPUX and LINUX differs

I keyed in your original example of a two-line file and allowed for the leading space. With both 10.20 and 11.00 I cannot reproduce the erroneous output that you show. Instead I get the same output you show from Linux.

Make sure you are using /usr/bin/sort. Also make sure there are no hidden characters in your input file. When I do "what /usr/bin/sort" on 10.20, I get:
$Revision: 78.5

Do you have a different version?
Alan Riggs
Honored Contributor

Re: sort on HPUX and LINUX differs

You specified the -n flag. A numeric sort by design strips off leading blanks. Paul, I can only assume that you ran your test without specifying a numeric sort.
Paul Hite
Trusted Contributor

Re: sort on HPUX and LINUX differs

Using a capital B to represent blanks, my input file is:
B200abc
B1004711
When I run the command: sort -k1.1n,1.4 file
I get the following output:
B1004711
B200abc

Again those B's are really blanks, I am compensating for the software on this site.

The output I am getting is exactly what I would expect and I get the same output on 10.20 and 11.00. Alan, what result do you get? What output would you expect?
Bernd Rieke
Occasional Advisor

Re: sort on HPUX and LINUX differs

Thanks to all of you who replied.
Because some of you stated that sort
on their HP 10.20 gives different results
than the one I get, I compared the different versions of sort introduced with the
patches. And I found the answer:
the wrong result was introduced with
patch PHCO_16303. So everybody who is
patching beyond this patch level may
get wrong results on numeric sorts, or
at least different ones than before
this patch level.

HP what are you saying????

(Remember that B stands for a blank)

bernd/107$ what /tmp/sort
/tmp/sort:
$Revision: 78.5.1.5 $
PATCH_10_20: sort.o hpux_rel.o 97/12/10
PATCH-10.20:PHCO_13399,10.30:PHCO_13400,11.00:PHCO_13401 libc.a_ID@@/main/r10dav/libc_dav/libc_dav_cpe/7
/ux/core/libs/libc/archive_pa1/libc.a_ID
Dec 2 1997 11:22:33
bernd/107$ /tmp/sort -k1.1n,1.4 file
B1004771
B200a <<<<<< it's ok


bernd/107$ what /var/tmp/sort
/var/tmp/sort:
$Revision: 78.5.1.8 $
PATCH_10_20: sort.o hpux_rel.o 99/02/26
PATCH-PHCO_16303 for 10.20; for 10.30, 11.x compatibility libc.a_ID@@/main/r10dav/libc_dav/libc_dav_cpe/8
/ux/core/libs/libc/archive_pa1/libc.a_ID
Sep 11 1998 16:54:45
bernd/107$ /var/tmp/sort -k1.1n,1.4 file
B200a
B1004771 <<<< it's wrong


bernd/107$ what /usr/bin/sort
/usr/bin/sort:
$Revision: 78.5.1.11 $
PATCH_10_20: sort.o hpux_rel.o 99/08/30
PATCH-PHCO_18644 for 10.20; for 10.30, 11.x compatibility libc.a_ID@@/main/r10dav/libc_dav/libc_dav_cpe/9
/ux/core/libs/libc/archive_pa1/libc.a_ID
Jul 8 1999 15:44:31
bernd/107$ /usr/bin/sort -k1.1n,1.4 file
B200a
B1004771 <<<<< it's wrong, too
bernd/107$
Alan Riggs
Honored Contributor

Re: sort on HPUX and LINUX differs

Paul, I get the result I posted earlier. I viewed this as a result of HP-UX truncating non-numeric information from the field for a numeric sort. With that understanding, it is the expected result. If you assume that HP-UX will attempt to interpret "a" as a decimal number, then I would expect your result. Personally, I would prefer bad input be thrown out, but the moral here seems to be "always test with boundary conditions". Good advice for any command/data set.
Paul Hite
Trusted Contributor

Re: sort on HPUX and LINUX differs

Alan, the "a" should be ignored because it is not part of the sort key. One key is " 100" and the second key is " 200". In both cases we have a 4-character sort key. The other characters should be ignored because they are just data that is going along for the ride. If there were a -b arg in effect, then I could see the "a" becoming relevant.
Alan Riggs
Honored Contributor

Re: sort on HPUX and LINUX differs

Paul -- except that a numeric sort is quite explicitly designed to ignore leading blanks. So the two keys, if we throw out the "a", are 200 and 1000.
Bernd Rieke
Occasional Advisor

Re: sort on HPUX and LINUX differs

But I'm giving a clear order to sort(1):
Extract bytes 1 to 4 (inclusive) as
the sort key. No other byte of the line
is of any interest in building the
sort key! So the sort keys are B100 and B200,
nothing else! Tell me any reason why sort
should look at other bytes, in this case
byte 5 of the line. Once the sort key has
been extracted, the discussion about whether
to strip the leading blank is not important,
because B100 is the same as 100B, seen
numerically (remember that B stands for
a blank in the file).

And another question: we have been using HPUX
for 15 years and sort always worked the way
it did before patch PHCO_16303
(and the way LINUX and SOLARIS do).
Why can HP change the behavior of sort
without any announcement? Sort is an
important program and a great many outputs
rely on it!
Alan Riggs
Honored Contributor

Re: sort on HPUX and LINUX differs

From man sort:

The -n option implies the -b option (see below).
.
.
.
-b Ignore leading blanks when determining the starting and ending positions of a restricted sort key. If the -b option is specified before the first -k option (+pos1 argument), it is applied to all -k options

We can argue about whether this is how it SHOULD be, but it clearly indicates that you have specified your keys to be 1000 and 200a.
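
The two readings of the key can be compared with a small Python model. This is a sketch under assumptions, not HP's or anyone else's actual code: the function names are invented for illustration, each line is treated as a single field, and -b is assumed to shift both the start and the end position to count from the first non-blank character.

```python
def extract_key(line, start, end, skip_blanks):
    """Model a -k1.START,1.END key on a line that is one single field.

    start and end are 1-based character positions, end inclusive.
    With skip_blanks=True the positions are counted from the first
    non-blank character (the assumed effect of -b).
    """
    offset = len(line) - len(line.lstrip(" \t")) if skip_blanks else 0
    return line[offset + start - 1:offset + end]

def leading_number(key):
    """Numeric value of the digits at the start of key (0 if none)."""
    digits = ""
    for ch in key.lstrip(" \t"):
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0

lines = [" 200abc", " 1004711"]

def sort_lines(skip_blanks):
    """Sort the sample lines on key 1.1-1.4, numerically."""
    return sorted(
        lines,
        key=lambda line: leading_number(extract_key(line, 1, 4, skip_blanks)),
    )

for mode in (False, True):
    keys = [extract_key(line, 1, 4, mode) for line in lines]
    print(mode, keys, sort_lines(mode))
```

Without blank skipping the keys are " 200" and " 100", so the 1004711 line sorts first. With blank skipping they become "200a" and "1004", so the 200abc line sorts first.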
Paul Hite
Trusted Contributor

Re: sort on HPUX and LINUX differs

Sorry, Alan, you're right. I had missed the "-n implies -b" on HP's man page.

Bernd, the phrase Alan mentions does appear on HP's manpage although it is absent from the sort manpage of Solaris. I don't have access to a Linux system, but I'll bet that the phrase is absent there as well.

So after the patch to HP-UX, all unix versions are operating according to their manpages. And this is just another command that's a little different on HP-UX versus other versions of unix.

Again, I'm sorry to have contributed to the confusion, and thanks to Alan for clearing up the confusion.
Bernd Rieke
Occasional Advisor

Re: sort on HPUX and LINUX differs

Paul, Alan, thanks for the discussion up to
this point. We are ending up with the result
that sort satisfies the manpage. But in real
life programs should satisfy our needs. That
leads to a manpage which describes the
behavior of the program, and not the other way
round.

I asked around but nobody could give me an
example where the '-n implies -b' option
makes sense. Can one of you? Normally the -k
option is used to sort files without delimiters between the sort keys. How is it
possible to sort such a file containing
numeric columns? For example (I had a similar case yesterday), sort the following file by the 3rd column, then
the 1st and then the 2nd (the 3rd ends at byte 16):

BB10bernd 12334abc3242235
B100alan 123523123123KLLL
BBB1paul 2234$1433333

sort -k1.10n,1.16 -k1.1n,1.4 -k1.5n,1.9 doesn't work. No way to do it with the HPUX sort!!!
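
One way around the problem, whichever way sort(1) treats blanks, is to cut the keys out by byte position in a small script. The following Python sketch does that. It is only an illustration under assumptions: the records are made-up sample data (the spacing of the lines above was lost in this forum), the layout is assumed to be bytes 1-4 number, bytes 5-9 name, bytes 10-16 number, and the 2nd column is compared as text rather than numerically.

```python
# Hypothetical sample data, built so that every record is exactly 16 bytes:
# bytes 1-4 right-justified number, 5-9 name, 10-16 right-justified number.
rows = [(10, "bernd", 12334), (100, "alan", 235), (1, "paul", 12334)]
records = [f"{first:>4}{name:<5}{third:>7}" for first, name, third in rows]

def field(record, first, last):
    """Return bytes first..last (1-based, inclusive) of a record."""
    return record[first - 1:last]

def sort_key(record):
    """Sort by column 3, then column 1 (both numeric), then column 2 (text)."""
    return (
        int(field(record, 10, 16)),  # int() ignores the padding blanks
        int(field(record, 1, 4)),
        field(record, 5, 9),
    )

ordered = sorted(records, key=sort_key)
for record in ordered:
    print(record)
```

Because each key is sliced at fixed positions before any blanks are stripped, leading blanks in one column can never shift the key into the next column.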

Another fact, as we are speaking about formal
things: the webpage www.unixsolutions.hp.com/products/hpux/hpux11_futures.html says that HPUX complies with the Open Group "Single UNIX Specification" (I will use SUS for short).

I looked at the manpage for sort within the
"SUS". In contrast to the HP manpage, the
'-n implies -b' is missing! The HP manpage
doesn't comply with the standard and therefore sort doesn't either. And that's the mess. Don't mix up options. If somebody
wants -b then he should specify -b.

Thanks again to you.
Bernd
Alan Riggs
Honored Contributor

Re: sort on HPUX and LINUX differs

I do not work for HP, so I will not attempt to explain WHY the sort command uses that particular logic. If looking for examples in which it makes sense, I naturally think of right-justified columnar output, the kind you would find in accounts ledgers or in the output stream of many Unix commands.

In principle, however, I agree with you on this point: there should always be a means for the user to override default options. I do not mind "-n implies -b". But I do feel there should be a means of overriding this default. To steal a sign-off I ran across the other day:

0 1 -- my two bits
Paul Hite
Trusted Contributor

Re: sort on HPUX and LINUX differs

Personally, I feel that the "-n implies -b" was a very poor decision. And an option to undo this default would be better than nothing, but still is not a good idea. I would rather see HP conform to the standards. Ideally, it should be possible to write scripts that work on any version of unix.

But things are the way they are. HP always seems to march to its own beat. This isn't the first time I have been burned by HP's strange twist on a standard unix command. Nor will it be the last.

The moral is to read HP's man pages carefully no matter how familiar you are with the command on other versions of unix.