
numeric sort

Miguel Covas
Advisor

numeric sort

I can't understand the behavior of the sort command's -n option. Consider a file with the following lines:
aaaaa+0000001+
bbbbb+0000002+
ccccc+0000003+
ddddd+0000004+
eeeee-0000003-
fffff-0000002-
ggggg-0000001-
ggggg+0000001+
ggggg+0000001+
ggggg-0000004-

the command
sort -n -k 1.6,1.13
with the previous file produces:
ggggg-0000004-
eeeee-0000003-
fffff-0000002-
ggggg-0000001-
aaaaa+0000001+
bbbbb+0000002+
ccccc+0000003+
ddddd+0000004+
ggggg+0000001+
ggggg+0000001+

How come?
What about the last two lines of the output?
What am I missing?

GNU sort has the same behavior, but it has a -g option which behaves properly. Is there anything odd about the -n option?

(HP-UX noah B.11.00 U 9000/800 638329302 unlimited-user license)
James R. Ferguson
Acclaimed Contributor

Re: numeric sort

Hi Miguel:

I think you want:

# sort -n -k 1.6 -k 1.13 myfile

Regards!

...JRF...
harry d brown jr
Honored Contributor

Re: numeric sort

[root]pbctst: cat ttt | sort -n -k 1.6 -k 1.7,1.13
ggggg-0000004-
eeeee-0000003-
fffff-0000002-
ggggg-0000001-
aaaaa+0000001+
ggggg+0000001+
ggggg+0000001+
bbbbb+0000002+
ccccc+0000003+
ddddd+0000004+

Sorts on the sign first, then the number. The -n option is probably freaking out on the fact that "+" and "-" aren't numbers.

live free or die
harry
John Carr_2
Honored Contributor
Solution

Re: numeric sort

Hi

Depending on what you need to do with the results, you could do something like this: strip out the - and + and then sort works fine. If you need the plus and minus in the output, this is not a lot of good.

cat filename | tr "-" " " | tr "+" " " | sort -n +1

John.
John Carr_2
Honored Contributor

Re: numeric sort

Hi

this will do it

cat filename | tr "-" " - " | tr "+" " + " | sort -n +1 | tr " - " "-" | tr " + " "+"

long winded but works.

John.
Miguel Covas
Advisor

Re: numeric sort

After some further manual reading, plus some clues from previous replies, I understand that sort -n does not consider a plus sign as the start of a number (as, by the way, one can deduce from the manual). Only a minus or a blank can optionally start a numeric string.
What leads to confusion is that the manual states that +000.. will be treated as -000.., which implies that a plus can be used as a leading sign.

The proper solution seems to me replacing + with blank or zero.

Thank you anyway.
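
The behaviour described above can be reproduced with the ten records from the question. This is only a minimal sketch, assuming a sort whose -n follows the rule quoted from the manual (GNU sort does). LC_ALL=C is set only to make the order of tied lines predictable, and the function names are made up for illustration.

```shell
# Records from the original question.
records() {
    printf '%s\n' \
        'aaaaa+0000001+' 'bbbbb+0000002+' 'ccccc+0000003+' 'ddddd+0000004+' \
        'eeeee-0000003-' 'fffff-0000002-' 'ggggg-0000001-' 'ggggg+0000001+' \
        'ggggg+0000001+' 'ggggg-0000004-'
}

# The command as posted. A key such as "+0000001" has no leading digits,
# so -n reads it as zero. The six "+" lines therefore tie with each other
# and are ordered by a plain comparison of the whole line instead.
sort_as_posted() {
    records | LC_ALL=C sort -n -k 1.6,1.13
}

# Same key, but a plus in column 6 is turned into a zero first, so -n sees
# an unsigned number. In the sed replacement, \1 is the first five
# characters and the 0 after it is a literal zero.
# The zero stays in the output in place of the plus.
sort_plus_as_zero() {
    records | sed 's/^\(.....\)+/\10/' | LC_ALL=C sort -n -k 1.6,1.13
}

sort_as_posted
sort_plus_as_zero
```

The first function should print the same order as in the question, which accounts for the two ggggg+ lines at the end: they are not out of numeric order, they are simply last in a six-way tie.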
James R. Ferguson
Acclaimed Contributor

Re: numeric sort

Hi Miguel:

Ah, thank you. In that case (using your input data):

# sort -k 1.6r,1.6r -k 1.7n,1.13n -k 1.1,1.5 /tmp/sort

...would yield:

ggggg-0000001-
fffff-0000002-
eeeee-0000003-
ggggg-0000004-
aaaaa+0000001+
ggggg+0000001+
ggggg+0000001+
bbbbb+0000002+
ccccc+0000003+
ddddd+0000004+

BTW, I see that you joined the Forum today. Welcome!!!

Regards!

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: numeric sort

Hi (again) Miguel:

Oh, brain death. I keep looking at the sign on the right (after) the number *not* before, so disregard the last offering.

The problem is nasty because the "-" sign appears higher (\055) in the collating sequence than the "+" sign (\053). Worse yet, we want a descending numeric sort for negative numbers and an ascending one for positive ones.

Regards!

...JRF...
Miguel Covas
Advisor

Re: numeric sort

Hi James,

Yes, I will get that result, which is not what is intended.
In numeric sorting, negative numbers sort "descending" according to absolute value.
Positive numbers sort "ascending" according to absolute value. Managing absolute values as strings is OK as long as you don't have decimal points.

Mixing positive and negative values always needs numeric sorting with no tricks.
John Carr_2
Honored Contributor

Re: numeric sort

Hi Miguel

Are the numbers positive and negative depending on the plus or minus sign? I'm sure we can still manipulate the file to give the right sequence.

John.
Miguel Covas
Advisor

Re: numeric sort

The main problem I'm addressing is that most of these files come from environments where fixed-length records are quite trendy. To adapt those files to unix-sort standards I need to substitute a blank for the plus sign at a given column, not for every plus sign.

This will do it while we look for a more elegant solution based on REs:

awk '{
if (substr($0,6,1) == "+")
print substr($0,1,5) " " substr($0,7) ;
else
print $0 ;
}' | sort -k1.6n,1.13
John Carr_2
Honored Contributor

Re: numeric sort

Hi

OK, here we go again, but you have to lose the plus signs and the trailing minus.

cat filename | tr "+" " " | sed -e 's/-/ -/g' | awk '{ print $1 " " $2 }' | sort -n +1

Cheers

John.
John Carr_2
Honored Contributor

Re: numeric sort

Hi Miguel

I didn't see your last response before I submitted my last thoughts.

good luck
John.
Miguel Covas
Advisor

Re: numeric sort

Good try, but what about having more plus signs in different columns (for instance: 4.5e+08)?

Also, your awk will behave erratically if there are more blanks in the record.

Please don't take my example as the real files. It was intended only as an example.

Take
US3456357+0003.89-456A234
DE3562990-00233.3+4555666
DE34+++78+003.302N7888888
this col.^ to ^

The first three plus signs of the last record should not be lost,
nor the last one in the second record.
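
One way to leave every record untouched, sketched here on the assumption that the key occupies a fixed column range, is to copy the key to the front of each line, drop the leading plus from the copy only, sort on the copy and then cut it off again. The function name and its start/length arguments are made up for this illustration, and a tab is assumed never to occur in the data. Exponent forms such as 4.5e+08 are not handled, since sort -n does not read them.

```shell
# sort_fixed_key START LENGTH < records
# Sorts fixed-length records numerically on the LENGTH characters that
# begin at column START. A leading "+" is dropped only from the copied
# key, so the records themselves come out unchanged.
sort_fixed_key() {
    tab=$(printf '\t')
    awk -v s="$1" -v n="$2" '{
        key = substr($0, s, n)      # copy of the key columns
        sub(/^\+/, "", key)         # sort -n does not accept a leading plus
        print key "\t" $0           # decorate: key, tab, original record
    }' |
    LC_ALL=C sort -t "$tab" -k 1,1n |   # numeric sort on the copied key only
    cut -f 2-                           # undecorate: drop the key again
}

printf '%s\n' \
    'US3456357+0003.89-456A234' \
    'DE3562990-00233.3+4555666' \
    'DE34+++78+003.302N7888888' |
sort_fixed_key 10 8
```

With the three sample records and the key taken from columns 10 to 17, this should list the -00233.3 record first, then +003.302, then +0003.89, with all other plus signs left where they were.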

BTW, what about a nice ERE to substitute ONLY the sixth char?
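
A possible answer to that last question, as a sketch only: anchor the pattern at the start of the line, skip exactly the characters before the wanted column, and replace a plus only if it is the very next character. It is written as a basic RE so that it should run with any POSIX sed; where sed accepts -E, the ERE spelling for the sixth character would be s/^(.{5})\+/\1 /. The function name is made up for illustration.

```shell
# blank_plus_at COL < records
# Replaces a "+" with a blank only when it sits exactly in column COL.
# Every other plus sign in the record is left alone.
blank_plus_at() {
    skip=$(( $1 - 1 ))                  # characters in front of column COL
    sed "s/^\(.\{$skip\}\)+/\1 /"       # in a basic RE a bare + is literal
}

printf '%s\n' \
    'US3456357+0003.89-456A234' \
    'DE3562990-00233.3+4555666' \
    'DE34+++78+003.302N7888888' |
blank_plus_at 10
```

For the first example file in this thread the call would be blank_plus_at 6.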