maberg
Occasional Advisor

Sort problem

I have a file with fixed-length records that contain embedded spaces.
I am trying to sort on the employee number in columns 10-19, but I get undesirable results.

File is (file.txt):
1998945 Warble
8 5032 Rebholz
1124688 Barrera
2327703 Rebholz

Result is:
1998945 Warble
8 5032 Rebholz
1124688 Barrera
2327703 Rebholz

Command script is:

export LC_COLLATE=en_US.ISO8859-1
export LC_CTYPE=en_US.ISO8859-1
export LC_MESSAGES=en_US.ISO8859-1
export LC_MONETARY=en_US.ISO8859-1
export LC_NUMERIC=en_US.ISO8859-1
export LC_TIME=en_US.ISO8859-1

sort -t# -k1.10n,1.19n file.txt -o test.sorted

James R. Ferguson
Acclaimed Contributor

Re: Sort problem

Hi:

Posting a textual attachment of the input file would be more helpful.

Regards!

...JRF...
maberg
Occasional Advisor

Re: Sort problem

I have uploaded the input and sorted files.
James R. Ferguson
Acclaimed Contributor
Solution

Re: Sort problem

Hi:

You have unequal numbers of fields, as shown by the record with an "8" as its first field.

You could do:

# awk '{if (NF>2) {print $2,$0} else {print $1,$0}}' file | sort -k1,1n | cut -d" " -f2-

...which produces:

8 5032 Rebholz
1124688 Barrera
1998945 Warble
2327703 Rebholz
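
If the number always sits in the same character columns, a variant of the same decorate, sort, undecorate idea can take its key by position instead of by field count. This is only a sketch: the sample records, the layout (number in columns 10-19, space padded) and the temporary file are assumptions for illustration, not taken from the attachment.

```shell
#!/bin/sh
# Hypothetical layout: columns 1-9 optional data, columns 10-19 employee
# number (space padded), name from column 20 onward.
sample=$(mktemp)
printf '%-9s%-10s%s\n' \
    ''  1998945 Warble \
    '8' 5032    Rebholz \
    ''  1124688 Barrera \
    ''  2327703 Rebholz > "$sample"

# Decorate: prefix each line with the number found in columns 10-19
#           (adding 0 makes awk convert the padded text to a number).
# Sort:     numeric sort on the tab-separated prefix only.
# Undecorate: drop the prefix, leaving the original line untouched.
result=$(awk '{ printf "%d\t%s\n", substr($0, 10, 10) + 0, $0 }' "$sample" |
    LC_ALL=C sort -t"$(printf '\t')" -k1,1n |
    cut -f2-)
printf '%s\n' "$result"
rm -f "$sample"
```

Because the key is cut by position, it does not matter how many whitespace-separated fields a record happens to have.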

Regards!

...JRF...
maberg
Occasional Advisor

Re: Sort problem

Thanks for the input, but I am trying to use the sort utility only. I need to apply this to a larger file with the same key information. I used this smaller file for simplicity. This smaller file had the same issues as the larger file.
V. Nyga
Honored Contributor

Re: Sort problem

Hi,

Maybe you made it more complicated than it needs to be?
Where does the '8' come from?
Is it the last digit of the previous column?
Can you give a better sample?

Volkmar
*** Say 'Thanks' with Kudos ***
Mel Burslan
Honored Contributor

Re: Sort problem

I don't believe you can sort the sample file the way you expect using the sort command alone. The sort command depends on a field separator, which is a space or a tab character unless otherwise specified.

In your example, you expect sort to treat a certain range of characters (say 10 to 19 for the employee number) as a field. As far as my understanding of the sort man pages goes, that is not going to happen.

You will need some interim parsing of the input file. Otherwise, the run of spaces in the first columns of a line is interpreted as a field separator, and your first key field for sort is the employee number, whereas when the line starts with the digit "8", the employee number is key field 2. So, as JRF said, you have non-uniform fields for sort to interpret. To be perfectly honest, JRF's solution, using awk in conjunction with sort, is the most elegant one you can come up with in your case. If the pipes in that command chain do not work because of the large input file size, the only thing you can do is split the chain into separate commands and create interim files as you go. That should give you some flexibility.
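
The difference in field counts is easy to see on a small sample. A sketch, assuming space-padded lines like those described in the question (the two records and their padding are made up for illustration):

```shell
#!/bin/sh
# Two hypothetical records: one with columns 1-9 blank, one with "8" in column 1.
fields=$(printf '%-9s%-10s%s\n' \
    ''  1998945 Warble \
    '8' 5032    Rebholz |
    awk '{ print NF " fields, first = " $1 }')
printf '%s\n' "$fields"
```

With whitespace as the separator, the employee number is the first field on one line and the second field on the other, which is the non-uniformity described above.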
________________________________
UNIX because I majored in cryptology...
maberg
Occasional Advisor

Re: Sort problem

The "8" is another data component in column 1. It is just a data item. Other data characters can appear in columns 1-9, but not always.
Mel Burslan
Honored Contributor

Re: Sort problem

In order to understand what I mean by the non-uniform fields, do this simple exercise:

$ cat myinputfile | while read line ; do
f1=`echo $line |cut -c 1-9`
f2=`echo $line |cut -c 10-19`
f3=`echo $line |cut -c 20-`
echo $f1","$f2","$f3
done

and see where the character-counted fields are being cut off.

Hence the need for some external utility like awk to force them into uniformity.
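
One caveat when trying this: a plain `read line` and an unquoted `$line` let the shell trim and squeeze runs of spaces, so the cuts are taken from a squeezed copy of each line rather than from the original columns. A variant that keeps the spacing intact might look like the sketch below; the sample records and the temporary file are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical records: columns 1-9 optional data, 10-19 number, 20+ name.
sample=$(mktemp)
printf '%-9s%-10s%s\n' \
    ''  1998945 Warble \
    '8' 5032    Rebholz > "$sample"

# IFS= and -r stop read from trimming or altering the line; quoting
# "$line" stops the shell from squeezing runs of spaces.
cuts=$(while IFS= read -r line; do
    f1=$(printf '%s\n' "$line" | cut -c 1-9)
    f2=$(printf '%s\n' "$line" | cut -c 10-19)
    f3=$(printf '%s\n' "$line" | cut -c 20-)
    # Brackets make the padding inside each slice visible.
    printf '[%s],[%s],[%s]\n' "$f1" "$f2" "$f3"
done < "$sample")
printf '%s\n' "$cuts"
rm -f "$sample"
```

Comparing this output with the output of the original loop shows how much of the column layout is lost once the whitespace is squeezed.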
________________________________
UNIX because I majored in cryptology...
Tingli
Esteemed Contributor

Re: Sort problem

You can try:

sort -t# -k1.10nb,1.19nbb file.txt -o test.sorted
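
How a key of this shape is selected can be checked on a small sample before running it on the real file. The sketch below assumes space-padded records with the number in columns 10-19; the data and the temporary file are made up for illustration. It uses the plain form without `b`, because POSIX specifies that with `b` in effect the character offsets are counted from the first non-blank character of the field, so on lines that begin with blanks `1.10` would no longer mean column 10 of the line. It is worth trying both forms on your data.

```shell
#!/bin/sh
# Hypothetical layout: columns 1-9 optional data, columns 10-19 employee
# number (space padded), name from column 20 onward.
sample=$(mktemp)
printf '%-9s%-10s%s\n' \
    ''  1998945 Warble \
    '8' 5032    Rebholz \
    ''  1124688 Barrera \
    ''  2327703 Rebholz > "$sample"

# '#' never occurs in the data, so the whole line is field 1 and
# 1.10,1.19 selects characters 10 through 19 of the line; n compares
# that slice numerically.
sorted=$(LC_ALL=C sort -t'#' -k1.10,1.19n "$sample")
printf '%s\n' "$sorted"
rm -f "$sample"
```

If the real file gives a different order with the same command, the likely suspects are tabs or a different column layout than assumed here.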