Sort problem

SOLVED
maberg
Occasional Advisor

Sort problem

I have a file with fixed-length records that contain embedded spaces.
I am trying to sort on the employee number in columns 10-19, but I get undesirable results.

File is (file.txt):
1998945 Warble
8 5032 Rebholz
1124688 Barrera
2327703 Rebholz

Result is:
1998945 Warble
8 5032 Rebholz
1124688 Barrera
2327703 Rebholz

Command script is:

export LC_COLLATE=en_US.ISO8859-1
export LC_CTYPE=en_US.ISO8859-1
export LC_MESSAGES=en_US.ISO8859-1
export LC_MONETARY=en_US.ISO8859-1
export LC_NUMERIC=en_US.ISO8859-1
export LC_TIME=en_US.ISO8859-1

sort -t# -k1.10n,1.19n file.txt -o test.sorted

25 REPLIES
James R. Ferguson
Acclaimed Contributor

Re: Sort problem

Hi:

Posting a textual attachment of the input file would be more helpful.

Regards!

...JRF...
maberg
Occasional Advisor

Re: Sort problem

I have uploaded the input and sorted files.
James R. Ferguson
Acclaimed Contributor
Solution

Re: Sort problem

Hi:

You have unequal numbers of fields as shown with the record with an "8" as the first field.

You could do:

# awk '{if (NF>2) {print $2,$0} else {print $1,$0}}' file | sort -k1,1n | cut -d" " -f2-

...which produces:

8 5032 Rebholz
1124688 Barrera
1998945 Warble
2327703 Rebholz
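
The command above keys on whitespace-delimited fields. If the employee number always sits in columns 10-19, the same decorate, sort, strip idea can key on character positions instead. This is only a minimal sketch: the function name sort_by_cols_10_19 is made up for illustration, and it assumes the records contain no tab characters before the key is added.

```shell
# Sketch only: sort fixed-width records on the number held in columns 10-19.
# Assumption: columns 1-9 hold optional data, columns 10-19 the employee
# number (blank padded), and the rest of the record follows from column 20.
sort_by_cols_10_19() {
    tab=$(printf '\t')
    # Decorate: copy columns 10-19, minus padding blanks, to the front of
    # each record, separated from the original record by a tab.
    awk '{ key = substr($0, 10, 10); gsub(/ /, "", key); print key "\t" $0 }' "$@" |
        # Sort numerically on the manufactured key only.
        sort -t "$tab" -k1,1n |
        # Undecorate: drop the key and keep the original record.
        cut -f2-
}
```

Usage would be something like `sort_by_cols_10_19 file.txt > test.sorted`. The original spacing of each record is left untouched because only the temporary key is added and removed.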

Regards!

...JRF...
maberg
Occasional Advisor

Re: Sort problem

Thanks for the input, but I am trying to use the sort utility only. I need to apply this to a larger file with the same key information. I used this smaller file for simplicity. This smaller file had the same issues as the larger file.
V. Nyga
Honored Contributor

Re: Sort problem

Hi,

Maybe you made it more complicated than necessary?
Where does the '8' come from?
Is it the last number of the previous column?
Can you give a better sample?

Volkmar
*** Say 'Thanks' with Kudos ***
Mel Burslan
Honored Contributor

Re: Sort problem

I don't believe you can sort the file sample you gave in the way you expect using just the sort command. The sort command depends on a field separator, which is a space or a tab character unless otherwise specified.

In your example, you are expecting sort to recognize a certain range of characters (say 10-to-19 for the employee number) as a field. It is not going to happen, as far as my understanding of the sort man pages goes.

You will need some interim parsing of the input file. Otherwise, the spaces at the start of a line are going to be interpreted as a field separator, and your first key field for sort is going to be the employee number, whereas when the line starts with that digit "8" your employee number will be key field 2. So, as JRF said, you have non-uniform fields for sort to interpret. And to be perfectly honest, JRF's solution, using awk in conjunction with sort, is the most elegant solution you can come up with in your case. If the pipes in that command chain are not working due to the large input file size, the only thing you can do is to split that command into separate commands and create interim files as you go. That should give you some flexibility.
________________________________
UNIX because I majored in cryptology...
maberg
Occasional Advisor

Re: Sort problem

The "8" is another data component in column 1. It is just a data item. Other data characters can appear in columns 1-9, but not always.
Mel Burslan
Honored Contributor

Re: Sort problem

In order to understand what I mean by the non-uniform fields, do this simple exercise:

$ cat myinputfile | while read line ; do
f1=`echo $line |cut -c 1-9`
f2=`echo $line |cut -c 10-19`
f3=`echo $line |cut -c 20-`
echo $f1","$f2","$f3
done

and see where the character-counted fields are being cut off.

Hence the need for some external utility like awk to force them to uniformity
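
One caveat with the loop above: a plain `read line` strips leading blanks, and an unquoted `$line` collapses runs of spaces, so the character offsets shift before cut ever sees them. A sketch of a variant that keeps the columns intact follows. The function name split_columns is hypothetical, and the 1-9 / 10-19 / 20- layout is the one assumed in this thread.

```shell
# Sketch only: split each record at fixed character positions without
# letting the shell alter the whitespace.
split_columns() {
    # IFS= and -r make read keep leading blanks and backslashes as they are.
    while IFS= read -r line; do
        # Quoting "$line" preserves every embedded space for cut.
        f1=$(printf '%s\n' "$line" | cut -c 1-9)
        f2=$(printf '%s\n' "$line" | cut -c 10-19)
        f3=$(printf '%s\n' "$line" | cut -c 20-)
        printf '%s,%s,%s\n' "$f1" "$f2" "$f3"
    done
}
```

It reads standard input, for example `split_columns < myinputfile`, and prints the three column ranges separated by commas with their padding preserved.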
________________________________
UNIX because I majored in cryptology...
Tingli
Esteemed Contributor

Re: Sort problem

You can try:

sort -t# -k1.10nb,1.19nbb file.txt -o test.sorted
Tingli
Esteemed Contributor

Re: Sort problem

Or:
sort -t'#' -k1.10n,1.19n file.txt -o test.sorted
maberg
Occasional Advisor

Re: Sort problem

I did try "sort -t# -k1.10nb,1.19nbb" and this was the result.

I am not trying to hide anything, but each record in the actual file is 1250 bytes long. It would be too large to work with effectively here, and the small file still behaves the same way.

Result:
1124688 Barrera
1998945 Warble
2327703 Rebholz
8 5032 Rebholz
maberg
Occasional Advisor

Re: Sort problem

The sort utility is supposed to be able to sort fixed file formats. This works fine on a Solaris server. We are migrating away from the Solaris platform.

I tried "sort -t'#' -k1.10n,1.19n" and the result was:

Usage: sort [-AbcdfiMmnru] [-T Directory] [-tCharacter] [-y kilobytes] [-o File]
[-k Keydefinition].. [[+Position1][-Position2]].. [-z recsz] [File]..
1124688 Barrera
1998945 Warble
2327703 Rebholz
8 5032 Rebholz

James R. Ferguson
Acclaimed Contributor

Re: Sort problem

Hi (again):

> The "8" is another data component in column 1. It is just a data item. Other data characters can appear in columns 1-9, but not always.

Exactly. It (or any group of characters surrounded by whitespace, as the default field delimiter) constitutes a "field" in your file. This influences what 'sort()' sees as the n-th field you use as a sort key.
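
A quick way to see this on your own data is to count the whitespace-delimited fields per record. This is only an illustrative sketch, and the function name field_report is made up:

```shell
# Sketch only: report how many whitespace-delimited fields each record has
# and what ends up as field 1 under the default field delimiter.
field_report() {
    awk '{ printf "NF=%d field1=%s\n", NF, $1 }' "$@"
}
```

Run as `field_report file.txt`. A record that starts with the "8" reports three fields with "8" as field 1, while the others report two fields with the employee number as field 1.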

Re-read Mel's explanation of my original post and his second offering to you, too.

> Thanks for the input, but I am trying to use the sort utility only.

You _are_ using 'sort' to solve your problem. You simply need to invariantly define the _field_ you want to sort on. All I did was to manufacture a temporary sort-key, perform the requisite sort, and snip the manufactured key from the output.

Regards!

...JRF...
maberg
Occasional Advisor

Re: Sort problem

I appreciate everyone's input. It is very helpful to get ALL ideas. I am trying them all and I am trying to give you as much information as I know of about the file details.

I really appreciate the efforts here as I am at my wits end over this....
maberg
Occasional Advisor

Re: Sort problem

I am digesting Mel's and James's information and trying to adapt it. My understanding was that the -t option would cause sort to use the whole record as a fixed field when the -t specified a character not contained within the input record. The current job has been this way for a long time. I will try your suggestions again and do some tweaking.

Mel's output returned:
1998945 W,arble,
8 5032 Re,bholz,
1124688 B,arrera,
2327703 R,ebholz,
Tingli
Esteemed Contributor

Re: Sort problem

Give this a shot:

sort -t'#' -k1.10nb,1.19nb
James R. Ferguson
Acclaimed Contributor

Re: Sort problem

Hi (again):

> My understanding was that the -t option would cause sort to use the whole record as a fixed field when the -t specified a character not contained within the input record.

OK, I just figured out that that was why you kept specifying '-t#'. I guess I'm a bit dense today.

This doesn't work on HP-UX, as logical as it would seem. It DOES WORK on AIX (and obviously Solaris), underscoring again that various UNIX dialects differ in the edge cases.
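
For a sort where the '-t#' trick does behave that way, the whole job can be wrapped as in the sketch below. The function name sort_fixed_cols is hypothetical. That GNU sort behaves like this is my own assumption, since only AIX and Solaris are reported in this thread, and nothing here changes the HP-UX results shown earlier.

```shell
# Sketch only: treat the whole record as field 1 and sort numerically on
# characters 10-19 of it.
# Assumption: '#' never occurs in the data, so no record is split into fields.
sort_fixed_cols() {
    # LC_ALL=C keeps the comparison independent of the locale settings.
    LC_ALL=C sort -t '#' -k 1.10n,1.19n "$@"
}
```

Usage would be `sort_fixed_cols file.txt > test.sorted`. It is worth checking the output on a small sample first, since this is exactly the edge case where implementations differ.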

Regards!

...JRF...
OldSchool
Honored Contributor

Re: Sort problem

the man page for sort on HPUX lists the -t option.

you might try -t"#" or see if "stty -a" is using # for something...

Dennis Handly
Acclaimed Contributor

Re: Sort problem

>sort -t# -k1.10n,1.19n file.txt -o test.sorted

This looks like the right command except for the -o. You need to use ">" or move the -o earlier:
sort -t# -k1.10n,1.19n file.txt > test.sorted

>I tried "sort -t'#' -k1.10n,1.19n" and the result was: Usage: ...

Do you still get that "Usage:" message?
maberg
Occasional Advisor

Re: Sort problem

Maybe everyone is right that we have been misusing this utility. Is there another way to sort a file using multi-column fields within a record for sorting?
Dennis Handly
Acclaimed Contributor

Re: Sort problem

>Is there another way to sort a file using multi-column fields within a record for sorting?

It's called COBOL. :-)
Except the numbers must be right justified in their fields. Of course the input procedure could fix them.
OldSchool
Honored Contributor

Re: Sort problem

Dennis' suggestion of:

sort -t# -k1.10n,1.19n file.txt > test.sorted

should work, while:

sort -t# -k1.10n,1.19n -o test.sorted file.txt

*might* work...

what happens when you run either of those exactly as shown?
maberg
Occasional Advisor

Re: Sort problem

Result for ( sort -t# -k1.10n,1.19n ground.txt2 > test.sorted ):

1124688 Barrera
1998945 Warble
2327703 Rebholz
8 5032 Rebholz


Result for ( sort -t# -k1.10n,1.19n -o test.sorted ground.txt2 ):

1124688 Barrera
1998945 Warble
2327703 Rebholz
8 5032 Rebholz
James R. Ferguson
Acclaimed Contributor

Re: Sort problem

Hi (again):

> Dennis: It's called COBOL. :-)
Except the numbers must be right justified in their fields. Of course the input procedure could fix them.

And it's what COBOL would call an input procedure that I used when I manufactured the temporary fixed-position sort key. The output procedure stripped it.

The use of the '-o output' file doesn't matter on HP-UX. On AIX, this simple form WORKS (even without the '-t#'):

# sort -k1.10n,1.19n file

Regards!

...JRF...