Operating System - Linux
1827721 Members
2757 Online
109968 Solutions
New Discussion

Re: Sorting a list by a field

 
SOLVED
Go to solution
dictum9
Super Advisor

Sorting a list by a field


Greetings,

I have a list with one of the field in the mm/dd/yy format and I would like to sort by that field, chronologically. I've read the man page on sort but cannot quite get it to sort correctly. Advice?
31 REPLIES 31
Pete Randall
Outstanding Contributor
Solution

Re: Sorting a list by a field

Did you try "sort -n"?


Pete

Pete
James R. Ferguson
Acclaimed Contributor

Re: Sorting a list by a field

Hi:

OK, given this file:

03/03/2006
01/03/2007
11/11/2006
10/10/2005
04/02/2006
02/04/2006

# sort -kn1.7,7.10 -kn1.1,1.2 -kn1.3,1.4 /tmp/0214b

10/10/2005
02/04/2006
03/03/2006
04/02/2006
11/11/2006
01/03/2007

Regards!

...JRF...
dictum9
Super Advisor

Re: Sorting a list by a field


Thanks.

Actually, in my case, the date is the 8th field on the line, and sometimes it's in the 80th column and sometimes not. So it's not always in the same column.
How do I account for this inconsistency? Can I use awk?

James R. Ferguson
Acclaimed Contributor

Re: Sorting a list by a field

Hi (again):

> Actually, in my case, the date is the 8th field on the line, and sometimes it's in the 80th column and sometimes not. So it's not always in the same column.
How do I account for this inconsistency? Can I use awk?

This doesn't matter. The 'sort' counts its keys from one. Hence '-k1' means the first field and '-k8' would be the eighth. Fields are delimited by whatever you specify in '-t'. Hence '-t' without further qualification means a blank as a field delimiter regarless of what "column" offset.

Have another look at the manpages for 'sort' using the solution I first posted.

Regards!

...JRF...
Bill Hassell
Honored Contributor

Re: Sorting a list by a field

The definition of a field is that it is an unambiguous element so the concept of columns is unimportant. As long as the definition of each field is unique (ie, each field is separated by a : or , or / etc) then parsing the elements is trivial with many commands like sort, awk, cut, etc. To see this at work, run the cut command to find the user and the user's HOME directory (fields 1 and 6):

cut -f 1,6 -d: /etc/passwd

-d is the field delimiter. For awk, you specify the delimiter with -F and for sort the field delimiter is set with -t. awk is particularly useful as it predefines variables like NF (number of fields) so regardless of spacing, the last field can be extracted with $NF as in:

awk -F: '{print $1,$NF}' /etc/passwd

which prints the username and the shell.

Now your list may have fields defined with one delimiter such as a comma and within the field, data such as mm/dd/yy that uses / as a separator. That will be a problem because sort cannot understand nested fields. If you can show an example line, perhaps wew can suggest a way to sort it.


Bill Hassell, sysadmin
Ralph Grothe
Honored Contributor

Re: Sorting a list by a field

Just out of curiosity,
because I am more accustomed to Perl's sort and seldom use Unix sort.
Isn't there a typo in James' solution in the end field specifier of the 1st sort criteria,
and shouldn't it rather be something like:

$ sort -n -k 1.7,1.10 -k 1.1,1.2 -k 1.3,1.4 < /file/to/sort.ascii

I'm not asking to appear a smart-ass
but rather to understand Unix sort better.
Madness, thy name is system administration
Dennis Handly
Acclaimed Contributor

Re: Sorting a list by a field

>Ralph: Isn't there a typo in James' solution

It looks like it. Though I don't think you can have a space after -k and you don't need "<" for file input.
Peter Godron
Honored Contributor

Re: Sorting a list by a field

Hi,
as it is field 8 and the format is DD/MM/YY should it not be:
sort -n -k 8.7,8.9 -k 8.1,8.2 -k 8.4,8.5
Peter Godron
Honored Contributor

Re: Sorting a list by a field

Hi, [ CORRECTION to stated date format ]
as it is field 8 and the format is MM/DD/YY should it not be:
sort -n -k 8.7,8.9 -k 8.1,8.2 -k 8.4,8.5
Ralph Grothe
Honored Contributor

Re: Sorting a list by a field

Hi Dennis,

thanks for confirmation.
I was really puzzled what that 7 meant,
till I realized that in Anglo-American writing of Arabic numbers the 7, unlike in German, doesn't get the short horizontally crossing dash (which I find better disambiguates it from the 1; ok we supply the 1 with a short ascending noselike slash which could lead to misreading it as an Anglo-American 7, that's why we prefer the dash ;-)
So it seemed obvious that it was a typo.

I acknowledge that probably each Unix derivative has its own sort syntax (another pro for Perl).
But according to the synopsis of HP-UX's sort manpage there may be a space between -k and field specifier.


SYNOPSIS
sort [-m] [-o output] [-bdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z
recsz] [-T dir] [file ...]

sort [-c] [-AbdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z recsz] [-T
dir] [file ...]


Ok, the redirection of stdin is redundant
but seems to work.
Madness, thy name is system administration
Peter Nikitka
Honored Contributor

Re: Sorting a list by a field

Hi,

what 'etc' is telling us about 'not the same column' may be the fact, that there may be 'empty' columns, which count for 'sort' by default. Such empty fields sort before any filled fields, of course.
You can skip empty fields by using the option '-b'. In the example below there are several :
sort -k2n,3n -k 3
yy 1 bb
aa 1 ba
will output
yy 1 bb
aa 1 ba

but
sort -b -k2n,3n -k 3
yy 1 bb
aa 1 ba
will lead to
aa 1 ba
yy 1 bb

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
James R. Ferguson
Acclaimed Contributor

Re: Sorting a list by a field

Hi Folks:

I think my original post is indeed correct. You can test with additional data as you please.

I chuckled over how we all stumble reading dates of the form mm/dd [ or shouldn't that be dd/mm ? :-) ].
dictum9
Super Advisor

Re: Sorting a list by a field


I figured out what the problem is. The date field is not in a consistent position in the file, sometimes it's field 8 and sometimes 9.

That is,

xxx xxx xxx xxx xxx xxx xxxx xxxx 02/11/07
xxx xxx xxx xxx xxx xxx xxxx 02/02/07

So they don't sort right.
The only thing that I do know is that it's always the last field. I cannot think how to make awk or sort work on this.
OldSchool
Honored Contributor

Re: Sorting a list by a field

is the date always the last field? If so, in awk you can print $NF.

That being the case, you can examine the "value" of nf and create a tmp file that either has dummy pad field added so the location is consistent, or prepend the date to the front of the line, sort it, then strip the prepended date, leaving you with the file in sort order in it's original format.
dictum9
Super Advisor

Re: Sorting a list by a field

Yes, date is always the last field. Are you saying I have to rewrite the file, sort it and then put it back? It has to be the last field and I need it as an input file for something else.
James R. Ferguson
Acclaimed Contributor

Re: Sorting a list by a field

Hi:

> I figured out what the problem is. The date field is not in a consistent position in the file, sometimes it's field 8 and sometimes 9.

In that case, you need to leverage something other than the standard 'sort'.

As Ralph noted, Perl could be used.

Another solution is to filter your file and *pad* any records of less than nine fields with a bogus field-8. Then use the standard 'sort' as I suggested.

A very crude 'awk' filter like this might work:

awk '{if (NF <9) {print $1,$2,3,$4,$5,$6,$7,"XXX",$NF} else {print}} file | sort ...

Regards!

...JRF...
OldSchool
Honored Contributor

Re: Sorting a list by a field

etc,

Yes,

a) rewrite to temp w/ the $NF prepended, followed by ALL of the data fields.

b) sort the resulting temp file

c) strip off the first field (added date), resulting in the file having its original form, but sorted.

d)move this temp to the original location.

this allows you to sort the file and still have the fields in the original layout for any subsequent processing

OldSchool
Honored Contributor

Re: Sorting a list by a field

something like:

if ($NF = 8)
print $NF, $1,$2,$3,$4,$5,$6,$7,$8
else
print $NF, $1,$2,$3,$4,$5,$6,$7,$8,$9


that's the theory, syntax maybe off however
Al Langen_1
Advisor

Re: Sorting a list by a field

Sounds like a job for AWK, a super utility!

awk '{printf("%s|%s\n", $NF, $0)}' yourfile | sort -n | awk 'BEGIN{ FS="|"} {print $2}' > newfile

Alfred
Dennis Handly
Acclaimed Contributor

Re: Sorting a list by a field

>James: I think my original post is indeed correct. You can test with additional data as you please.

There is a difference between works and correct.

You say the field ends in field 7, position 10. But there is only one field. There are no other fields that could be added that would cause the sort to mysteriously fail, so it works. (Unless you remove the "n".)

I also don't see that "n" is allowed before the position, it should be at the end:
-k1.7n,1.10n

James R. Ferguson
Acclaimed Contributor

Re: Sorting a list by a field

Hi Dennis:

Your points are well-taken and I wasn't trying to be glib.

> You say the field ends in field 7, position 10. But there is only one field. There is no other fields that could be added that would cause the sort to mysteriously fail, so it works.

I agree. If you had data like this (file):

03/03/2006,1
01/03/2007,2
11/11/2006,3
10/10/2005,4
04/02/2006,5
02/04/2006,6
4/ 4/2004,7
5/5/2005 ,8
05/06/2005,9

Hence, with the default delimiter in force (a blank) I want to limit the key as I originally wrote:

# sort -kn1.7,1.10 -kn1.1,1.2 -kn1.3,1.4 file

(not)

# sort -kn1.7 -kn1.1,1.2 -kn1.3,1.4 file

> ...also [I] don't see that "n" is allowed before the position, it should be at the end:
-k1.7n,1.10n

Yes, I guess that does seem more reasonable. The 'sort' manpages note:

/*begin_quote*/

The arguments field_start and field_end each have the form m.n which are optionally followed by one or more of the type options b, d, f, i, n, r, or M. These modifiers have the functionality for this key only, that their command-line counterparts have for the entire record.

/*end_quote*/

So, are we closer in agreement?

Regards!

...JRF...

Dennis Handly
Acclaimed Contributor

Re: Sorting a list by a field

>James: I want to limit the key as I originally wrote:
# sort -kn1.7,1.10 -kn1.1,1.2 -kn1.3,1.4 file

Both Ralph and I agree with that. Except you originally wrote:
# sort -kn1.7,7.10 ...

With 7.10 instead of 1.10.
James R. Ferguson
Acclaimed Contributor

Re: Sorting a list by a field

Hi Dennis & Ralph:

Holy smoke...I had to look at my first post *twice* to see my error. You and Ralph have sharp eyes!

*Now* I see what you did. Spock's mind-meld would be easier :-))

My thanks, and my apologies to both you & Ralph!

Regards!

...JRF...
Arturo Galbiati
Esteemed Contributor

Re: Sorting a list by a field

Hi,
to have a list of dates (last field of the record) sorted you can use:

awk '{print $NF}' file |sort -t "/" -kn3 -kn1 -kn2

HTH,
Art