- Community Home
- >
- Servers and Operating Systems
- >
- Operating Systems
- >
- Operating System - HP-UX
- >
- Re: Sort question.
Categories
Company
Local Language
Forums
Discussions
Forums
- Data Protection and Retention
- Entry Storage Systems
- Legacy
- Midrange and Enterprise Storage
- Storage Networking
- HPE Nimble Storage
Discussions
Forums
Discussions
Discussions
Discussions
Forums
Discussions
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
- BladeSystem Infrastructure and Application Solutions
- Appliance Servers
- Alpha Servers
- BackOffice Products
- Internet Products
- HPE 9000 and HPE e3000 Servers
- Networking
- Netservers
- Secure OS Software for Linux
- Server Management (Insight Manager 7)
- Windows Server 2003
- Operating System - Tru64 Unix
- ProLiant Deployment and Provisioning
- Linux-Based Community / Regional
- Microsoft System Center Integration
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Community
Resources
Forums
Blogs
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 01:08 PM
09-25-2002 01:08 PM
I have a web log file with format of:
66.196.65.25 - - [17/Sep/2002:09:00:57 -0500] "GET /robots.txt
I need to sort by date. So sort by month (Sep), day (17), time (09:00:57).
What's the easiest way to do so?
TIA,
Sean
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 01:25 PM
09-25-2002 01:25 PM
Re: Sort question.
I'd use something like this:
# sort -t/ -k3 -kM2 -k1 filein
This uses the "/" as the field delimiter. The '-M' signifies the third field should be compared as months. See the 'sort' man pages for an explanation.
Regards!
...JRF...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 01:30 PM
09-25-2002 01:30 PM
Re: Sort question.
Wouldn't that only sort on Month and then the first field, which starts with IP address?
I've read the man on sort and didn't see any way to either sort on a column, or specify more than one deliminator, say "/" and "[".
Sean
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 01:38 PM
09-25-2002 01:38 PM
Re: Sort question.
Sorry, the month is the second field:
Yes, the first field really consists of the IP address through the day. The second field is a 3-character month (cleanly), and the third field is a year followed by a time. Ugly.
It would be helpful to re-format the data before sorting and/or to add a field start/field end to the keys specified.
The "key" [no pun intended, here] is the '-M' option to handle sorting months.
Regards!
...JRF...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 03:02 PM
09-25-2002 03:02 PM
Re: Sort question.
That's the part that I couldn't figure out. I can't sort on field one, as that will put it in ip address order, when I need only the day portion of it. Ditto for field 3, where I only need the time portion of it.
Sean
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 03:06 PM
09-25-2002 03:06 PM
Re: Sort question.
Sean,
You should convert the date/time to the internal format and sort on that! The use of perl comes to mind here!
live free or die
harry
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 03:19 PM
09-25-2002 03:19 PM
Re: Sort question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 03:23 PM
09-25-2002 03:23 PM
Re: Sort question.
Of course you are going to require that a "set" of transactions stay together??
What you need is a real web log analizer.
live free or die
harry
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 03:39 PM
09-25-2002 03:39 PM
Re: Sort question.
I had to combine log files from about 10 sites into one file, which means that the records are out of order.
So I only need to sort the file on the date fields.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 03:41 PM
09-25-2002 03:41 PM
Re: Sort question.
Sean,
With perl you could convert the "external date" into the internal date and sort on that date, which would just be an index to the original record.
live free or die
harry
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2002 10:47 PM
09-25-2002 10:47 PM
Re: Sort question.
I would try awk to reformat the file .
You can use the "blank" as Input Field Separator and as Output Filed separator the ";" { FS = " "; OFS = ";" }
This will deliver you as output :
66.196.65.25 - - [17/Sep/2002:09:00:57 -0500] "GET /robots.txt in the variables
$1 $2 $3 $4 $5 $6
$1 = 66.196.65.25
$2 = -
$3 = -
$4 = [17/Sep/2002:09:00:57
$5 = -0500]
$6 = "GET
$7 =
Next step is to split $4 into Date and time and store both values in variables
if the date is always in this format then its easy :
Date = substr( $4,2,11)
Bracket = substr ( $4,1,1)
Colon = substr ( $4,12,1 )
Time = substr ( $4,13,8)
outpuline is :
print $1,$2,$3,Bracket,Date,Colon,Time,$5,$6,$7
You get a line like this
66.196.65.25;-;-;[;17/Sep/2002;:;09:00:57;-0500;"GET;/robots.txt
Now you can sort on the fields you like.
Output this again into a file and run again awk to replace the ; by a
Now you should have a copy of your original file in the same format but sorted.
Following you will find the listing of both awk scripts needed :
script one used to create sortable fields
BEGIN { FS = " " ; OFS = ";" }
{
Date = substr( $4,2,11)
Bracket = substr ( $4,1,1)
Colon = substr ( $4,13,1 )
Time = substr ( $4,14,8)
print $1,$2,$3,Bracket,Date,Colon,Time,$5,$6,$7
}
#end of script
Script two used to remove ";" after sorting
BEGIN { FS = ";" ; OFS = " " }
{
print $1,$2,$3,$4 $5 $6 $7,$8,$9,$10
}
# $4 to $7 need to be placed WITHOUT a comma in between
# end of script
squence of commands :
awk -f <script one> logfile > logfilecopy1
sort
awk -f <script two> logfilecopy2 >logfilecopy2
forward file "logfilecopytwo to your weblog parser. Ithink this should work
Maybe it's not a nice way but it will do it. And there may also be shorter ways,. but as i'm not the big expert on the shell prompt i most often handle it in such sequences.
Hope it helps
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-26-2002 12:05 AM
09-26-2002 12:05 AM
Re: Sort question.
the safest way is probably to read the time specification in each line and then convert month names into numerical values. You should then apply time specs. as a "numerical weight (year, month, day, etc)" at the end of each line in a temporary file.
Example: your input:
66.196.65.25 - - [17/Sep/2002:09:00:57 -0500] "GET /robots.txt
Temporary output:
66.196.65.25 - - [17/Sep/2002:09:00:57 -0500] "GET /robots.txt # 20020917090057
Then sort the lines of the temporary file using the "numerical weght" as key, but write out only until the distinctive character.
The attached script can (perhaps) create the temporary file.
regards,
John K.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-26-2002 06:40 AM
09-26-2002 06:40 AM
Re: Sort question.
Am I right that sort won't let you sort on a column? That would seem like the most basic requirement of a sort util.
For example sort on chars in columns 5-10.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-26-2002 06:59 AM
09-26-2002 06:59 AM
SolutionYou can limit the sort key to character positions within the field. This is called a restricted sort key. For instance, if I had a file like:
a999bc
a123bc
a011bc
a099bc
a456bc
...I could sort on the first field using only positions (columns) two, three and four (the numeric part) with:
# sort sort -k1.2,1.4 filename
This is what I originally suggested when I said, "It would be helpful to re-format the data before sorting and/or to add a field start/field end to the keys specified".
Regards!
...JRF...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-26-2002 08:11 AM
09-26-2002 08:11 AM
Re: Sort question.
perl -ne 'if (/\[([^\]]+)\]) {@a=split("[:/]",$1);print $a[2]," ",$a[1]," ",@a[0,3,4,5]," ",$_;}' yourlog | sort -k1 -kM2 -k3 | cut -c19- >sortedlog
Hope this helps...
-- Rod Hills
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-26-2002 09:39 AM
09-26-2002 09:39 AM
Re: Sort question.
I used the [ as the delimiter and sorted on the second field. Fortunately for this file there was only 1 month so I didn't have to worry about getting the month order right. But for future times I'll have to deal with that.
Thanks for the help.
Sean