
Awk question?

 
SOLVED
Allanm
Super Advisor

Awk question?


I have the following entry in my web logs -

1.1.1.1 - - [17/Sep/2009:03:14:50 -0700] "GET /c14/v23.129/011157032/getlocalstring?template=calendar HTTP/1.1" 200 8941 "https://URL/c14/v28.129/011157032/reports/execute?rptid=011157032-VEND_BAL_DET-view-1253182486693" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 2.0.50727; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)" URL 80 - - - 0 "-" "1.1.1.1.232161235455457383"

I need to extract the IP address (1.1.1.1 here; this can vary) and the account ID (011157032 here; this can also vary) from every entry of this form in the web logs.

How do I get them?

Thanks,
Allan

10 replies
Hein van den Heuvel
Honored Contributor
Solution

Re: Awk question?

You may want to consider a Perl regex instead of awk.

Here is something that works with the example line saved in a file named 'test':

$ perl -ne 'print qq(ip=$1 acc=$2\n) if m%^(\S+)\s.*?/(\d+)/\w+%' test
ip=1.1.1.1 acc=011157032

Now the string 1.1.1.1 appears twice. Which one is authoritative?

I picked the first one: the run of non-space characters at the beginning of the line.

The account number appears 3 times. Which one will always be there, and which one has the most predictable 'surroundings'?

I picked the first one that sits between two slashes and is followed by a word (getlocalstring in the example).
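Since the thread title asks about awk, here is a rough awk rendering of the same two rules. It is a sketch, not Hein's code: it assumes the IP address is the first whitespace-delimited field and takes the leftmost "/digits/" path element that is followed by a word character. awk's match() reports the leftmost occurrence in RSTART/RLENGTH, which plays the role of Perl's non-greedy .*? here. extract_ip_acc is a hypothetical helper name.

```shell
# Hypothetical helper: awk version of Hein's two rules (a sketch).
# Assumes: IP = first field; account ID = leftmost "/digits/" element that is
# followed by a word character ([A-Za-z0-9_], similar to Perl's \w).
extract_ip_acc() {
    awk '{
        # match() finds the leftmost occurrence and sets RSTART and RLENGTH
        if (match($0, "/[0-9]+/[A-Za-z0-9_]")) {
            # Skip the leading "/", and drop the trailing "/" plus the word character
            acc = substr($0, RSTART + 1, RLENGTH - 3)
            print "ip=" $1 " acc=" acc
        }
    }'
}
```

Usage: "extract_ip_acc < test" prints ip=1.1.1.1 acc=011157032 for the example line.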

For a better solution, please provide a better problem description!

Regards,
Hein
James R. Ferguson
Acclaimed Contributor

Re: Awk question?

Hi Allan:

Perhaps:

# perl -nle '/((?:\d+\.?)+).+rptid=(\d+)/ and print join ":",$1,$2' file

1.1.1.1:011157032

Regards!

...JRF...
Matti_Kurkela
Honored Contributor

Re: Awk question?

While this is certainly possible with awk, this can be solved with a humble "cut" too.

"sed" would also work, but sed solutions all too easily become painfully Write-Only :-)

Is the account ID always prefixed by "/c14/v23.129/" or can this vary too? Since this looks like a version number, I'll assume the numbers might vary, but the number of path elements will stay the same.

If the prefix can vary, is it guaranteed that the request URI will have no spaces and a total of three slashes before the account ID? And that the account ID is always terminated with a slash?

If so, feed your log to this little script (i.e. run it as "scriptname < weblog.log"):

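#!/bin/sh

# Read the web log from stdin, one line at a time (-r keeps backslashes intact)
while IFS= read -r logline
do
    # Space-delimited field 1 is the client IP address
    ipaddr=$(printf '%s\n' "$logline" | cut -d " " -f 1)
    # Field 7 is the request URI, e.g. /c14/v23.129/011157032/getlocalstring?...
    requestURI=$(printf '%s\n' "$logline" | cut -d " " -f 7)
    # The URI starts with "/", so slash-delimited field 4 is the account ID
    accountID=$(printf '%s\n' "$requestURI" | cut -d / -f 4)
    echo "Account $accountID from IP $ipaddr"
done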
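The loop above starts three cut processes per log line, which can get slow on large logs. Under the same assumptions (space-delimited field 7 is the request URI, and its fourth slash-delimited element is the account ID), a single awk process can do the same job. This is a sketch; accounts_from_log is a hypothetical name. One difference: awk's default field splitting treats runs of blanks as one separator, while cut -d " " does not, but on lines shaped like the example both give the same fields.

```shell
# Hypothetical helper: single-pass awk version of the cut loop (a sketch).
# Assumes field 7 (blank-delimited) is the request URI, e.g.
# /c14/v23.129/011157032/getlocalstring?template=calendar
accounts_from_log() {
    awk '{
        # The URI starts with "/", so part[1] is empty and part[4] is the account ID
        split($7, part, "/")
        print "Account " part[4] " from IP " $1
    }'
}
```

Usage: "accounts_from_log < weblog.log".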

MK
Allanm
Super Advisor

Re: Awk question?

Thanks Hein!

It works even with \s.*, i.e. with the non-greedy ? omitted.
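Dropping the ? makes .* greedy, so the match moves to the last /digits/word on the line. In the example that is the copy of the account ID inside the Referer URL (/c14/v28.129/011157032/reports), which happens to be the same number, so the result does not change. sed has only greedy matching, so a sed version of Hein's pattern behaves like this greedy variant. A sketch; greedy_ip_acc is a hypothetical name.

```shell
# Hypothetical helper: sed has no non-greedy quantifier, so ".*" below
# extends to the LAST "/digits/" element followed by a word character.
greedy_ip_acc() {
    # \1 = leading non-space run (IP address), \2 = the digits between slashes
    sed -n 's%^\([^ ]*\) .*/\([0-9][0-9]*\)/[A-Za-z0-9_].*%ip=\1 acc=\2%p'
}
```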


Thanks,
Allan
Allanm
Super Advisor

Re: Awk question?

JRF, can you explain the regex?

Thanks,
Allan.
James R. Ferguson
Acclaimed Contributor

Re: Awk question?

Hi (again) Allan:

> JRF, can you explain the regex?

# perl -nle '/((?:\d+\.?)+).+rptid=(\d+)/ and print join ":",$1,$2' file

Match and capture one or more digits (\d+) followed by an optional dot character (\.), repeated one or more times; then one or more characters of anything (.+) followed by the literal "rptid=". The inner parentheses are non-capturing (?:) because we only need them to group the repetition, so the address is captured in $1. After "rptid=", we capture the one or more digits that follow in $2.

If the whole pattern is satisfied, we print the pieces that were stored in $1 and $2, joining them together with a colon character.
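For comparison, here is a sketch of the same idea in awk: the leading run of digits and dots is taken as the IP address, and the digits right after "rptid=" as the account ID. Two details differ slightly from the Perl and are assumptions of this sketch: the IP match is anchored to the start of the line, and awk's match() takes the first "rptid=" rather than the last one the greedy .+ would reach (the example has only one). ip_rptid is a hypothetical name.

```shell
# Hypothetical helper: awk sketch of the "leading IP + rptid=digits" approach.
ip_rptid() {
    awk '{
        # Leading run of digits and dots at the start of the line: the IP address
        if (!match($0, /^[0-9.]+/)) next
        ip = substr($0, RSTART, RLENGTH)
        # First "rptid=" followed by digits; skip the 6-character "rptid=" prefix
        if (match($0, /rptid=[0-9]+/))
            print ip ":" substr($0, RSTART + 6, RLENGTH - 6)
    }'
}
```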

Regards!

...JRF...
Dennis Handly
Acclaimed Contributor

Re: Awk question?

This looks like a continuation of your other threads:
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1375401
James R. Ferguson
Acclaimed Contributor

Re: Awk question?

Hi (again) Allan:

So Dennis was correct: your requirements were better defined in your earlier thread :-), namely "The account id will always be the 3rd entry from /(slash) - with t15 being server-number, v33.109- code version and 670612718 being the account id."

Given this, I would offer:

# perl -nle 'm{^((?:\d+\.?){4}).+"GET\s+/.+?/.+?/(\d+)} and print join ":",$1,$2' file

This also anchors the IP address match to the beginning of each line.
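A possible awk counterpart of this version, as a sketch: it only considers lines whose first field looks like a dotted-quad IPv4 address, finds the "GET request line, and takes the digits that form the third slash-delimited element of the URI, per the requirement quoted above. Like the Perl, it handles GET requests only; ip_acc_from_get is a hypothetical name.

```shell
# Hypothetical helper: awk sketch of the "3rd path element after GET" rule.
ip_acc_from_get() {
    awk '
    # Only consider lines whose first field looks like a dotted-quad IPv4 address
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
        # Match e.g. "GET /c14/v23.129/011157032 (string regex, so "/" needs no escape)
        if (match($0, "\"GET +/[^/ ]+/[^/ ]+/[0-9]+")) {
            req = substr($0, RSTART, RLENGTH)
            # The last "/"-delimited piece of the match is the account ID
            n = split(req, part, "/")
            print $1 ":" part[n]
        }
    }'
}
```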

Regards!

...JRF...
user57
Occasional Advisor

Re: Awk question?

For the given data:

$ awk -F '[ /]' '{print $1,$12}' wlog
1.1.1.1 011157032
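The field number 12 depends on exactly how the separator set splits the line (the command above splits on single spaces and slashes, with consecutive separators producing empty fields), so it shifts if any earlier part of the line changes shape. When adapting this, one way to find the right number is to print every field of the first line with its index. A sketch; show_fields is a hypothetical helper name.

```shell
# Hypothetical helper: print "index: value" for each field of the first line,
# splitting on single spaces and slashes (consecutive separators give empty fields).
show_fields() {
    awk -F '[ /]' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }'
}
```

Usage: "show_fields < wlog"; for the example line, the output includes "12: 011157032".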