
someone_4
Honored Contributor

script help ..

Hello everyone,
I have been asked to look through several logs and find out how many different usernames have been logging in to our web server from the same IP address.
Here is the format of the logs:

xxx.yyy.zzz.252 - martila [02/Aug/2002:13:52:06 -0500] "GET / HTTP/1.0" 304 -
xxx.yyy.zzz.252 - - [02/Aug/2002:13:52:00 -0500] "GET / HTTP/1.0" 401 223
xxx.yyy.zzz.252 - delgads [02/Aug/2002:13:52:01 -0500] "GET / HTTP/1.0" 304 -
xxx.yyy.zzz.197 - - [02/Aug/2002:13:56:51 -0500] "GET / HTTP/1.0" 401 223


where the first column is the IP address and martila and delgads are the usernames.
I need output that shows me something like:

xxx.yyy.zzz.252 martila
xxx.yyy.zzz.252 otheruser
xxx.yyy.zzz.252 otheruser1

and so on for all of the IPs.
I tried to filter through them with awk, but I am getting stuck on the comparison, because I get a long list of the same user from the same IP.

Thanks
~ Richard

P.S. This thread has been moved from HP-UX > System Administration to HP-UX > languages - HP Forums Moderator
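A minimal single-pass sketch for the original question, assuming the log layout shown above (IP in field 1, username in field 3, "-" when no user was sent, as in the 401 lines). The helper name distinct_pairs is hypothetical. The !seen[key]++ idiom is true only the first time a key appears, so each IP/username pair is printed once and no sort is needed.

```shell
#!/bin/sh
# distinct_pairs (hypothetical helper): read access-log lines on stdin and
# print each "IP username" pair once, in the order it is first seen.
# Assumes field 1 is the client IP and field 3 the authenticated user,
# with "-" meaning no user was supplied (e.g. the 401 responses).
distinct_pairs() {
    awk '$3 != "-" && !seen[$1 " " $3]++ { print $1, $3 }'
}

# Demo on the sample lines from the question.
distinct_pairs <<'EOF'
xxx.yyy.zzz.252 - martila [02/Aug/2002:13:52:06 -0500] "GET / HTTP/1.0" 304 -
xxx.yyy.zzz.252 - - [02/Aug/2002:13:52:00 -0500] "GET / HTTP/1.0" 401 223
xxx.yyy.zzz.252 - delgads [02/Aug/2002:13:52:01 -0500] "GET / HTTP/1.0" 304 -
xxx.yyy.zzz.197 - - [02/Aug/2002:13:56:51 -0500] "GET / HTTP/1.0" 401 223
EOF
```

On a real log this would be run as distinct_pairs < logfile.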

Jeffrey S. Sims
Trusted Contributor
Solution

Re: script help ..

Richard,

Have you tried using the uniq -c command to get your count? If you extract the columns you want with awk, sort them, and then pipe them through uniq -c, that should give you the numbers you want.

Example: If I have a file that contains the following contents:

10.10.10.1 alice
10.10.10.2 henry
10.10.10.2 henry
10.10.10.2 henry
10.10.10.3 james
10.10.10.4 larry

** Notice that it is already sorted; uniq only collapses duplicate lines that are adjacent.

Then run the following command (assuming testfile.txt is the name of the file you are checking for duplicates):

# uniq -c testfile.txt

You will get an output similar to the following:

# uniq -c testfile.txt
1 10.10.10.1 alice
3 10.10.10.2 henry
1 10.10.10.3 james
1 10.10.10.4 larry

I think that addresses your question. Hope it helps.
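Applied to the access-log format from the question, the awk | sort | uniq -c idea might look like the sketch below. It assumes the IP is field 1 and the username is field 3, skips the "-" entries from the 401 responses, and wraps the pipeline in a hypothetical pair_counts function.

```shell
#!/bin/sh
# pair_counts (hypothetical helper): count how many requests each
# "IP username" pair made. Reads access-log lines on stdin.
pair_counts() {
    # 1. awk keeps only the IP and the username, skipping anonymous ("-") entries.
    # 2. sort brings identical pairs next to each other.
    # 3. uniq -c collapses each run of identical lines and prefixes its length.
    awk '$3 != "-" { print $1, $3 }' | sort | uniq -c
}

# Demo on the sample lines from the question.
pair_counts <<'EOF'
xxx.yyy.zzz.252 - martila [02/Aug/2002:13:52:06 -0500] "GET / HTTP/1.0" 304 -
xxx.yyy.zzz.252 - - [02/Aug/2002:13:52:00 -0500] "GET / HTTP/1.0" 401 223
xxx.yyy.zzz.252 - delgads [02/Aug/2002:13:52:01 -0500] "GET / HTTP/1.0" 304 -
xxx.yyy.zzz.197 - - [02/Aug/2002:13:56:51 -0500] "GET / HTTP/1.0" 401 223
EOF
```

Dropping the -c gives just the distinct pairs without counts.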
harry d brown jr
Honored Contributor

Re: script help ..


Richard,

Are you able to "parse" out the IPs and usernames? Jeffrey has a great hint on getting "counts".

Having been asked to do the same thing by my marketing group quite a while ago, I explained to these idiots (marketing people) (side note: marketing people are people that can't sell) that matching on IPs is like asking how many Susans drive Camrys: it's a useless, meaningless statistic, because many users can share the same IP. The IP on my cable modem is NOT the IP that is used when I visit a web site; it's proxied.

live free or die
harry
H.Merijn Brand (procura)
Honored Contributor

Re: script help ..

perl -nle 'm/^([\d.]+)\s*-\s*(\w+)/&&$who{$1}{$2}++}END{for$ip(sort keys%who){for$n(sort keys%{$who{$ip}}){printf"%4d %16s %s\n",$who{$ip}{$n},$ip,$n}}' logfile

WARNING, not tested, just written down
Enjoy, Have FUN! H.Merijn
T G Manikandan
Honored Contributor

Re: script help ..

# awk '$3 != "-" {print $1, $3}' log | sort | uniq


Thanks
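One caveat with uniq in these pipelines: it only compares each line with the one directly before it, so duplicates separated by other lines survive unless the input is sorted first. A small demonstration with made-up sample pairs (the sample_pairs helper is hypothetical):

```shell
#!/bin/sh
# sample_pairs (hypothetical helper): three lines where the duplicate
# pair is NOT adjacent.
sample_pairs() {
    printf '%s\n' '10.10.10.1 alice' '10.10.10.2 henry' '10.10.10.1 alice'
}

# uniq alone keeps both "10.10.10.1 alice" lines: 3 lines out.
sample_pairs | uniq | wc -l

# Sorting first makes the duplicates adjacent: 2 lines out.
sample_pairs | sort | uniq | wc -l

# sort -u removes the duplicates in one step.
sample_pairs | sort -u
```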
Tim D Fulford
Honored Contributor

Re: script help ..

Richard

Some of the above answers may do what you want, but my understanding of the problem is that if 3 people log in from 193.164.1.23, you want to see

193.164.1.23 mary jane billy
193.164.1.252 matilda

Further to this, you probably do not care if mary logged in 20 times from 193.164.1.23.

That being the case, Perl would be the best tool for this, but I do not have my Unix computer to hand to write it. (I would build a two-level hash keyed on IP and user name, simply incrementing the entry each time, e.g. [$ip,$name] = [$ip,$name] + 1.)
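The "double hash table" idea maps naturally onto awk's multi-subscript arrays: count[$1, $3]++ stores the IP and the name joined by the built-in SUBSEP character, much like the [$ip,$name] = [$ip,$name] + 1 pseudocode. A minimal sketch, assuming the field layout from the question (IP in field 1, username in field 3); count_by_ip_user is a hypothetical name.

```shell
#!/bin/sh
# count_by_ip_user (hypothetical helper): one pass over the log,
# counting requests per (IP, username) pair in a two-subscript array.
count_by_ip_user() {
    awk '
    $3 != "-" { count[$1, $3]++ }       # key is $1 SUBSEP $3
    END {
        for (key in count) {
            split(key, part, SUBSEP)    # recover the IP and the name
            print part[1], part[2], count[key]
        }
    }' | sort                           # for (key in ...) order is unspecified
}
```

Usage would be count_by_ip_user < logfile.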

In ksh/awk I would do a double pass

awk '{print $1}' logfile | sort | uniq > IP-out
> results
for ip in $(cat IP-out)
do
awk '$1==IP{print $3}' IP=$ip logfile | sort | uniq > tmp-name
echo "$ip" >> results
cat tmp-name >> results
done

If you want to know how many times each user logged in from that IP, use ... | sort | uniq -c > tmp-name instead.

Regards

Tim
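The loop above reads the log once for every IP. Here is a sketch of a single-pass alternative in awk, producing the one-line-per-IP layout described at the start of Tim's reply (an IP followed by its distinct users). users_per_ip is a hypothetical name, and the field layout (IP in field 1, username in field 3, "-" for none) is an assumption taken from the sample lines in the question.

```shell
#!/bin/sh
# users_per_ip (hypothetical helper): print one line per IP listing
# each distinct username once, e.g. "193.164.1.23 mary jane billy".
users_per_ip() {
    awk '
    $3 != "-" && !seen[$1, $3]++ {      # first time this user appears for this IP
        users[$1] = users[$1] " " $3    # append the name to that IP list
    }
    END {
        for (ip in users)
            print ip users[ip]          # users[ip] already starts with a space
    }' | sort                           # fix the otherwise unspecified IP order
}
```

Names within a line keep the order in which they first appear in the log.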
H.Merijn Brand (procura)
Honored Contributor

Re: script help ..

perl -nle 'm/^([\d.]+)\s*-\s*(\w+)/&&$who{$1}{$2}++}END{for$ip(sort keys%who){printf"%16s",$ip;for$n(sort keys%{$who{$ip}}){printf" %s (%d)",$n,$who{$ip}{$n}}print""}' logfile

To do it like

101.101.101.252 sandra (3) fernandus (8)

My first example also lacks some trailing '}'s before the final closing quote. Sorry.
Enjoy, Have FUN! H.Merijn
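For comparison, the same "IP name (count) name (count)" layout can be built from sort, uniq -c and a second awk instead of a Perl hash of hashes. This is a sketch only; the helper name ip_summary is hypothetical and the field layout (IP in field 1, username in field 3) is assumed from the original question.

```shell
#!/bin/sh
# ip_summary (hypothetical helper): one output line per IP, listing each
# username with its request count, e.g. "101.101.101.252 fernandus (8) sandra (3)".
# Assumes field 1 = client IP and field 3 = username ("-" = none).
ip_summary() {
    # LC_ALL=C gives a plain byte-order sort, so all lines for one IP are adjacent.
    awk '$3 != "-" { print $1, $3 }' | LC_ALL=C sort | uniq -c |
    awk '
    {                                   # input fields: count ip user
        if ($2 != ip) {                 # first line of a new IP
            if (NR > 1) printf "\n"     # finish the previous IP line
            ip = $2
            printf "%s", ip
        }
        printf " %s (%d)", $3, $1
    }
    END { if (NR > 0) printf "\n" }'
}
```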
David Totsch
Valued Contributor

Re: script help ..

Richard:

It almost looks like you have started your own obfuscated Perl contest...

I still like awk(1), mostly because I have successfully avoided working with other platforms (read non-UNIX junk).

Anyhow, let's start with a simple count of how many times each IP appears, and grow from there:

/^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*/ {IPS[$1]+=1}
END { for (I in IPS) print I,IPS[I]}

This demonstrates that we can use a string as an array subscript. The regular expression matches any line that starts with something that looks like an IP address; you may need to match on other data in the line to count only lines for HTTP connections. Now, let's shoot for a two-dimensional array. Before we begin, I plan on using two other arrays to keep track of the subscripts. I will also leave off the regular expression that finds an IP address at the head of the line for this next example.

# $1 = IP address, $3 = user name ($2 is the ident field, normally "-")
{ DATA[$1","$3]+=1 ; IPS[$1]+=1 ; NAMES[$3]+=1 }
END {
for (I in IPS)
{
printf("%s ",I)
for (J in NAMES)
{
if (DATA[I","J] > 0)
printf("%s(%s),",J,DATA[I","J])
}
printf("\n")
}
}

In awk, referencing an array element that doesn't exist never fails: the element is automagically created with a null value, which compares as zero.
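A small sketch of that behaviour, written as two hypothetical shell functions around awk. It also shows a consequence for the END loop above: testing DATA[I","J] > 0 creates an empty entry for every IP/name combination that was never seen. That does not change the printed result, but the "in" operator tests for a subscript without creating it.

```shell
#!/bin/sh
# ref_creates_element: merely reading DATA["x,y"] in a comparison
# creates that element, so the array now holds one entry.
ref_creates_element() {
    awk 'BEGIN {
        if (DATA["x,y"] > 0) print "not reached"
        n = 0
        for (k in DATA) n++
        print n
    }'
}

# in_does_not_create: the "in" operator only tests for the subscript,
# so the array stays empty.
in_does_not_create() {
    awk 'BEGIN {
        if (("x" "," "y") in DATA) print "not reached"
        n = 0
        for (k in DATA) n++
        print n
    }'
}

ref_creates_element   # prints 1
in_does_not_create    # prints 0
```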

-dlt-

Ralph Grothe
Honored Contributor

Re: script help ..

Hi Richard,

I could also contribute my humble version of an access_log Perl parser, but I refrain from doing so because there is such an abundance of ready-made log parsers available. Instead, I would rather point you to the excellent Perl CGI tutorial by Lincoln Stein (the author of the Perl CGI.pm module):

http://stein.cshl.org/~lstein/talks/perl/perl98.html#Log_Parsing


n.b. Procura's one-liner could indeed scare off the casual Perl scripter, and looks a bit like a candidate for the renowned "Obfuscated Perl Contest".
If you are interested in some Perl obfuscation, please have a look at this:

http://www.sysadminmag.com/tpj/obfuscated/

Regards
Ralph
Madness, thy name is system administration
H.Merijn Brand (procura)
Honored Contributor

Re: script help ..

But in fact it isn't obfuscated at all, just condensed. Let's split it up with some comments. The forum drops leading whitespace, so the indentation may be lost.


perl -nle '
# the -n option wraps the complete -e
# block in "while (<>) { ... -e block ... }"
m/^([\d.]+)      # match ip
  \s* - \s*      # match separator
  (\w+)          # match user
 /x
&&               # if matched
$who{$1}{$2}++;  # count in hash of hashes: $1 = ip, $2 = user

}
# at the end of processing ...

END{
for $ip (sort keys %who) {              # iterate over sorted IPs
    printf "%16s", $ip;                 # and print it aligned
    for $n (sort keys %{$who{$ip}}) {   # iterate over the names for this IP
        printf " %s (%d)", $n, $who{$ip}{$n};  # and print with count
    }
    print "";                           # -l makes print add the newline
}
' logfile


Along the way I noticed a missing double quote for the final print. Sorry.
Enjoy, Have FUN! H.Merijn