<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Scripting Query in Operating System - Linux</title>
    <link>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270134#M52879</link>
    <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Thanks for the feedback! Spot on, Dennis, with the tr command for the non-delimited file as well.&lt;BR /&gt;&lt;BR /&gt;Using 'tr' I replaced the spaces with newlines and redirected the output to a new file, then used the sort/uniq/sort pipeline as suggested, and it produces the desired result.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;D.</description>
    <pubDate>Wed, 19 Jan 2011 16:43:18 GMT</pubDate>
    <dc:creator>Duffs</dc:creator>
    <dc:date>2011-01-19T16:43:18Z</dc:date>
    <item>
      <title>Scripting Query</title>
      <link>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270129#M52874</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I am trying to find out which IP address gets written most frequently to my access_log file. The file is a few thousand lines, so it is not practical to check this manually. I have tried selecting the first column of the access_log, printing it, sorting it, and simply eyeballing the result, but with so many different addresses there must be a more robust way of doing it?&lt;BR /&gt;&lt;BR /&gt;i.e.&lt;BR /&gt;# cat access_log | awk '{print $1}' | sort -rn &amp;gt; /tmp/access.txt&lt;BR /&gt;&lt;BR /&gt;This leads on to my next question: if I was looking for the most frequent entry in a file, but not necessarily an IP address, how could I find it if the file wasn't delimited by columns and the entry could be any alphanumeric value?&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;D.</description>
      <pubDate>Mon, 17 Jan 2011 10:01:48 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270129#M52874</guid>
      <dc:creator>Duffs</dc:creator>
      <dc:date>2011-01-17T10:01:48Z</dc:date>
    </item>
    <item>
      <title>Re: Scripting Query</title>
      <link>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270130#M52875</link>
      <description>Hello.&lt;BR /&gt;&lt;BR /&gt;Write a trivial awk or perl script using hash arrays.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Goran</description>
      <pubDate>Mon, 17 Jan 2011 10:39:55 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270130#M52875</guid>
      <dc:creator>Goran Koruga</dc:creator>
      <dc:date>2011-01-17T10:39:55Z</dc:date>
    </item>
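Goran's hash-array suggestion could be sketched in awk like this (a sketch, not from the thread; it counts the first field of each access_log line in an associative array and prints the most frequent one):

```shell
# Count occurrences of the first field (the client address in a
# common-format access_log) in an awk associative array, then
# report the address with the highest count.
awk '{ count[$1]++ }
     END {
       for (ip in count)
         if (count[ip] > max) { max = count[ip]; top = ip }
       print top, max
     }' access_log
```

A perl hash would work the same way; the awk version just avoids an extra interpreter.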
    <item>
      <title>Re: Scripting Query</title>
      <link>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270131#M52876</link>
      <description>&amp;gt;I am trying to find out the IP address that has got written most frequently&lt;BR /&gt;&lt;BR /&gt;Try:&lt;BR /&gt;awk '{print $1}' access_log | sort | uniq -c | sort -rn &amp;gt; /tmp/access.txt&lt;BR /&gt;&lt;BR /&gt;&amp;gt;if it wasn't delimited by columns and could be of any alphanumerical value?&lt;BR /&gt;&lt;BR /&gt;Are you trying to find the most frequently occurring "word" in a file?&lt;BR /&gt;You could use tr(1) to convert your separators to a newline then use the above sort/uniq/sort pipeline.  (Removing blank lines first.)</description>
      <pubDate>Tue, 18 Jan 2011 00:29:21 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270131#M52876</guid>
      <dc:creator>Dennis Handly</dc:creator>
      <dc:date>2011-01-18T00:29:21Z</dc:date>
    </item>
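Dennis's tr-plus-sort/uniq/sort approach for the "most frequent word" case might look like this (a sketch; file.txt is a placeholder name, and the -s flag squeezes repeated separators, which also takes care of the blank lines he mentions):

```shell
# Turn every run of whitespace into a single newline (one word
# per line, no blank lines), then count, rank, and show the top 5.
tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -rn | head -5
```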
    <item>
      <title>Re: Scripting Query</title>
      <link>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270132#M52877</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;You could use 'webalizer' to parse the access_log for you, as it does all the sorting on your behalf, then use curl to pull back the stats page.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;&lt;BR /&gt;Matt</description>
      <pubDate>Tue, 18 Jan 2011 07:30:14 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270132#M52877</guid>
      <dc:creator>Matt Palmer_2</dc:creator>
      <dc:date>2011-01-18T07:30:14Z</dc:date>
    </item>
    <item>
      <title>Re: Scripting Query</title>
      <link>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270133#M52878</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Dennis, yes, I am trying to find the most frequently used word in a file. Are there any alternatives to 'tr', or is this the only way of doing it? What would the command look like to obtain this?&lt;BR /&gt;&lt;BR /&gt;R,&lt;BR /&gt;D.</description>
      <pubDate>Wed, 19 Jan 2011 09:56:58 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270133#M52878</guid>
      <dc:creator>Duffs</dc:creator>
      <dc:date>2011-01-19T09:56:58Z</dc:date>
    </item>
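One alternative to tr, sketched here with awk (file.txt is again a placeholder name): awk already splits each line into whitespace-separated fields, so every field can be counted directly with no separate translation step.

```shell
# awk splits on runs of whitespace by default, so count every
# field on every line, then print the most frequent word.
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END {
       for (w in count)
         if (count[w] > max) { max = count[w]; top = w }
       print top, max
     }' file.txt
```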
    <item>
      <title>Re: Scripting Query</title>
      <link>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270134#M52879</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Thanks for the feedback! Spot on, Dennis, with the tr command for the non-delimited file as well.&lt;BR /&gt;&lt;BR /&gt;Using 'tr' I replaced the spaces with newlines and redirected the output to a new file, then used the sort/uniq/sort pipeline as suggested, and it produces the desired result.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;D.</description>
      <pubDate>Wed, 19 Jan 2011 16:43:18 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/scripting-query/m-p/5270134#M52879</guid>
      <dc:creator>Duffs</dc:creator>
      <dc:date>2011-01-19T16:43:18Z</dc:date>
    </item>
  </channel>
</rss>

