Geo Lookup

 
Allanm
Super Advisor

Geo Lookup


This thread is a follow-up to:

http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1373850

I found this nice little utility called geoiplookup, which maps IP addresses to countries and country codes.

After a fraudulent attack from country X on our website, we want a way to monitor user patterns: if a user consistently logs in from country A and suddenly shows up in country B, the system should alert us so we can watch that account for fraudulent activity.

For this I want a script that pulls each day's IP addresses and their associated account ids from the web logs and updates a master file with that information.

The script should also alert us (by mail) whenever it adds an entry whose account id is already in the master file but with a different country code.
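
One way the mail alert could be composed is sketched below with Python's standard library. The sender and recipient addresses, the `localhost` SMTP relay, and the function names are all assumptions made for illustration; none of them come from the original setup.

```python
from email.message import EmailMessage
import smtplib


def build_alert(account_id, old_country, new_country, ip,
                sender="geo-alert@example.com",      # hypothetical address
                recipient="security@example.com"):   # hypothetical address
    """Compose a mail for an account that shows up from a new country."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = (
        f"Geo alert: account {account_id} moved {old_country} -> {new_country}"
    )
    msg.set_content(
        f"Account {account_id} was previously seen from {old_country}.\n"
        f"It has now logged in from {new_country} (IP {ip}).\n"
    )
    return msg


def send_alert(msg, host="localhost"):
    """Hand the message to an SMTP relay; 'localhost' is an assumption."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from sending makes it possible to check the alert text without a mail server.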

Here is what our web logs look like:

- - [20/Jul/2009:00:09:40 -0700] "GET /t15/v33.109/670612718/lists=G HTTP/1.1" 200 2 1949 "https:///t15/v33.109/670612718/set" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB6; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; InfoPath.2)" 91 - - - 1 "-" ".115061239639 08475"

where the IP is the first field and 670612718 is the account id that I need to record in the master file.

Can someone help me script this? Getting the IP is easy, but I cannot extract the account id consistently given the nature of the logs.
I also need help with the alerting condition.
6 REPLIES 6
Allanm
Super Advisor

Re: Geo Lookup

The master file should have the following columns -

Col1 | Col2
Account ID | Country Code

The script gets a list of IPs (and account ids), runs geoiplookup to get the associated country code, and updates the file. It alerts us if the account id already exists in the file but with a different country code.
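
The update-or-alert rule described above might be sketched as follows. The in-memory layout (a dict mapping each account id to the set of country codes seen so far) and the name `record_login` are assumptions for illustration, not part of the original request.

```python
def record_login(master, account_id, country):
    """Update master (dict: account id -> set of country codes).

    Returns the sorted list of previously known countries when an existing
    account shows up from a country not seen before (the alert case),
    otherwise None.
    """
    known = master.get(account_id)
    if known is None:
        # First sighting of this account: record it, no alert.
        master[account_id] = {country}
        return None
    if country in known:
        # Same account, same country: nothing new to store.
        return None
    previous = sorted(known)
    known.add(country)  # remember the new country so it alerts only once
    return previous
```

Adding the new country after the alert is a design choice: the same account and country pair will not alert again on later days. If repeated alerts are preferred, the `known.add` line could be dropped.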
Allanm
Super Advisor

Re: Geo Lookup

The web logs are gzipped -

webN.access_log.20090929.gz

There are 8 web servers in total, and I want this to run on a central log server that holds all the web access logs from the previous day.
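
Reading the gzipped logs from all servers for one day could look roughly like this. The sketch assumes every file sits in a single directory and follows the `webN.access_log.YYYYMMDD.gz` naming shown above; the function name `iter_log_lines` is hypothetical.

```python
import glob
import gzip
import os


def iter_log_lines(log_dir, day):
    """Yield every line from web*.access_log.<day>.gz under log_dir."""
    pattern = os.path.join(log_dir, f"web*.access_log.{day}.gz")
    for path in sorted(glob.glob(pattern)):
        # "rt" decompresses and decodes on the fly; undecodable bytes are
        # replaced rather than aborting the whole run.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                yield line.rstrip("\n")
```

Because this is a generator, the logs are streamed one line at a time and never have to fit in memory.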


Allanm
Super Advisor

Re: Geo Lookup

I would also like to have the IP address in the master file, since there is a good chance that an IP gets reassigned to a different country, which would produce a false positive.

So the master file should have -

IP | Account ID | Country Code
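
A text master file in this three-column layout could be read and appended to as sketched below. The sketch assumes one record per line, fields separated by `|`, and no header row in the file; the function names are hypothetical.

```python
def load_master(path):
    """Read 'IP | Account ID | Country Code' rows into a list of tuples."""
    rows = []
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                parts = [p.strip() for p in line.split("|")]
                # Skip blank or malformed lines instead of failing.
                if len(parts) == 3 and all(parts):
                    rows.append(tuple(parts))
    except FileNotFoundError:
        pass  # no master file yet: treat it as empty
    return rows


def append_master(path, ip, account_id, country):
    """Append one record in the same pipe-delimited layout."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{ip} | {account_id} | {country}\n")
```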




Allanm
Super Advisor

Re: Geo Lookup

Please help, folks!

Thanks,
Allan
Dennis Handly
Acclaimed Contributor

Re: Geo Lookup

>to get the account id which I am not able to get consistently given the nature of the logs.

If they aren't consistent, how are we going to figure this out? What's the pattern?

>Would like to have the IP address in the master file ...
>So the master file should have:
>IP | Account ID | Country Code

You need to make sure of your requirements. Then you can decompose your scripting tasks.
Are you assuming you will start out with an initially empty "master" file?

>The web logs are gzipped

So, you need one part to read the logs and extract the IP and account ID.
Another part would be to look up in the master file.
Another part to do geoiplookup.
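
The geoiplookup part might be wrapped like this. The sketch assumes the `geoiplookup` command is on the PATH and prints a line of the form `GeoIP Country Edition: US, United States`; that output format should be checked against the installed version before relying on it. The function names are hypothetical.

```python
import re
import subprocess

# Assumed output format: "GeoIP Country Edition: US, United States"
_COUNTRY_RE = re.compile(
    r"^GeoIP Country Edition: ([A-Z0-9]{2}), ", re.MULTILINE
)


def parse_country(output):
    """Return the two-character country code, or None if not found."""
    m = _COUNTRY_RE.search(output)
    return m.group(1) if m else None


def lookup_country(ip):
    """Run geoiplookup for one IP and return its country code or None."""
    result = subprocess.run(
        ["geoiplookup", ip], capture_output=True, text=True, check=False
    )
    return parse_country(result.stdout)
```

Since many log lines share an IP, caching the results of `lookup_country` in a dict would avoid spawning one process per line.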

How big will this be? If you have tens of thousands of records, you need some kind of quick lookup, such as a database.
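
If a database turns out to be needed, a small SQLite file is one option that needs no server. The following is only a sketch: the table name, column names, and helper functions are assumptions, and SQLite itself is a suggestion rather than anything prescribed in this thread.

```python
import sqlite3


def open_master(path=":memory:"):
    """Open (or create) the master table; ':memory:' is for testing."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS master ("
        " ip TEXT, account_id TEXT, country TEXT,"
        " UNIQUE (ip, account_id, country))"
    )
    # The index makes per-account lookups fast even with many rows.
    db.execute("CREATE INDEX IF NOT EXISTS idx_account ON master (account_id)")
    return db


def countries_for(db, account_id):
    """Return the sorted country codes already stored for an account."""
    rows = db.execute(
        "SELECT DISTINCT country FROM master"
        " WHERE account_id = ? ORDER BY country",
        (account_id,),
    )
    return [r[0] for r in rows]


def add_entry(db, ip, account_id, country):
    """Insert a row; exact duplicates are silently skipped."""
    with db:
        db.execute(
            "INSERT OR IGNORE INTO master VALUES (?, ?, ?)",
            (ip, account_id, country),
        )
```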

>And there are 8 web servers ... from the previous day.

I assume this isn't real time and you can do each log one at a time.
Allanm
Super Advisor

Re: Geo Lookup

Thanks for replying, Dennis.

My answers are inline below.

>to get the account id which I am not able to get consistently given the nature of the logs.
>If they aren't consistent, how are we going to figure this out? What's the pattern?

I meant that some entries don't have an account id in them, and I need to pick out only the lines that do.
The lines that have an id follow a certain pattern, as shown and explained below:

- - [20/Jul/2009:00:09:40 -0700] "GET /t15/v33.109/670612718/lists=G HTTP/1.1" 200 2 1949 "https:///t15/v33.109/670612718/set" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB6; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; InfoPath.2)" 91 - - - 1 "-" ".115061239639 08475"

The account id will always be the third slash-separated entry in the request path: t15 is the server number, v33.109 is the code version, and 670612718 is the account id.

/t15/v33.109/670612718
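
Based on that pattern, extracting the IP and account id from one log line could be sketched as follows. It assumes the IP is the first whitespace-separated field, that the request is quoted as `"METHOD /path ..."`, and that the account id is an all-digit third path segment; the regex and the name `parse_line` are illustrative, and lines that do not fit are skipped by returning None.

```python
import re

# ip: first field; account: all-digit third segment of the request path.
_LINE_RE = re.compile(
    r'^(?P<ip>\S+)\s.*?"[A-Z]+ /[^/\s]+/[^/\s]+/(?P<account>\d+)(?=[/?\s])'
)


def parse_line(line):
    """Return (ip, account_id) or None if the line has no account id."""
    m = _LINE_RE.match(line)
    if m is None:
        return None
    return m.group("ip"), m.group("account")
```

For example, with a made-up documentation address in place of the IP:

```python
line = ('192.0.2.10 - - [20/Jul/2009:00:09:40 -0700] '
        '"GET /t15/v33.109/670612718/lists=G HTTP/1.1" 200 2 1949')
print(parse_line(line))
```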


>Would like to have the IP address in the master file ...
>So the master file should have:
>IP | Account ID | Country Code

>You need to make sure of your requirements. Then you can decompose your scripting tasks.
>Are you assuming you will start out with an initially empty "master" file?

Yes.

>The web logs are gzipped

>So, you need one part to read the logs and extract the IP and account ID.
>Another part to do geoiplookup.
>Another part would be to look up in the master file.

Yes. It should not add a row if an identical entry already exists, and it should alert if the account id exists but with a different country code. So the geoiplookup has to happen before this step.


>How big will this be? If you have 10s of thousands of records, you need some type of quick lookup, a database.

We want to limit this to a text file. If it's easier for you to use a DB, feel free to handle it that way.

>And there are 8 web servers ... from the previous day.

>I assume this isn't real time and you can do each log one at a time.

Yes, this will run only once that day's logs have been downloaded to the central log server.