Operating System - HP-UX

Number of unique users between a timeframe

 
SOLVED
Allanm
Super Advisor

Hi,

I need to find out the number of unique users within a certain timeframe from the Apache web logs. Here is a snippet of the logs:

76.14.65.x - - [22/Jan/2010:00:05:23 -0800] "GET /f22/v31.121/images/b3_save_over.gif HTTP/1.1" 200 622 "https://www.com/c22/v31.200.1264147317953/474936101/bill/new?_refresh_=t"

70.234.142.x - - [22/Jan/2010:00:05:23 -0800] "GET /f23/v31.121/images/upgrade_menu_bg.gif HTTP/1.1" 200 1099 "https://www.com/c23/v31.200.1264147488555/065906351/check/new?_qbo_refresh_=t" "

76.173.1.x - - [22/Jan/2010:00:05:23 -0800] "GET /f15/v31.200/810464115/salestxn/deliverconfirm?dtype=3&ttype=35&addr=markl714%40yahoo.com&clean=true&mas_process_checkbox=false&mas_issaved=false HTTP/1.1"

70.234.142.x - - [22/Jan/2010:00:05:23 -0800] "GET /f23/v31.200.1264147518320/065906351/find/recent?type=3 HTTP/1.1" 200

Basically, I want to get a count of unique IP addresses in a given timeframe. I can get the list of unique IPs from the logs, but I am not able to restrict it to a given timeframe.

Can someone suggest a solution that would work for this?

Thanks,
Allan.
5 REPLIES
James R. Ferguson
Acclaimed Contributor
Solution

Re: Number of unique users between a timeframe

Hi Allan:

> Basically I want get a count of unique IP addresses in a given timeframe. I can get the list of unique IPs from the logs but not able to get that between a given timeframe.

Use a regular expression or a range matching a beginning and ending pattern. This can be done with 'sed', 'awk' or 'perl'.

For example:

# awk 'NF==0 {next};/22\/Jan/ {print $1}' file|sort|uniq -c

# awk 'NF==0 {next};/21\/Jan/,/23\/Jan/ {print $1}' file|sort|uniq -c

In both examples, blank lines are excluded, too.
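Note that `uniq -c` prints a hit count per address. When only the number of distinct addresses in the range is wanted, the list can be de-duplicated and counted instead. A minimal sketch, assuming a POSIX shell and awk; the sample log lines, the temporary file name and the helper `range_ips` are made up for illustration:

```shell
#!/bin/sh
# Made-up sample log in the same layout as the excerpt in the question.
LOG="${TMPDIR:-/tmp}/weblog_sample.$$"
trap 'rm -f "$LOG"' 0
cat > "$LOG" <<'EOF'
10.0.0.9 - - [20/Jan/2010:23:59:59 -0800] "GET / HTTP/1.1" 200 10
10.0.0.1 - - [21/Jan/2010:00:05:23 -0800] "GET /a HTTP/1.1" 200 10
10.0.0.2 - - [21/Jan/2010:10:00:00 -0800] "GET /b HTTP/1.1" 200 10

10.0.0.1 - - [22/Jan/2010:08:00:00 -0800] "GET /c HTTP/1.1" 200 10
10.0.0.3 - - [23/Jan/2010:00:00:01 -0800] "GET /d HTTP/1.1" 200 10
10.0.0.4 - - [23/Jan/2010:09:00:00 -0800] "GET /e HTTP/1.1" 200 10
EOF

# Hypothetical helper: client IPs (field 1) from the first "21/Jan" line
# through the first "23/Jan" line, inclusive, skipping blank lines.
range_ips () {
    awk 'NF==0 {next}; /21\/Jan/,/23\/Jan/ {print $1}' "$1"
}

# Hits per IP, as in the one-liners above.
range_ips "$LOG" | sort | uniq -c

# Number of distinct IPs: de-duplicate, then count the lines.
# tr removes the leading blanks that some wc implementations print.
range_ips "$LOG" | sort -u | wc -l | tr -d ' '    # prints 3 for this sample
```

The 23/Jan 09:00 record (10.0.0.4) is not counted because the range closes at the first 23/Jan line.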

Regards!

...JRF...

VK2COT
Honored Contributor

Re: Number of unique users between a timeframe

Hello,

James provided you with simple one-liners. Efficient and easy.

However, his code, like mine below, has one problem.

Here is the explanation:

# awk 'NF==0 {next};/21\/Jan/,/23\/Jan/ {print $1}' file|sort|uniq -c

This extracts every line from the first one that contains the string "21/Jan" up to and including THE FIRST OCCURRENCE of "23/Jan". In other words, if your log file contains the end date "23/Jan", you will get an extra record in the final count. So just be mindful of what you search for. If you need more powerful searching, a language like Perl would be a better option.
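The inclusive end described above can be seen with a few made-up records. This sketch (the sample data and the `sample` helper are hypothetical) also shows a related property of awk range patterns: if the end pattern never matches, the range stays open to the end of the input.

```shell
#!/bin/sh
# Made-up records: one per day from 20 to 24 Jan, plus a second 23/Jan entry.
sample () {
    printf '%s\n' \
        '10.0.0.20 - - [20/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1' \
        '10.0.0.21 - - [21/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1' \
        '10.0.0.22 - - [22/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1' \
        '10.0.0.23 - - [23/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1' \
        '10.0.0.33 - - [23/Jan/2010:11:00:00 -0800] "GET / HTTP/1.1" 200 1' \
        '10.0.0.24 - - [24/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1'
}

# The range opens at the first 21/Jan line and closes AFTER printing the
# first 23/Jan line: prints 10.0.0.21, 10.0.0.22 and 10.0.0.23.
sample | awk '/21\/Jan/,/23\/Jan/ {print $1}'

# No line matches 30/Jan, so the range runs to the end of the input
# and all five records from 21/Jan onward are printed.
sample | awk '/21\/Jan/,/30\/Jan/ {print $1}'
```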

Anyway, just for fun, I wrote a longer shell script that does some basic date checks and has a few other options.

When it is run, it looks like this:

./extract-timeframe.sh -s 22/Jan/2010 -e 25/Jan/2010
COUNT IP-ADDRESS
4 70.234.142.x
1 76.14.65.x
1 76.173.1.x

#!/bin/sh
#
# Example written by Dusan U. Baljevic
#
# Set path to reasonable environment
#
PATH=/usr/sbin:/sbin:/usr/bin:/bin; export PATH
MYLOG="weblog"

umask 022

helpfile () {
echo "USAGE: $0 [-h] [-s startdate] [-e enddate]
-h Help menu
-e enddate End date
-s startdate Start date

EXAMPLE: $0 -s 22/Jan/2010 -e 25/Jan/2010
"
}

mycalc () {
DAY="`echo $1 | awk -F/ '{print $1}'`"
MONTH="`echo $1 | awk -F/ '{print $2}'`"
YEAR="`echo $1 | awk -F/ '{print $3}'`"
# ${#VAR} is the POSIX shell string length ("expr length" is not portable)
DLEN=${#DAY}
MLEN=${#MONTH}
YLEN=${#YEAR}

if [ "$DLEN" -gt 2 ]
then
echo "ERROR: Variable $DAY in string should have two characters maximum"
helpfile
exit 1
fi

if [ "$MLEN" -ne 3 ]
then
echo "ERROR: Variable $MONTH in string should have three characters only"
helpfile
exit 1
fi

if [ "$YLEN" -ne 4 ]
then
echo "ERROR: Variable $YEAR in string should have four characters only"
helpfile
exit 1
fi

if [ ! "`echo $DAY | grep \"^[0-9]*$\"`" ]
then
echo "ERROR: Variable $DAY for string day not numerical"
helpfile
exit 1
fi

case "$MONTH" in
Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ;;
*) echo "ERROR: Variable $MONTH not month name"
helpfile
exit 1
;;
esac

if [ ! "`echo $YEAR | grep \"^[0-9]*$\"`" ]
then
echo "ERROR: Variable $YEAR for string year not numerical"
helpfile
exit 1
fi
}

while getopts he:s: option
do
case $option in
e) ENDDATE="`echo ${OPTARG} | sed -e 's#/#\\\/#g'`"
mycalc $OPTARG
;;
s) STARTDATE="`echo ${OPTARG} | sed -e 's#/#\\\/#g'`"
mycalc $OPTARG
if [ ! "`grep $OPTARG $MYLOG`" ]
then
echo "ERROR: Start date $OPTARG not listed in $MYLOG"
helpfile
exit 1
fi
;;
h) helpfile
exit 0
;;
*) helpfile
exit 1
;;
esac
done


if [ "$STARTDATE" -a "$ENDDATE" ]
then
echo " COUNT IP-ADDRESS"
awk '/'$STARTDATE'/,/'$ENDDATE'/ { if (NF > 0) { print $1; }}' $MYLOG | sort | uniq -c
else
echo "ERROR: You did not provide start and end dates"
helpfile
exit 1
fi

exit 0
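The `sed` calls in the script exist only to turn `/` into `\/` so that the dates can be pasted into an awk `/regex/` literal. An alternative, sketched below as a hypothetical `extract_range` helper, passes the dates to awk with `-v` and tests them with `index()`, which does plain substring matching, so nothing needs escaping. It keeps the script's inclusive-end behaviour; the sample data is made up.

```shell
#!/bin/sh
# Hypothetical helper: extract_range START END FILE
# Prints field 1 of every non-blank line from the first line containing START
# through the first line containing END (inclusive), like the script above.
# index() is a literal substring test, so "/" in the dates needs no escaping.
extract_range () {
    awk -v s="$1" -v e="$2" '
        NF == 0 { next }
        index($0, s), index($0, e) { print $1 }
    ' "$3"
}

# Made-up sample log; the blank line is skipped.
LOG="${TMPDIR:-/tmp}/weblog_sample.$$"
trap 'rm -f "$LOG"' 0
printf '%s\n' \
    '10.0.0.1 - - [21/Jan/2010:09:00:00 -0800] "GET / HTTP/1.1" 200 1' \
    '10.0.0.2 - - [22/Jan/2010:09:00:00 -0800] "GET / HTTP/1.1" 200 1' \
    '' \
    '10.0.0.2 - - [23/Jan/2010:09:00:00 -0800] "GET / HTTP/1.1" 200 1' \
    '10.0.0.3 - - [25/Jan/2010:09:00:00 -0800] "GET / HTTP/1.1" 200 1' \
    '10.0.0.4 - - [26/Jan/2010:09:00:00 -0800] "GET / HTTP/1.1" 200 1' > "$LOG"

echo " COUNT IP-ADDRESS"
extract_range 22/Jan/2010 25/Jan/2010 "$LOG" | sort | uniq -c
```

Passing values with `-v` also avoids splicing unquoted shell variables into the awk program text.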


Cheers,

VK2COT
VK2COT - Dusan Baljevic
James R. Ferguson
Acclaimed Contributor

Re: Number of unique users between a timeframe

Hi (again) Allan:

As VK2COT correctly points out, the solution I offered extracts a range of lines from the starting through the ending pattern.

If that's not acceptable, a small tweak can make the extraction stop and skip the matched ending pattern.

For example, if you only wanted records from "21/Jan" through "22/Jan" you could rewrite the range extraction like:

# awk 'NF==0 {next};/21\/Jan/,/23\/Jan/ {x=1};{if (/23\/Jan/) {x=0};if (x) {print $1}}' file|sort|uniq -c
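If the log is in chronological order (an assumption), the same exclusive end can also be had by stopping at the first end-date record, which additionally avoids reading the rest of a large file. A sketch with made-up data; `ips_before_end` is a hypothetical name:

```shell
#!/bin/sh
# Made-up sample records in chronological order (one blank line included).
sample () {
    printf '%s\n' \
        '10.0.0.20 - - [20/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1' \
        '10.0.0.21 - - [21/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1' \
        '' \
        '10.0.0.22 - - [22/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1' \
        '10.0.0.23 - - [23/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1' \
        '10.0.0.24 - - [24/Jan/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 1'
}

# Hypothetical helper: print field 1 from the first 21/Jan record onward,
# but stop reading (without printing) at the first 23/Jan record.
ips_before_end () {
    awk 'NF==0 {next}; /23\/Jan/ {exit}; /21\/Jan/ {x=1}; x {print $1}'
}

sample | ips_before_end | sort | uniq -c
```

If the file is not strictly time-ordered, the flag-based version above is the safer choice, since it does not rely on the first end-date record marking the end of the wanted data.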

Regards!

...JRF...
Allanm
Super Advisor

Re: Number of unique users between a timeframe

Hey guys, thanks for the wonderful replies. Most of the situations I face are based on timeframes within a day.

For example, on 21 Jan from 11:21am to 12:20pm.

Can this script get me the number of users logged in during such a timeframe? (Not that one in particular; it is just an example.)

Thanks,
Allan.
James R. Ferguson
Acclaimed Contributor

Re: Number of unique users between a timeframe

Hi (again) Allan:

> Can this script get me number of users logged in between these timeframes (not in particular) but as an example.

Yes. And in case we need to build more onto this, let's use Perl:

# perl -nale 'next if m{^\s*$};print $F[0] if m{21/Jan/2010:11:21}..m{21/Jan/2010:12:21}' file|sort|uniq -c

Once again, blank lines are skipped.
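A range (`..` in Perl, `,` in awk) only opens when some line actually matches the start pattern, so if no request happened to be logged during minute 11:21 the range never starts, and if none was logged at 12:21 it does not stop where expected. A different technique, sketched below in awk, turns each timestamp into a sortable YYYYMMDDhhmmss string and compares it against the bounds, so the bounds need not appear in the log and line order does not matter. It assumes the default Apache timestamp layout `[DD/Mon/YYYY:hh:mm:ss zone]` in field 4 and ignores the time-zone offset; the helper `unique_users` and the sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: unique_users FROM TO FILE
# FROM/TO use the log's own layout, DD/Mon/YYYY:hh:mm:ss (time zone ignored).
# Prints how many distinct client IPs (field 1) have FROM <= time <= TO.
unique_users () {
    awk -v from="$1" -v to="$2" '
        # Turn "21/Jan/2010:11:21:00" (or "[21/Jan/...") into "20100121112100".
        function key(t,    p, m) {
            sub(/^\[/, "", t)                # drop the leading [ of field 4
            split(t, p, "[/:]")              # p[1]=DD p[2]=Mon p[3]=YYYY p[4..6]=hh mm ss
            m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", p[2]) + 2) / 3
            return sprintf("%04d%02d%02d%02d%02d%02d", p[3], m, p[1], p[4], p[5], p[6])
        }
        BEGIN { lo = key(from); hi = key(to) }
        NF == 0 { next }                     # skip blank lines
        {
            k = key($4)                      # $4 looks like [DD/Mon/YYYY:hh:mm:ss
            # Keys are fixed width, so string comparison orders them by time.
            if (k >= lo && k <= hi && !seen[$1]++) n++
        }
        END { print n + 0 }
    ' "$3"
}

# Made-up sample log; note that nothing was logged during minute 11:21.
LOG="${TMPDIR:-/tmp}/weblog_sample.$$"
trap 'rm -f "$LOG"' 0
printf '%s\n' \
    '10.0.0.1 - - [21/Jan/2010:11:20:59 -0800] "GET / HTTP/1.1" 200 1' \
    '10.0.0.2 - - [21/Jan/2010:11:22:10 -0800] "GET / HTTP/1.1" 200 1' \
    '10.0.0.3 - - [21/Jan/2010:11:45:00 -0800] "GET / HTTP/1.1" 200 1' \
    '' \
    '10.0.0.2 - - [21/Jan/2010:12:19:30 -0800] "GET / HTTP/1.1" 200 1' \
    '10.0.0.4 - - [21/Jan/2010:12:20:01 -0800] "GET / HTTP/1.1" 200 1' \
    '10.0.0.5 - - [22/Jan/2010:00:05:23 -0800] "GET / HTTP/1.1" 200 1' > "$LOG"

# 11:21am to 12:20pm on 21 Jan, as in the question: prints 3.
unique_users 21/Jan/2010:11:21:00 21/Jan/2010:12:20:59 "$LOG"
```

Because the bounds are compared rather than matched, the same call also works across midnight or across months, at the cost of always reading the whole file.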

Regards!

...JRF...