Operating System - HP-UX

Re: Pattern matching issue...

 
SOLVED
jmckinzie
Super Advisor

Pattern matching issue...

Ok,

Basically, I have a file that is 2.2 GB that is a log file and I have been requested to parse it...

Ok, no problem...just use grep....

ok, so I have lines in this file that look like this:

Thu May 9 20:29:02 2002: other stuff.....
Thu May 9 20:29:02 2005: other stuff.....
Thu May 9 20:29:02 2003: other stuff.....
Thu May 9 20:29:02 2006: other stuff.....

ok, I want to parse out everything that has 2002 at this particular point in the line without reading any further....

Basically, if this says 2002 here, I want to get rid of it...if not, keep it...the problem I am running into is that if 2002 is ANYWHERE in the line, it shows up...

How do I grep for just this instance of 2002?

-TIA
7 REPLIES
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Pattern matching issue...

You want to use the enhanced version of grep -- which nowadays is simply the same grep but with the "-E" option. The difference is that it accepts extended regular expressions, so interval expressions such as {2} can be written without backslashes. It appears that we can take advantage of the HH:MM:SS YYYY sequence to locate the string, although we could also look for it at explicit positions. I hesitate to suggest that because May 9 and May 10 may result in different offsets for the year position:

grep -E -e '.+ [0-9]{2}:[0-9]{2}:[0-9]{2} 2002'

That should do it for you.
Man grep for details.
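
Since the stated goal is to get rid of the 2002 entries, the same idea can be inverted with -v and anchored to the start of the line. This is only a minimal sketch: it assumes every line begins with weekday, month, day, HH:MM:SS and a year followed by a colon, and the sample file and its contents are made up for illustration.

```shell
# Build a small hypothetical sample; the real log would be used instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Thu May 9 20:29:02 2002: other stuff
Thu May 9 20:29:02 2005: job 2002 finished
Fri May 10 20:29:02 2003: other stuff
EOF

# -v inverts the match: print only lines whose timestamp year is NOT 2002.
# The leading ^ ties the pattern to the start of the line, so a "2002"
# later in the message text does not count.
kept=$(grep -E -v '^[A-Za-z]+ +[A-Za-z]+ +[0-9]+ +[0-9]{2}:[0-9]{2}:[0-9]{2} +2002:' "$sample")
printf '%s\n' "$kept"
rm -f "$sample"
```

Dropping the -v would select the 2002 lines instead of discarding them.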
If it ain't broke, I can fix that.
James R. Ferguson
Acclaimed Contributor

Re: Pattern matching issue...

Hi Jody:

You need 'awk' or Perl.

# awk '$5~/2002/ {print}' file

# perl -nae 'print if $F[4]=~/2002/' file

Notice that 'awk' counts the whitespace delimited fields from one (1) whereas Perl numbers from zero (0).

Regards!

...JRF...
Bill Hassell
Honored Contributor

Re: Pattern matching issue...

grep has no concept of a field. That's what each element of the line is called. It's also why grep is such a lousy tool to find processes by name or userID. So is every line in the file formatted the same way? That is, 4 date elements then the year? This is a trick question since you may have to read the entire file to verify this condition. If so, you'll have to use a script something like this:

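#!/usr/bin/sh
MYFILE=/var/adm/some_logfile
# Read the file line by line, splitting each line into fields on whitespace
while read DAY MON NDAY TIME YEAR REST
do
    # Print only the lines whose fifth field is exactly "2002:"
    [ "$YEAR" = "2002:" ] && echo $DAY $MON $NDAY $TIME $YEAR $REST
done < $MYFILE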

Notice that the read command is parsing each element based on spaces that separate the components. If there are multiple spaces, they won't be retained in the output. Note also the test must include the : because 2002: is not equal to 2002.
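
If the original spacing matters, one possible variation is to read each line whole and split only a copy of it. This is a sketch, not a drop-in replacement for the script above; it assumes a POSIX shell and the same five leading fields, and the sample data is hypothetical.

```shell
# Hypothetical two-line sample with irregular spacing.
sample=$(mktemp)
printf '%s\n' \
  'Thu May  9 20:29:02 2002: two   spaces kept' \
  'Thu May  9 20:29:02 2005: other stuff' > "$sample"

set -f    # disable globbing so a * in a log line is not expanded
matched=$(
  while IFS= read -r line; do
    set -- $line                  # split a copy on whitespace into $1, $2, ...
    if [ "$5" = "2002:" ]; then
      printf '%s\n' "$line"       # print the untouched original line
    fi
  done < "$sample"
)
set +f
printf '%s\n' "$matched"
rm -f "$sample"
```

Because the line itself is printed rather than the re-joined fields, runs of spaces survive in the output.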


Bill Hassell, sysadmin
James R. Ferguson
Acclaimed Contributor

Re: Pattern matching issue...

Hi (again) Jody:

Oh, you said "get rid of it...":

# awk '$5!~/2002/ {print}' file

# perl -nae 'print unless $F[4]=~/2002/' file
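
Note that ~ and !~ test for a substring anywhere in the field. If the year is always followed by a colon, as in the sample lines, an exact string comparison is a slightly stricter alternative. A small sketch with made-up sample data:

```shell
# Hypothetical sample; the second line mentions 2002 only in its message text.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Thu May 9 20:29:02 2002: other stuff
Thu May 9 20:29:02 2005: port 2002 opened
Thu May 9 20:29:02 2006: other stuff
EOF

# Keep every line whose fifth field is not exactly the string "2002:".
kept=$(awk '$5 != "2002:"' "$sample")
printf '%s\n' "$kept"
rm -f "$sample"
```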

Regards!

...JRF...
jmckinzie
Super Advisor

Re: Pattern matching issue...

James and Friends...

Is there a way to do a fast search on this exact file in perl that will look for the above...ie any entry that was done in 2005 and move it to another file called filename.2005?

I ask because I am using grep to do this now and it seems to be taking forever...but, this is a 2GB logfile.

Any ideas on how to speed this up?

Thanks again ..to everyone...
James R. Ferguson
Acclaimed Contributor

Re: Pattern matching issue...

Hi (again) Jody:

You asked, "Is there a way to do a fast search on this exact file in perl that will look for the above...ie any entry that was done in 2005 and move it to another file called filename.2005?"

I presume from this query that you want to extract all records with 2005 in the year field (the fifth whitespace-delimited field, index 4 when counting from zero) and place them into a file of their own, leaving a modified input file without them --- quickly in one pass. If that is correct, this small Perl script will do that:

# cat ./extract
#!/usr/bin/perl -i.old
my $name = shift or die "File expected\n";
open( FH, ">", "$name.2005" ) or die "Can't open: $!\n";
unshift @ARGV, $name;
while (<>) {
    my @F = split;
    if ( $F[4] =~ /2005/ ) {
        print FH;
        next;
    }
    else {
        print;
    }
}
close FH;

...run as:

# ./extract filename

...The original 'filename' will be preserved as 'filename.old'. When finished, 'filename' will be devoid of all records whose fourth (zero-relative), whitespace-delimited field matches "2005". These matching records will have been placed in a new file called 'filename.2005', instead.
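
If more than one year ever needs to be separated, a single pass can also route each line by its year field. The following is a rough awk sketch rather than a replacement for the script above: it leaves the input file untouched, it assumes field 5 looks like YYYY: on every line, and the directory and file names are hypothetical.

```shell
# Hypothetical working copy of the log.
workdir=$(mktemp -d)
log="$workdir/filename"
cat > "$log" <<'EOF'
Thu May 9 20:29:02 2002: other stuff
Thu May 9 20:29:02 2005: other stuff
Thu May 9 20:29:02 2005: more stuff
EOF

# One pass: strip the trailing colon from a copy of field 5 and write the
# line to <input name>.<year>, e.g. filename.2002 and filename.2005.
awk '{ year = $5; sub(/:$/, "", year); print > (FILENAME "." year) }' "$log"

count_2005=$(wc -l < "$log.2005" | tr -d ' ')
echo "lines for 2005: $count_2005"
rm -rf "$workdir"
```

The parentheses around the output file name keep the concatenation unambiguous across awk implementations.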

Regards!

...JRF...
jmckinzie
Super Advisor

Re: Pattern matching issue...

This worked perfectly...thanks..