Re: Pattern matching issue...

jmckinzie · ‎09-03-2006

Ok,

Basically, I have a 2.2 GB log file that I have been asked to parse...

Ok, no problem...just use grep....

ok, so I have lines in this file that look like this:

Thu May 9 20:29:02 2002: other stuff.....
Thu May 9 20:29:02 2005: other stuff.....
Thu May 9 20:29:02 2003: other stuff.....
Thu May 9 20:29:02 2006: other stuff.....

Ok, I want to pick out every line that has 2002 at this particular point in the line, without reading any further....

Basically, if the line says 2002 here, I want to get rid of it...if not, keep it...the problem I am running into is that if 2002 is ANYWHERE in the line, it shows up...

How do I grep for just this instance of 2002?

-TIA

A. Clay Stephenson · ‎09-03-2006

You want to use the enhanced version of grep, which nowadays is simply the same grep but with the "-E" option. The difference is that it understands extended regular expressions, including interval counts such as {2}. It appears that we can take advantage of the HH:MM:SS YYYY sequence to locate the string, although we could look for it at explicit positions. I hesitate to suggest that because May 9 and May 10 would put the year at different offsets:

grep -E -e '.+ [0-9]{2}:[0-9]{2}:[0-9]{2} 2002' file

That should do it for you.
Man grep for details.
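
A minimal sketch of the same idea with the pattern anchored to the start of the line and the trailing colon included, so that a 2002 later in the message cannot match. The sample data and variable names are made up for illustration, and the pattern assumes every entry starts with weekday, month, day and time as in the lines quoted above. Adding -v inverts the match, which drops the 2002 entries instead of selecting them.

```shell
#!/bin/sh
# Hypothetical sample data; the real log file would be used instead.
log=$(mktemp)
cat > "$log" <<'EOF'
Thu May 9 20:29:02 2002: other stuff
Thu May 9 20:29:02 2005: job 2002 finished
Fri May 10 08:01:15 2003: other stuff
EOF

# Weekday, month, day of month, HH:MM:SS, then the year and its colon.
pat='^[A-Z][a-z]+ +[A-Z][a-z]+ +[0-9]{1,2} +[0-9]{2}:[0-9]{2}:[0-9]{2} 2002:'

only_2002=$(grep -E -e "$pat" "$log")        # lines stamped 2002
without_2002=$(grep -E -v -e "$pat" "$log")  # everything else

printf '%s\n' "$without_2002"
rm -f "$log"
```

The second sample line mentions 2002 in its message text but is stamped 2005, so it stays in the "everything else" set.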

If it ain't broke, I can fix that.

James R. Ferguson · ‎09-03-2006

Hi Jody:

You need 'awk' or Perl.

# awk '$5~/2002/ {print}' file

# perl -nae 'print if $F[4]=~/2002/' file

Notice that 'awk' counts the whitespace delimited fields from one (1) whereas Perl numbers from zero (0).
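
The field test can also be written as an exact string comparison rather than a regular expression match. This is only a sketch: it assumes the year field always carries the trailing colon shown in the sample lines, and the sample data below is invented for illustration.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the real log.
log=$(mktemp)
cat > "$log" <<'EOF'
Thu May 9 20:29:02 2002: other stuff
Thu May 9 20:29:02 2005: job 2002 finished
Thu May 9 20:29:02 2006: other stuff
EOF

# $5 is the fifth whitespace-delimited field, e.g. "2002:".
# A pattern with no action prints the matching line.
matched=$(awk '$5 == "2002:"' "$log")

# The second line mentions 2002 later on, but its $5 is "2005:", so it is skipped.
printf '%s\n' "$matched"
rm -f "$log"
```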

Regards!

...JRF...

Bill Hassell · ‎09-03-2006

grep has no concept of a field. That's what each element of the line is called. It's also why grep is such a lousy tool to find processes by name or userID. So is every line in the file formatted the same way? That is, 4 date elements then the year? This is a trick question since you may have to read the entire file to verify this condition. If so, you'll have to use a script something like this:

#!/usr/bin/sh
MYFILE=/var/adm/some_logfile
# Split each line on whitespace; REST receives everything after the year.
while read -r DAY MON NDAY TIME YEAR REST
do
[ "$YEAR" = "2002:" ] && echo "$DAY $MON $NDAY $TIME $YEAR $REST"
done < "$MYFILE"

Notice that the read command splits each line into fields on the spaces that separate the components. If there are multiple spaces between the date fields, they won't be retained in the output. Note also that the test must include the : because 2002: is not equal to 2002.
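
Whether every line really has the year in the fifth position can be checked in one pass before trusting a field-based filter. The following is a rough sketch with made-up sample data; it assumes a well-formed entry has four digits and a colon as its fifth field.

```shell
#!/bin/sh
# Hypothetical sample data, including one line that breaks the layout.
log=$(mktemp)
cat > "$log" <<'EOF'
Thu May 9 20:29:02 2002: other stuff
continuation line without a timestamp
Thu May 9 20:29:02 2005: other stuff
EOF

# Count lines whose fifth field is not four digits followed by a colon.
bad=$(awk '$5 !~ /^[0-9][0-9][0-9][0-9]:$/ { n++ } END { print n + 0 }' "$log")

echo "lines not matching the expected layout: $bad"
rm -f "$log"
```

A count of zero suggests the fixed-field approach is safe for that file; anything else means those lines need a look first.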

Bill Hassell, sysadmin

James R. Ferguson · ‎09-03-2006

Hi (again) Jody:

Oh, you said "get rid of it...":

# awk '$5!~/2002/ {print}' file

# perl -nae 'print unless $F[4]=~/2002/' file

Regards!

...JRF...

jmckinzie · ‎09-03-2006

James and Friends...

Is there a way to do a fast search on this exact file in perl that will look for the above...ie any entry that was done in 2005 and move it to another file called filename.2005?

I ask because I am using grep to do this now and it seems to be taking forever...but this is a 2GB logfile.

Any ideas on how to speed this up?

Thanks again ..to everyone...

James R. Ferguson · ‎09-04-2006

Hi (again) Jody:

You asked, "Is there a way to do a fast search on this exact file in perl that will look for the above...ie any entry that was done in 2005 and move it to another file called filename.2005?"

I presume from this query that you want to extract all records with 2005 in the fifth whitespace-delimited field ($F[4], since Perl counts from zero) and place them into a file of their own, leaving a modified input file without them, quickly and in one pass. If that is correct, this small Perl script will do that:

# cat ./extract
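#!/usr/bin/perl -i.old
# -i.old edits the input file in place and keeps the original as <file>.old
my $name = shift or die "File expected\n";
open( FH, ">", "$name.2005" ) or die "Can't open: $!\n";
unshift @ARGV, $name;
while (<>) {
    my @F = split;
    if ( $F[4] =~ /2005/ ) {
        print FH;    # matching record goes to <file>.2005
        next;
    }
    else {
        print;       # everything else is written back to <file>
    }
}
close FH;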

...run as:

# ./extract filename

...The original 'filename' will be preserved as 'filename.old'. When finished, 'filename' will be devoid of all records whose fourth (zero-relative), whitespace-delimited field matches "2005". These matching records will have been placed in a new file called 'filename.2005', instead.
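
If more than one year has to be separated out, the same field test can be generalized to write each record to a file named after its year in a single pass. The following is a sketch using awk rather than the Perl script, with invented sample data. It assumes the fifth field is always a year followed by a colon, and unlike the script it leaves the input file untouched.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/sample.log" <<'EOF'
Thu May 9 20:29:02 2002: first
Thu May 9 20:29:02 2005: second
Thu May 9 20:29:02 2005: third
EOF

# Strip the trailing colon from field 5 and write the line to <file>.<year>.
awk '{ year = $5; sub(/:$/, "", year); print > (FILENAME "." year) }' "$dir/sample.log"

count_2005=$(wc -l < "$dir/sample.log.2005" | tr -d ' ')
echo "2005 entries: $count_2005"
```

Each output file stays open until awk exits, which is fine for a handful of years.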

Regards!

...JRF...

jmckinzie · ‎09-05-2006

This worked perfectly...thanks.
