Operating System - HP-UX

Re: Scripting: Parsing HUGE text/log files

 
SOLVED
Mike_316
Frequent Advisor

Scripting: Parsing HUGE text/log files

Hey Gang,

I am trying to parse data out of some HUGE log files. Normally I would just cat the file, grep'ping or awk'ing for the data. In this case, I only want to see the lines that are actually related to a file (the output is basically an "ls -la" listing of backed-up data), so I would normally do something along the lines of...

cat | while read LINE
do
if [ `echo $line | cut -c1` = "-" ]
then; echo $line
fi
done

BUT, the logfile is so big (200+ MB) that the "cat" chokes and never drops any data to the while statement. I have tried using "more" instead of "cat", and get better results with some of the smaller log files...but it still can't handle the big ones.


I have no control over how the log files are output...they are being generated by a different process, beyond my modification.

Any suggestions??

Thanks!

Mike
"If we treated each person we met as if they were carrying an unspeakable burden, we might treat each other as we should" - Dale Carnegie
6 REPLIES
curt larson_1
Honored Contributor
Solution

Re: Scripting: Parsing HUGE text/log files

you could try

sed -n '/^-/p'
or
awk '/^-/ {print;}'
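
Either filter selects only the lines beginning with "-" (the regular-file entries in "ls -la" output). A quick sanity check of both against grep, using a hypothetical sample log (the filename and entries are made up for illustration):

```shell
# Hypothetical sample mimicking the "ls -la"-style backup log
printf '%s\n' \
    'drwxr-xr-x 2 root sys 96 Jan 1 data' \
    '-rw-r--r-- 1 root sys 10 Jan 1 file1' \
    '-rw------- 1 root sys 20 Jan 1 file2' > /tmp/backup.log

# All three commands print only the lines beginning with "-":
grep '^-' /tmp/backup.log
sed -n '/^-/p' /tmp/backup.log
awk '/^-/ {print}' /tmp/backup.log
```

Note the -n on sed: without it, sed echoes every input line and the p command prints matching lines a second time.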
RAC_1
Honored Contributor

Re: Scripting: Parsing HUGE text/log files

grep '^-' input_file
sed -n '/^-/p' input_file

Anil
There is no substitute to HARDWORK
John Poff
Honored Contributor

Re: Scripting: Parsing HUGE text/log files

Hi,

One way to do it with Perl:

perl -ne 'print if /^-/' file

JP
Bill Hassell
Honored Contributor

Re: Scripting: Parsing HUGE text/log files

cat will have no problem with gigabyte files, so 200 MB is trivial. Is the cat version actually reading the file? I'm not clear why this would not work:

grep ^-

In your example script, you've misspelled LINE so a working version would be:

set -u
cat | while read LINE
do
if [ $(echo "$LINE" | cut -c1) = "-" ]
then
echo "$LINE"
fi
done

To prevent similar spelling errors, always use set -u. Also, quote "$LINE" to preserve embedded spaces. The use of grave accents (backticks) has been deprecated for almost a decade now; use $(...) rather than `...` (see the man pages for sh-posix and ksh).

The above script works but is about 1/10 the speed of the grep line. If you only want the first occurrence of "-" in the file, use this:

grep ^- | head -1
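
Bill's two points above (set -u catching unset variables, $(...) replacing backticks) can be sketched like this, with a hypothetical sample line standing in for a real log entry:

```shell
# set -u makes the shell abort on any reference to an unset variable,
# so a typo like $line vs $LINE fails loudly instead of expanding empty.
set -u

LINE='-rw-r--r-- 1 root sys 10 Jan 1 file1'   # hypothetical log line
first=$(echo "$LINE" | cut -c1)               # $(...) instead of deprecated `...`
if [ "$first" = "-" ]
then
    echo "regular file: keep this line"
fi
```

With set -u in effect, referencing the misspelled $line would abort with "line: parameter not set" rather than silently passing an empty string to cut.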


Bill Hassell, sysadmin
Mike_316
Frequent Advisor

Re: Scripting: Parsing HUGE text/log files

THAT WORKS! Thanks! Basically, I used the sed -n '/^-/p' and then piped the result into the "while read LINE". I am getting the data I need!
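
For reference, a minimal sketch of the pipeline Mike describes (LOGFILE and its sample contents are hypothetical stand-ins for the real backup log): sed filters the huge file down to the "-" lines before the while loop ever sees it.

```shell
# Hypothetical stand-in for the real 200 MB backup log
LOGFILE=/tmp/backup_sample.log
printf '%s\n' \
    'drwxr-xr-x 2 root sys 96 Jan 1 data' \
    '-rw-r--r-- 1 root sys 10 Jan 1 file1' > "$LOGFILE"

# sed does the heavy filtering; the loop only handles matching lines
sed -n '/^-/p' "$LOGFILE" | while read LINE
do
    echo "$LINE"
done
```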

Thanks!
Mike
"If we treated each person we met as if they were carrying an unspeakable burden, we might treat each other as we should" - Dale Carnegie
Mike_316
Frequent Advisor

Re: Scripting: Parsing HUGE text/log files

Hey Bill,

Thanks for the spellcheck :-) I am not sure what was happening with "cat" and "more" either. Very simply, if I ran "cat" against the 200 MB file (just "cat ") I got output to the screen, and if I ran the above script (with the corrected spelling) against a smaller logfile, it worked...but against the larger file, it just sat there.

If I did a "ps" looking for the "while" or the "read" (to see if "cat" had started piping data to those commands), I got nothing when running the script against the larger log file, whereas it did show a "while read" process running when using the script against smaller logfiles.

It was as if the "cat"/"more" commands had to parse ALL the data into memory, before it started dumping it to the "while read line".

Not sure why...but the "sed | while read" is working where the "cat | while read" and "more | while read" were choking. Must be the way in which the two commands handle the data...?

Thanks again!
Mike

P.S. Confession time...this script will be running on both HP-UX and Solaris, and it is on the Solaris box where the script is choking (and where I am testing it)...but SUN's support forums are so crappy that I posted it here first in order to actually get a response. I wonder if SUN's and HP's more/cat commands handle their memory differently.
"If we treated each person we met as if they were carrying an unspeakable burden, we might treat each other as we should" - Dale Carnegie