Operating System - HP-UX

New to scripting, grateful for any help !!

 
SOLVED
Jeremy W
New Member

Hi all,
I've been asked to create a script that will look for a particular string in a particular directory which contains anything up to 15000 files.
On line 12 of every file within the directory the word 'ABORT' may appear. Any other occurrences of the word ABORT within the files need to be ignored; only the one on line 12 is important.
ABORT is the string I need to look for and I would like to poll the directory every 5 mins (presumably via a cron job), and look at the new files that have been created.

Grateful for any help.
10 REPLIES
OldSchool
Honored Contributor

Re: New to scripting, grateful for any help !!

"ABORT is the string I need to look for and I would like to poll the directory every 5 mins (presumably via a cron job), and look at the new files that have been created."

Dumb question: do you only need to examine the files created since the previous run, or should all the files be examined? (I'd assume the former, as you've already looked at the existing files, but if they can be overwritten, then they would be candidates as well.)
Michael Steele_2
Honored Contributor

Re: New to scripting, grateful for any help !!




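DIR=/dir                                # directory to scan
WORK_FILE=/tmp/work.$$                  # scratch list of file names
ls "$DIR" > "$WORK_FILE"                # get ALL files
while read a                            # read a file name
do
    i=0
    while read b                        # read a line
    do
        i=$((i+1))
        if [[ $i -eq 12 ]]              # if 12th line
        then
            if echo "$b" | grep ABORT   # check for ABORT (prints the line)
            then
                TRUE=0                  # flag: ABORT found on a line 12
            fi
            break                       # no need to read past line 12
        fi
    done < "$DIR/$a"
done < "$WORK_FILE"
rm -f "$WORK_FILE"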
James R. Ferguson
Acclaimed Contributor

Re: New to scripting, grateful for any help !!

Hi Jeremy:

Here's a suggestion:

# cat ./watch
#!/usr/bin/sh
REFFILE=/tmp/REF.$$
touch ${REFFILE}
while true
do
    ls | while read FILE
    do
        [ ${FILE} -nt ${REFFILE} ] || continue
        awk 'NR==12 && /ABORT/ {print FILENAME":"$0;exit 0}' ${FILE}
    done
    touch ${REFFILE}
    sleep 300
done

Change to the directory of interest before running it. The script assumes that only text files are present (no subdirectories). This is easily rectified if necessary.

The script runs indefinitely until killed. It wakes every 5 minutes (300 seconds) to look for the string "ABORT" on line 12 of files that have been modified since the previous pass. If it finds a match, the script prints the filename along with the contents of the 12th line.
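
Since the original question mentioned cron, here is a hedged sketch of the same idea as a single pass that cron could start every 5 minutes. It is not the script above: the function name check_once, the marker file and the ".next" suffix are assumptions made for illustration. The marker is stamped before the scan begins, which should narrow the window in which a file changed during the scan goes unnoticed.

```shell
#!/usr/bin/sh
# Sketch only. check_once DIR MARK performs one polling pass:
# it reports "file:line" for files in DIR newer than MARK whose
# 12th line contains ABORT, then advances MARK.
check_once() {
    dir=$1 mark=$2
    # First run: no marker yet, so create a very old one (everything is "new").
    [ -f "$mark" ] || touch -t 197001020000 "$mark"
    touch "$mark.next"                  # stamp taken BEFORE the scan
    for f in "$dir"/*
    do
        [ -f "$f" ] && [ "$f" -nt "$mark" ] || continue
        awk 'NR == 12 { if (/ABORT/) print FILENAME ":" $0; exit }' "$f"
    done
    mv "$mark.next" "$mark"             # the pre-scan stamp becomes the new marker
}
```

A cron entry such as "0,5,10,15,20,25,30,35,40,45,50,55 * * * * /path/to/check.sh" (path hypothetical) could then run a small script that calls check_once with the directory and a marker kept outside that directory.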

Regards!

...JRF...
Dennis Handly
Acclaimed Contributor

Re: New to scripting, grateful for any help !!

>On line 12 of every file within the directory there is a possibility of the word 'ABORT',

Once a new file is found (see below), you can scan it with awk:
awk '
NR > 12 { exit }
/ABORT/ {
    if (NR < 12) next
    print $0
}' file

>OldSchool: do you only need to examine the files created since the previous run, or should all the files be examined?

It's kind of hard to find newly created files unless you keep track of all 15,000 files, or unless you look at the inode change time. That would probably be faster than looking at the modification time, if files are continually being appended.

This seems to call for a reference file and find with -newer.
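
A minimal sketch of the reference-file idea with find -newer, combined with the awk test for line 12. The function name scan_newer and its arguments are assumptions for illustration, not something posted in this thread. Note that -newer compares modification times, and that find also descends into any subdirectories.

```shell
#!/usr/bin/sh
# Sketch only. scan_newer DIR REF prints "file:line" for every regular file
# under DIR that was modified more recently than REF and has ABORT on its
# 12th line. awk exits at line 12, so large files are not read to the end.
scan_newer() {
    find "$1" -type f -newer "$2" -exec awk '
        NR == 12 { if (/ABORT/) print FILENAME ":" $0; exit }
    ' {} \;
}
```

After each pass the caller would touch the reference file so the next pass only sees later changes.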

>JRF: awk 'NR==12 && /ABORT/ {print FILENAME":"$0;exit 0}' ${FILE}

(Before I submitted, I noticed your solution.)
It seems you have the same idea. Unfortunately, you'll need to break out of awk if NR > 12, or you'll be wasting time scanning large files.
James R. Ferguson
Acclaimed Contributor

Re: New to scripting, grateful for any help !!

Hi (again):

> Dennis: (Before I submitted, I noticed your solution.) It seems you have the same idea, unfortunately you'll need to break out of awk if NR > 12, or you'll be wasting time scanning large files.

Yes, indeed, you're right and I meant to add that akin to the exit I did upon matching. We could change:

awk 'NR==12 && /ABORT/ {print FILENAME":"$0;exit 0}' ${FILE}

...to:

awk 'NR>12 {exit 0};NR==12 && /ABORT/ {print FILENAME":"$0;exit 0}' ${FILE}

Regards!

...JRF...
OldSchool
Honored Contributor

Re: New to scripting, grateful for any help !!

Dennis: This would probably be faster than looking at the modification time, if files are continually being appended.

Right, but the problem to be solved has that one ill-defined item. I suspect that having "looked" at a file in one pass, he may never need to "look" at it again. Other possibilities include moving all of the files examined in one pass to a different directory after examination, or keeping a listing from the previous run and removing those entries from the current run with some combination of sort and diff. I'm sure there are other ways as well.
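
The "keep a listing from the previous run" idea could be sketched roughly as follows. This swaps in comm for diff, since comm -13 prints exactly the lines that appear only in the second of two sorted files. The function name new_since_last and the state file are assumptions for illustration; the state file should live outside the directory being watched.

```shell
#!/usr/bin/sh
# Sketch only. new_since_last DIR STATE prints the names that are in DIR now
# but were not in the listing saved in STATE by the previous run, then
# replaces STATE with the current listing.
new_since_last() {
    dir=$1 state=$2
    [ -f "$state" ] || : > "$state"     # first run: empty previous listing
    ls "$dir" | sort > "$state.new"     # current listing, sorted for comm
    comm -13 "$state" "$state.new"      # names present only in the new listing
    mv "$state.new" "$state"
}
```

Each name it prints could then be handed to the awk test for line 12, so every file is examined only once regardless of later modifications.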
Jeremy W
New Member

Re: New to scripting, grateful for any help !!

Many thanks for your replies. Dennis is correct, once a file has been checked, there isn't a need to check that file again. Files only need to be checked once each. If the string is found the script needs to exit that time, but to continue to run again after another 5 minute pause (if that's possible).

Is it possible to check for new files without creating reference files, as they would require housekeeping as well?

Apologies for what may seem basic questions, but very very new to all this !!!

Jeremy
James R. Ferguson
Acclaimed Contributor
Solution

Re: New to scripting, grateful for any help !!

Hi (again) Jeremy:

> ...once a file has been checked, there isn't a need to check that file again. Files only need to be checked once each. If the string is found the script needs to exit that time, but to continue to run again after another 5 minute pause, (if that's possible).

That's the essence of the outer loop (while true...). This loop ends with a 5-minute sleep before running again.

Notice the line:

[ ${FILE} -nt ${REFFILE} ] || continue

This says to compare the modification time of the file to a reference file and if the file hasn't been modified during the last interval, continue with the next file in the inner loop.

> Is it possible to check for new files without creating reference files, as they would require housekeeping as well.

There's only one reference file created and we can easily remove it when done with a 'trap' statement. In all, the modified script looks like this:

# cat ./watch
#!/usr/bin/sh
REFFILE=/tmp/REF.$$
trap 'rm -f ${REFFILE}' EXIT
touch ${REFFILE}
while true
do
    ls | while read FILE
    do
        [ ${FILE} -nt ${REFFILE} ] || continue
        awk 'NR>12 {exit 0};NR==12 && /ABORT/ {print FILENAME":"$0;exit 0}' ${FILE}
    done
    touch ${REFFILE}
    sleep 300
done
exit 0

Regards!

...JRF...
Jeremy W
New Member

Re: New to scripting, grateful for any help !!

Hi James,

Thanks for your help, this may be a stupid question, but I'm assuming the REF file should contain a listing of what files have been read/checked. My question is, where/how do I define the directory that I need to check where all the files are held ?

Thanks
Jeremy
James R. Ferguson
Acclaimed Contributor

Re: New to scripting, grateful for any help !!

Hi (again) Jeremy:

> ...this may be a stupid question, but I'm assuming the REF file should contain a listing of what files have been read/checked. My question is, where/how do I define the directory that I need to check where all the files are held ?

No, the 'REFFILE' is only used to create a mark-in-time (for you, every 5-minutes). We use this to decide if a file is newer (more recently changed) since the last mark.

If at some time, t[n], a file meets the principal criterion of having the string "ABORT" somewhere on line 12, then we report the event. Then, at some later time, t[n+m], another change to the file is made. This triggers a re-examination of the file and a second report for the file.

If this is _not_ what you want, there are ways to circumvent that. You could filter the output through a 'sort -u' to reduce the results to unique entries.

You could also store the name of any file meeting the match criteria in a hash (associative array in 'awk' parlance). Then, skip repetitive output for any file once added to the hash.
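
A rough sketch of the associative-array idea, written as a filter on the script's output rather than a change to the script itself. The function name first_report_only is an assumption for illustration. It relies on the "filename:line" output format used above, so it would misbehave for file names that themselves contain a colon, and it only remembers names for as long as the pipeline keeps running.

```shell
#!/usr/bin/sh
# Sketch only. first_report_only reads "filename:line" reports on stdin and
# passes through only the first report seen for each filename.
first_report_only() {
    # seen[] is an awk associative array keyed by the filename field;
    # the pattern is true only the first time a given key appears.
    awk -F: '!seen[$1]++'
}
```

It could be used as "./watch | first_report_only", so a file that is modified again after being reported does not produce a second report.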

Regards!

...JRF...