Operating System - HP-UX

sekar sundaram
Honored Contributor

process monitoring script help...

Hi,

we are already doing process monitoring with a simple script:
for i in `cat process.list`
do
    Count=`ps -ef | grep -i $i | grep -v grep | wc -l`
    if [ "$Count" -eq 0 ]
    then
        mail a b c
    fi
done

but it has two interesting issues:
1. the process list is a big one, so if ten processes are down, we get ten mails.
i am thinking that we should get only one mail listing all the processes that are not running.
2. after a process has gone down and comes back, we should get an email saying the process is running fine. we may get "process down" alerts many times, but "process running" should be sent only once, i.e. when the process comes up after having been down.

since it's a somewhat complex scripting issue, i would like to ask for your ideas...

thanks many...
Regards,
Sekar
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

HI:

> i am thinking that we should get only one mail listing all the processes that are not running.

Then as you loop through the list of processes that you want to check, capture the names of the ones that aren't running in a variable of your choice and reference that in an email. If the variable is empty, then there's no mail to send.

Since you appear to be looking for processes by name, don't 'grep' but use the UNIX95 (XPG4) behavior to specifically find a process by name. Hence, for your driving loop you might do:

DOWN=""
for PROC in $(< process.list)
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && DOWN=$(echo ${DOWN} ${PROC})
done
[ ! -z "${DOWN}" ] && mailx -s "${DOWN} processes are not running" sekar@xyz.com < /dev/null

...

> after a process has gone down and comes back, we should get an email saying the process is running fine. we may get "process down" alerts many times, but "process running" should be sent only once, i.e. when the process comes up after having been down.

For each process you will need to record its name and a "down" state indication. When your script runs, it needs to send an email if the process *is* running but its previous state was "down". You could record this state information in a file, or in memory if your script runs constantly. For example, you could launch your script once with an outer loop that looks like:

while true;
do
...
sleep 60
done
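
The state tracking described above could be sketched roughly as follows. This is only one possible arrangement, not the script from the thread: the state file name (./process.down) and the function names is_running and check_once are made up for illustration, and the 'ps' call is the same UNIX95 form shown earlier. The down list from the previous pass is kept in a file so that a "running again" notice is produced only once, on the first pass after a process returns.

```shell
# Sketch only: LIST, STATE, is_running and check_once are hypothetical names.
LIST=${LIST:-./process.list}      # one process name per line
STATE=${STATE:-./process.down}    # names that were down on the previous pass

# Succeeds when at least one process with exactly this name exists.
is_running() {
    [ -n "$(UNIX95= ps -C "$1" -opid= 2>/dev/null)" ]
}

# One pass over the list. Afterwards:
#   DOWN holds every name that is down now,
#   UP   holds names that were down last pass and are running now.
check_once() {
    DOWN=""
    UP=""
    if [ ! -f "${STATE}" ]; then : > "${STATE}"; fi
    while read -r PROC
    do
        if [ -z "${PROC}" ]; then continue; fi
        if is_running "${PROC}"
        then
            if grep -qxF "${PROC}" "${STATE}"; then UP="${UP} ${PROC}"; fi
        else
            DOWN="${DOWN} ${PROC}"
        fi
    done < "${LIST}"
    # Remember the current down set for the next pass.
    : > "${STATE}"
    for PROC in ${DOWN}
    do
        echo "${PROC}" >> "${STATE}"
    done
}

# Possible use inside the "while true" loop (address as used in this thread):
#   check_once
#   [ -n "${DOWN}" ] && mailx -s "not running:${DOWN}" sekar@xyz.com < /dev/null
#   [ -n "${UP}" ]   && mailx -s "running again:${UP}" sekar@xyz.com < /dev/null
```

Because the loop reads from a redirect rather than a pipe, DOWN and UP are set in the current shell and remain available after check_once returns.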

Regards!

...JRF...
sekar sundaram
Honored Contributor

Re: process monitoring script help...

thanks James,

i got -syntax error at line 16: `DOWN=$' unexpected
then i used the old `cat filename` method.

now i am getting - syntax error at line 16: `DOWN=$' unexpected

Steven Schweda
Honored Contributor

Re: process monitoring script help...

> i got -syntax error at line 16: [...]

How many people, do you think, know what's on
line 16 of your script?
Dennis Handly
Acclaimed Contributor

Re: process monitoring script help...

>I used the old `cat filename` method.

No need to use that evil cat, the $(< X) form isn't causing the error.

>syntax error at line 16: `DOWN=$' unexpected

Did you split both JRF's lines starting with "[" in two?
This should be one line:
[ -z "$(UNIX95=EXTENDED_PS ps -C ${PROC} -opid=)" ] && DOWN="${DOWN} ${PROC}"
sekar sundaram
Honored Contributor

Re: process monitoring script help...

Thanks Dennis,
previously i used JRF's:
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && DOWN=$(echo ${DOWN} ${PROC})

now i used yours,
[ -z "$(UNIX95=EXTENDED_PS ps -C ${PROC} -opid=)" ] && DOWN="${DOWN} ${PROC}"
and as you said, i am using
for PROC in $(< process.list)
but it give again the same error
syntax error at line 13: `$' unexpected
line 13 is this - for PROC in $(< process.list)
sekar sundaram
Honored Contributor

Re: process monitoring script help...

Steven, since `DOWN=$' comes only once in JRF's script, i thought its easy to find out...
James R. Ferguson
Acclaimed Contributor
Solution

Re: process monitoring script help...

HI (again) Sekar:

> previously i used JRF's:
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && DOWN=$(echo ${DOWN} ${PROC})

> now i used yours,
[ -z "$(UNIX95=EXTENDED_PS ps -C ${PROC} -opid=)" ] && DOWN="${DOWN} ${PROC}"
and as you said, i am using
for PROC in $(< process.list)
> but it give again the same error
syntax error at line 13: `$' unexpected

If you split lines you must use the shell continuation character ('\') like:

[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] \
&& DOWN=$(echo ${DOWN} ${PROC})

There can be *no* trailing whitespace after the '\' character although whitespace can precede the continued statement(s).
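
As a small illustration of the continuation rule (the process names here are made-up sample values), a split statement can be written in two ways: with a trailing backslash, or by ending the first line with the '&&' operator itself, in which case the shell keeps reading the next line without any backslash.

```shell
# Sample values, purely illustrative.
PROC="cron"
DOWN="inetd"

# Form 1: backslash as the very last character of the line.
[ -n "${PROC}" ] \
    && DOWN="${DOWN} ${PROC}"

# Form 2: the line ends with '&&', so no backslash is needed.
[ -n "${PROC}" ] &&
    DOWN="${DOWN} ${PROC}"

echo "${DOWN}"    # prints: inetd cron cron
```

The second form avoids the trailing-whitespace trap, since nothing depends on an invisible character after the backslash.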

The difference between my:

UNIX95=

...and Dennis's:

UNIX95=EXTENDED_PS

...is one of personal choice. The UNIX95 behavior is armed by using the l-value 'UNIX95=' *regardless* of what you set as the value (0, 1, EXTENDED_PS, or whatever). The use of the value "EXTENDED_PS" in this context shows that you know that the UNIX95 (XPG4) behavior of 'ps' differs from the standard behavior when the command is run in the UNIX95 (XPG4) environment.
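
The shell side of this can be checked without 'ps' at all. The sketch below uses 'sh -c' as a stand-in child command (an assumption for demonstration only) to show that a 'NAME=value command' prefix puts the variable into that one command's environment, whether the value is empty or not, and leaves the calling shell untouched.

```shell
unset UNIX95    # start from a clean state for the demonstration

# ${UNIX95+defined} expands to "defined" when the variable is set, even if empty.
EMPTY_VALUE=$(UNIX95= sh -c 'echo "${UNIX95+defined}"')
NAMED_VALUE=$(UNIX95=EXTENDED_PS sh -c 'echo "${UNIX95+defined}"')
AFTERWARDS="${UNIX95+defined}"

echo "child, UNIX95=            : ${EMPTY_VALUE}"
echo "child, UNIX95=EXTENDED_PS : ${NAMED_VALUE}"
echo "calling shell afterwards  : ${AFTERWARDS:-not defined}"
```

This is why the prefix form is convenient here: the XPG4 behavior applies to that single 'ps' invocation and does not affect the rest of the script.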

As Steven said, seeing the code I suggested integrated into your script would make debugging easier. Seeing the contents of *your* input file ('process.list') would also be helpful.

Regards!

...JRF...
sekar sundaram
Honored Contributor

Re: process monitoring script help...

interesting learnings JRF...
ok, in the meantime, i came up with one small idea... please tell me whether this will work...

for PROC in $(< process.list)
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && echo $PROC >>./process-tobe-mailed
done
mail -s processes not running sekar@xyz.com <./process-tobe-mailed
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

Hi (again) Sekar:

You wrote:

for PROC in $(< process.list)
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && echo $PROC >>./process-tobe-mailed
done
mail -s processes not running sekar@xyz.com <./process-tobe-mailed

...will enable you to collect a list of the processes that are not running. Before the 'for' loop I would truncate the './process-tobe-mailed' file, and then send it as the body of the mail, as you have, only if it isn't empty (i.e. its size isn't 0). Instead of a 'for' loop you could do:

RSLT=./process-to-be-mailed
cat /dev/null > ${RSLT}
while read PROC
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && echo ${PROC} >> ${RSLT}
done < ./process.list
[ -s "${RSLT}" ] && mail -s "processes not running" sekar@xyz.com < ${RSLT}

Regards!

...JRF...