Operating System - HP-UX

Re: process monitoring script help...

 
sekar sundaram
Honored Contributor

process monitoring script help...

Hi,

we are already doing process monitoring with a simple script:
for i in `cat process.list`
do
Count=`ps -ef | grep -i $i | grep -v grep | wc -l`
if count equal to zero
mail a b c

but it got two interesting issues:
1. the process list is big one...so if ten processes are down, we get ten mails.
i am thinking that we should get only one mail with all process that are not running.
2. after a process went down, when it comes back, we should get an email saying process is running fine. we may get process down alerts many times but "process running" should be only once -ie, after process comes up from process down.

since its a bit complex scripting issue, i would like ask ur ideas...

thanks many...
Regards,
Sekar
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

HI:

> i am thinking that we should get only one mail with all process that are not running.

Then as you loop through the list of processes that you want to check, capture the names of the ones that aren't running in a variable of your choice and reference that in an email. If the variable is empty, there's no mail to send.

Since you appear to be looking for processes by name, don't 'grep'; use the UNIX95 (XPG4) behavior of 'ps' to find a process specifically by name. Hence, for your driving loop you might do:

DOWN=""
for PROC in $(< process.list)
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && DOWN=$(echo ${DOWN} ${PROC})
done
[ ! -z "${DOWN}" ] && mailx -s "${DOWN} processes are not running" sekar@xyz.com < /dev/null

...

> after a process went down, when it comes back, we should get an email saying process is running fine. we may get process down alerts many times but "process running" should be only once -ie, after process comes up from process down.

For each process you will need to record its name and a "down" state indication. When your script runs it needs to send an email if the process *is* running but the previous state was "down". You could record this state information in a file, or in memory in your script if your script runs constantly. For example, you could launch your script once with an outer loop that looks like:

while true;
do
...
sleep 60
done
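
One way to keep that state between runs is a small directory of marker files, one per process that was last seen down. This is only a rough sketch: 'STATE_DIR', 'is_running' and 'check_one' are names made up for illustration, and 'is_running' simply wraps the 'ps' test shown earlier, so swap in whatever test works on your system.

```shell
# Sketch only: STATE_DIR, is_running and check_one are illustrative names.
STATE_DIR=${STATE_DIR:-/tmp/procmon.state}
mkdir -p "${STATE_DIR}"

# Hypothetical helper: succeeds when a process with this exact name exists.
is_running() {
    [ -n "$(UNIX95= ps -C "$1" -opid= 2>/dev/null)" ]
}

# Print "down", "recovered" or nothing for one process name.
check_one() {
    if is_running "$1"; then
        # Running now: report only if a "down" marker was left earlier.
        if [ -f "${STATE_DIR}/$1.down" ]; then
            rm -f "${STATE_DIR}/$1.down"
            echo "recovered"
        fi
    else
        # Not running: leave (or refresh) the marker and report every time.
        : > "${STATE_DIR}/$1.down"
        echo "down"
    fi
}
```

Calling 'check_one' for each name in the list gives "down" on every pass while the process is missing and "recovered" exactly once when it returns. It assumes the process names contain no '/' characters, since they are used as file names.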

Regards!

...JRF...
sekar sundaram
Honored Contributor

Re: process monitoring script help...

thanks James,

i got -syntax error at line 16: `DOWN=$' unexpected
then i used the old `cat filename` method.

now i am getting - syntax error at line 16: `DOWN=$' unexpected

Steven Schweda
Honored Contributor

Re: process monitoring script help...

> i got -syntax error at line 16: [...]

How many people, do you think, know what's on
line 16 of your script?
Dennis Handly
Acclaimed Contributor

Re: process monitoring script help...

>I used the old `cat filename` method.

No need to use that evil cat, the $(< X) form isn't causing the error.

>syntax error at line 16: `DOWN=$' unexpected

Did you split both JRF's lines starting with "[" in two?
This should be one line:
[ -z "$(UNIX95=EXTENDED_PS ps -C ${PROC} -opid=)" ] && DOWN="${DOWN} ${PROC}"
sekar sundaram
Honored Contributor

Re: process monitoring script help...

Thanks Dennis,
previously i used JRF's:
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && DOWN=$(echo ${DOWN} ${PROC})

now i used yours,
[ -z "$(UNIX95=EXTENDED_PS ps -C ${PROC} -opid=)" ] && DOWN="${DOWN} ${PROC}"
and as you said, i am using
for PROC in $(
but it give again the same error
syntax error at line 13: `$' unexpected
line 13 is this - for PROC in $(
sekar sundaram
Honored Contributor

Re: process monitoring script help...

Steven, since `DOWN=$' comes only once in JRF's script, i thought its easy to find out...
James R. Ferguson
Acclaimed Contributor
Solution

Re: process monitoring script help...

HI (again) Sekar:

> previously i used JRF's:
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && DOWN=$(echo ${DOWN} ${PROC})

> now i used yours,
[ -z "$(UNIX95=EXTENDED_PS ps -C ${PROC} -opid=)" ] && DOWN="${DOWN} ${PROC}"
and as you said, i am using
for PROC in $(
> but it give again the same error
syntax error at line 13: `$' unexpected

If you split lines you must use the shell continuation character ('\') like:

[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] \
&& DOWN=$(echo ${DOWN} ${PROC})

There can be *no* trailing whitespace after the '\' character although whitespace can precede the continued statement(s).
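
As a small illustration of that rule (the values here are made up for the demonstration), the shell reads each of the following two forms as a single command. Ending the first line with '&&' is an alternative that needs no backslash, since the shell keeps reading when a line ends in that operator:

```shell
DOWN=""
PROC="demo-proc"      # made-up name for illustration
PIDS=""               # stands in for empty 'ps' output

# Form 1: the backslash must be the very last character on the line.
[ -z "${PIDS}" ] \
    && DOWN="${DOWN} ${PROC}"

# Form 2: a trailing '&&' continues the command without a backslash.
[ -z "${PIDS}" ] &&
    DOWN="${DOWN} ${PROC}"

echo "not running:${DOWN}"
```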

The difference between my:

UNIX95=

...and Dennis's:

UNIX95=EXTENDED_PS

...is one of personal choice. The UNIX95 behavior is armed by using the l-value 'UNIX95=' *regardless* of what you set as the value (0, 1, EXTENDED_PS, or whatever). Using the value "EXTENDED_PS" in this context shows that you know the UNIX95 (XPG4) behavior of 'ps' differs from the standard behavior when the command is run in the UNIX95 (XPG4) environment.

As Steven said, seeing the code I suggested integrated into your script would make debugging easier. Seeing the contents of *your* input file ('process.list') would also be helpful.

Regards!

...JRF...
sekar sundaram
Honored Contributor

Re: process monitoring script help...

interesting learnings JRF...
ok, between, i found one small idea... suggest me this will work...

for PROC in $(< process.list)
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ]
$PROC >>./process-tobe-mailed
done
mail -s processes not running sekar@xyz.com <./process-tobe-mailed
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

Hi (again) Sekar:

You wrote:

for PROC in $(< process.list)
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ]
$PROC >>./process-tobe-mailed
done
mail -s processes not running sekar@xyz.com <./process-tobe-mailed

...will enable you to collect a list of the processes that are not running. Before the 'for' loop I would truncate the './process-tobe-mailed' file, and after the loop send it as the body of the mail, as you have, only if it isn't empty (i.e. its size isn't 0). Instead of a 'for' loop you could do:

RSLT=./process-to-be-mailed
cat /dev/null > ${RSLT}
while read PROC
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && echo ${PROC} >> ${RSLT}
done < ./process.list
[ -s "${RSLT}" ] && mail -s "processes not running" sekar@xyz.com < ${RSLT}

Regards!

...JRF...
sekar sundaram
Honored Contributor

Re: process monitoring script help...

now the script is:

RSLT=/opt/HPO/HealthCheck/process-to-be-mailed
cat /dev/null > ${RSLT}
while read PROC
do
[ -z "$(UNIX95=ps -C ${PROC} -opid=)" ]
${PROC}>>${RSLT}
done < /opt/HPO/HealthCheck/ovprocess.txt
[ -s "${RSLT}" ] && mail -s "processes not running" ssundaram22@csc.com


ssundaram $ sudo /opt/HPO/HealthCheck/ov_healthcheck.sh
/opt/HPO/HealthCheck/ov_healthcheck.sh: test-process1: not found
/opt/HPO/HealthCheck/ov_healthcheck.sh: testEA: not found
Error: Cannot bind host/port to socket.
ERROR: ovsessionmgr: Cannot initialize port 2389. Is there already a ovsessionmgr running?

^X^C

the process list contains HPOVO processes. i included two test processes to check this issue. it works almost, but looks something missing or extra...
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

Hi (again) Sekar:

Your last posting has four errors: (1) You split the line beginning with '[ -z "$(UNIX95= ps'; (2) you dropped the white space following the 'UNIX95= ps'; (3) you dropped the conditional operator ('&&') that applied to the test; and (4) you dropped the 'echo' of the '${PROC}' value.

The code should look like:

#!/usr/bin/sh
RSLT=/opt/HPO/HealthCheck/process-to-be-mailed
cat /dev/null > ${RSLT}
while read PROC
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && echo ${PROC}>>${RSLT}
done < /opt/HPO/HealthCheck/ovprocess.txt
[ -s "${RSLT}" ] && mailx -s "processes not running" ssundaram22@csc.com


Notice that there is whitespace after the 'UNIX95=' and before the 'ps' with *no* semicolon. This arms the UNIX95 behavior only for the duration of the command line.
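
The difference is easy to see with a harmless stand-in variable. This is only a sketch: 'DEMO' is a made-up name, and a child 'sh' plays the part of 'ps' by reporting whether it can see the variable in its environment.

```shell
unset DEMO

# Assignment prefix: DEMO is placed in the environment of this one command.
one_shot=$(DEMO=on sh -c 'echo "${DEMO:-unset}"')

# With a semicolon it becomes a plain, unexported shell variable, so the
# child process does not see it.
with_semicolon=$(DEMO=on; sh -c 'echo "${DEMO:-unset}"')

# Without the space, 'sh' becomes the value and '-c' is taken as the command.
if (DEMO=sh -c 'echo hi') 2>/dev/null; then
    no_space_status=0
else
    no_space_status=$?
fi

echo "prefix: ${one_shot}"
echo "semicolon: ${with_semicolon}"
echo "no space, exit status: ${no_space_status}"
echo "parent shell afterwards: ${DEMO:-unset}"
```

The last line shows that the prefix form leaves nothing behind in the calling shell.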

Regards!

...JRF...


sekar sundaram
Honored Contributor

Re: process monitoring script help...

thanks for the reply..

now the script is:
shcscp18:/clocal/cschpov/user/t5069ss/process-mon $
#!/usr/bin/sh
RSLT=/clocal/cschpov/user/t5069ss/process-mon/process-to-be-mailed
cat /dev/null > ${RSLT}
while read PROC
do
[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && echo ${PROC}>>${RSLT}
done < /clocal/cschpov/user/t5069ss/process-mon/ovprocess.txt
[ -s "${RSLT}" ] && mailx -s "processes not running" ssundaram22@csc.com

now i ran this script:
shcscp18:/clocal/cschpov/user/t5069ss/process-mon $ sh -x process-check.sh
RSLT=/clocal/cschpov/user/t5069ss/process-mon/process-to-be-mailed
+ cat /dev/null
+ read PROC
+ [ -z $(UNIX95= ps -C test-EA -opid=) ]
+ read PROC
+ [ -z $(UNIX95= ps -C testing -opid=) ]
+ read PROC
+ [ -z $(UNIX95= ps -C opcmsga -opid=) ]
+ read PROC
+ [ -s /clocal/cschpov/user/t5069ss/process-mon/process-to-be-mailed ]
shcscp18:

the two test processes - test-EA and testing cant be running. the process-to-be-mailed file is zero size. mail didnt come..
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

Hi (again) Sekar:

> the two test processes - test-EA and testing cant be running. the process-to-be-mailed file is zero size. mail didnt come..

I think you mean that your email didn't identify *which* processes aren't running.
Change:

[ -s "${RSLT}" ] && mailx -s "processes not running" ssundaram22@csc.com

...To:

[ -s "${RSLT}" ] && mailx -s "processes not running" ssundaram22@csc.com < ${RSLT}

Regards!

...JRF...
sekar sundaram
Honored Contributor

Re: process monitoring script help...

this line not updating the process-to-be-mailed

[ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && echo ${PROC} >> ${RSLT}

i mean, after running the script the file is empty. so the [ -s "${RSLT}" ] fails.
ya, i included the < ${RSLT} at the end of mailx.
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

Hi (again) Sekar:

The script I offered works for me.

> the two test processes - test-EA and testing cant be running.

Upon what do you base this? The code I have offered attempts to match *exactly* the process basename rather than fuzzily matching any part of the command line.

Your trace output ('sh -x ...') looks a bit odd. What version are you running? A test case of mine looks like:

# cat /tmp/INPUT
sekar
cron

# sh -x /tmp/MYSH
+ RSLT=/tmp/MAILING
+ cat /dev/null
+ 1> /tmp/MAILING
+ 0< /tmp/INPUT
+ read PROC
+ ps -C sekar -opid=
+ UNIX95=
+ [ -z ]
+ echo sekar
+ 1>> /tmp/MAILING
+ read PROC
+ ps -C cron -opid=
+ UNIX95=
+ [ -z 1555 ]
+ read PROC
+ [ -s /tmp/MAILING ]
+ echo -s processes not running ssundaram22@csc.com
-s processes not running ssundaram22@csc.com

# cat /tmp/MAILING
sekar

Regards!

...JRF...
sekar sundaram
Honored Contributor

Re: process monitoring script help...

oh, i am thinking that as usual i am with hpux, but this project alone we run with sun solaris 5.10
that could be an issue?
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

Hi (again) Sekar:

> oh, i am thinking that as usual i am with hpux, but this project alone we run with sun solaris 5.10 that could be an issue?

Yes, indeed. I don't have a Sun server but I can tell you that the '-C' option doesn't exist on AIX, for example.

If you must resort to a 'grep' to find if a process is running or not, make the potential match less fuzzy than a simple 'grep' for a name. You can do things like:

# ps -e|grep syslogd$

or:

# ps -e|awk '$4=="syslogd"'
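
A variation on the 'awk' idea, offered as a rough sketch rather than a tested Solaris recipe: it assumes a 'ps' that accepts the POSIX '-e -o comm=' options and an 'awk' that accepts '-v', and the function names 'match_exact' and 'is_running' are made up here. It compares the last path component of each command name for exact equality, so "syslogd" does not match "rsyslogd":

```shell
# Read command names on stdin; succeed if any of them equals $1 exactly.
# A leading directory part (e.g. /usr/sbin/cron) is ignored.
match_exact() {
    awk -v name="$1" '
        { n = split($1, part, "/"); if (part[n] == name) found = 1 }
        END { exit !found }
    '
}

# Assumption: this ps prints one command name per line with -o comm=.
is_running() {
    ps -e -o comm= | match_exact "$1"
}
```

Keeping the matching separate from the 'ps' call means the matching can be checked against a hand-written list before it is trusted on a live system.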

Regards!

...JRF...

Dennis Handly
Acclaimed Contributor

Re: process monitoring script help...

>JRF: cat /dev/null > ${RSLT}

No need for that cat there: > ${RSLT}

>(2) you dropped the white space following the 'UNIX95= ps';

That's why I suggest always using EXTENDED_PS when providing examples. (While it may not prevent the mistaken insertion of a ";", it should handle the space problem.)
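
For reference, a short sketch of the truncation forms being discussed, using a throwaway temporary file (the file and the 'emptied' counter exist only for the demonstration). The ':' form is the null command with a redirection, which some find easier to spot than a bare '>':

```shell
RSLT=$(mktemp)                 # throwaway file for the demonstration
emptied=0

echo "leftover" > "${RSLT}"
cat /dev/null > "${RSLT}"      # JRF's form
[ -s "${RSLT}" ] || emptied=$((emptied + 1))

echo "leftover" > "${RSLT}"
> "${RSLT}"                    # redirection with no command at all
[ -s "${RSLT}" ] || emptied=$((emptied + 1))

echo "leftover" > "${RSLT}"
: > "${RSLT}"                  # null command plus redirection
[ -s "${RSLT}" ] || emptied=$((emptied + 1))

rm -f "${RSLT}"
echo "forms that left the file empty: ${emptied}"
```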
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

HI (again):

> Dennis >JRF: cat /dev/null > ${RSLT}
No need for that cat there: > ${RSLT}

Yes, I agree but I find in this syntax the 'cat /dev/null' adds a bit of code clarity.

> Dennis: That's why I suggest always using EXTENDED_PS when providing examples. (While it it may not prevent the mistaken insertion of a ";", it should handle the space problem.)

I understand your point and almost adopted that in this context since there appeared to be confusion :-) [ You never know, I'm adaptable ;-) ]

Regards!

...JRF...
sekar sundaram
Honored Contributor

Re: process monitoring script help...

ok, lets check this idea little later...
for testing i used this option:
Count=`ps -ef | grep -i $i | grep -v grep | wc -l`

its mailing fine. the second question is what the issue. ie, mailing once when all processes are running fine. i am thinking this idea:
1. processes goes down, it gets updated into the process-to-be-mailed file, it gets mailed.
2. in every loop, it checks for the same list of processes, so, if the file size zero(means, all processes are running fine) then one mail should be sent as "All processes are running fine at `date`"
3. in other loops, it should not send an email even if file size is zero. it looks like we should use a "flag".

please check this idea and help me implement(i am bit good at troubleshooting if there is any script, but not good in writing a new script).
James R. Ferguson
Acclaimed Contributor

Re: process monitoring script help...

Hi (again) Sekar:

> the second question is what the issue. ie, mailing once when all processes are running fine. i am thinking this idea:
1. processes goes down, it gets updated into the process-to-be-mailed file, it gets mailed.
2. in every loop, it checks for the same list of processes, so, if the file size zero(means, all processes are running fine) then one mail should be sent as "All processes are running fine at `date`"
3. in other loops, it should not send an email even if file size is zero. it looks like we should use a "flag".

For this you could use something like:

# cat ./mymonitor
#!/usr/bin/sh
RSLT=/tmp/MAILER
INPF=/tmp/PROCLIST
WAIT=10 #...wakeup interval in seconds...
OK=0    #...1 once "all processes ok" has been reported...
while true
do
    cat /dev/null > ${RSLT}
    while read PROC
    do
        [ -z "$(UNIX95= ps -C ${PROC} -opid=)" ] && echo ${PROC} >> ${RSLT}
    done < ${INPF}
    if [ -s "${RSLT}" ]; then
        echo mailx -s "processes not running" ssundaram22@xyz.com < ${RSLT}
        OK=0
    elif [ "${OK}" = 0 ]; then
        echo mailx -s "all processes ok" ssundaram22@xyz.com < /dev/null
        OK=1
    fi
    sleep ${WAIT}
done
exit

...

This will run indefinitely, waking up every 'WAIT' seconds. Kill the script with a Control-C interrupt. You will need to adjust the variable values to your needs. You will also need to remove the 'echo' before each 'mailx' command when you are ready to actually send mail. Using the 'echo' is useful in debugging since you can see what's happening as you start and stop various processes in another session.
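
The flag logic can also be looked at in isolation. A minimal sketch, using the made-up function name 'next_action': given whether the list of missing processes is empty and the current value of the 'OK' flag, it prints which mail (if any) is due and the new flag value, so that "all ok" is reported once per recovery rather than on every pass:

```shell
# $1: "down" if the result file is non-empty, "up" otherwise
# $2: current flag (1 = "all ok" already reported, 0 = not yet)
# Prints the action to take, then the new flag value.
next_action() {
    if [ "$1" = "down" ]; then
        echo "alert 0"
    elif [ "$2" = 0 ]; then
        echo "all-ok 1"
    else
        echo "none 1"
    fi
}

# Walk through a made-up sequence of passes to see when mail would go out.
OK=0
for STATE in down down up up down up; do
    set -- $(next_action "${STATE}" "${OK}")
    ACTION=$1
    OK=$2
    echo "${STATE}: ${ACTION}"
done
```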

When everything works the way you want, you could encapsulate this script in a run-control/start-up wrapper to "daemonize" it.

Regards!

...JRF...