Operating System - HP-UX

Detecting runaway processes.

 
Jonathan Corbeill
Occasional Advisor

Periodically we have runaway processes: processes that hang around chewing up valuable resources, usually left over from a user session that was cancelled or disconnected. Does anyone have a cron job that detects runaway processes, perhaps by using total CPU time and other information to make the determination, and then wraps the result into an e-mail notifying a system administrator so the process can be cancelled?

Am I clear?
Cheryl Griffin
Honored Contributor

Re: Detecting runaway processes.

Jonathan,

As far as finding the top processes goes, sort the ps output by CPU usage to list the ten busiest processes on the system (the -o option needs UNIX95 set):
# UNIX95=XPG4 ps -e -o pcpu,pid,user,args | sort -rn | head -10

Best Wishes,
Cheryl
"Downtime is a Crime."
Rick Garland
Honored Contributor

Re: Detecting runaway processes.

Some processes do not show up in the top ten of top. There may be several processes that each take little CPU but add up. I almost think it comes down to experience: there are so many processes, in so many different environments, that could be troublesome. Packages that fall under the OpenView name will certainly help, as will scripts that keep an eye on load factors. But I think it comes down to this: what in your environment is running away and making the load go up?
Jonathan Corbeill
Occasional Advisor

Re: Detecting runaway processes.

The runaways are usually database connections where the user has cancelled the session but the process still exists. Such a process will consume 50 to 80 percent of the system resources. It is quite obvious that the process is a runaway, and detection can be based on total processor time used plus another identifying characteristic. They always appear at the top of top. The system users report the system as "slow", and we don't know of the problem until several hours, sometimes days, after the condition arises. If I could locate or develop a script to look for a process where the cpu time is > :30 and which is not a normal background process, I could wrap the info into an e-mail for admin notification. This could be a script run by cron every 15 or 30 minutes.
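
A minimal sketch of the kind of check described here, assuming the default HP-UX `ps -e` layout (PID TTY TIME COMMAND, with TIME printed as minutes:seconds). The function name `find_runaways`, the 30-minute threshold, the exempt list, and the mail recipient are all placeholders chosen for illustration, not part of any existing script.

```shell
# find_runaways MINUTES [EXEMPT_REGEX]
# Reads `ps -e` style lines (PID TTY TIME COMMAND) on stdin and prints those
# whose cumulative CPU time exceeds MINUTES, skipping commands that match
# EXEMPT_REGEX. Assumption: TIME is minutes:seconds.
find_runaways() {
    awk -v limit="$1" -v exempt="$2" '
        $1 == "PID" { next }                     # skip the header line
        {
            split($3, t, ":")                    # t[1] holds the minutes
            if (t[1] + 0 > limit && (exempt == "" || $4 !~ exempt))
                print
        }'
}

# Hypothetical use from a cron script (threshold and recipient are examples):
#   report=`ps -e | find_runaways 30 '^(statdaemon|inetd|vxfsd)$'`
#   [ -n "$report" ] && echo "$report" | mailx -s "Runaway processes" admin
```

Classic crontab syntax has no step values, so a 15-minute schedule is written as 0,15,30,45 in the minute field.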
RikTytgat
Honored Contributor

Re: Detecting runaway processes.

Hi,

You might limit your search to processes that are not owned by root, and that have a PPID of 1.

This includes all user processes that fall under the init process after their original parent processes die.

I've known situations myself in which a process (netscape) runs away after the parent process (usually one of the CDE processes) quits abnormally.
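
That filter can be sketched with awk over `ps -ef` output, assuming the usual column order (UID PID PPID ...). The function name `orphans_not_root` is made up for illustration.

```shell
# orphans_not_root: reads `ps -ef` lines on stdin and prints "PID owner" for
# every process that is not owned by root and whose parent is init (PPID 1).
# The header line never matches because its third field is the text "PPID".
orphans_not_root() {
    awk '$1 != "root" && $3 == 1 { print $2, $1 }'
}

# Example: ps -ef | orphans_not_root
```

Legitimate daemons started by non-root users also have a PPID of 1, so the output is a list of candidates to review rather than a kill list.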

Bye,
Rik
James R. Ferguson
Acclaimed Contributor

Re: Detecting runaway processes.

Jonathan:

This is certainly a case-by-case basis. I have this problem on one server hosting an application to which the client telnets.

One solution to finding (and killing) troublesome orphans is:

# MIN=100;
# PIDS=`ps -el|awk -v MIN=$MIN '$3 > MIN && $5 == 1 && $12 ~/\?/ {print $4}'`
# echo $PIDS

This gives back a list of processes that have been inherited by init. The MIN value of 100 represents the beginning of the uid values that belong to 'application' users on my server. You can adjust it to your tastes as long as the range it selects does not include system/root processes.

Instead of echoing $PIDS you could issue a kill for these processes. I would be very careful about this until you are sure of the results.

...JRF...
Kofi ARTHIABAH
Honored Contributor

Re: Detecting runaway processes.

In order to track run-away processes, you might want to profile your system to determine what "normal" conditions are. Then write a script that checks the process table for abnormal conditions (on some of my systems, I do not expect any process to take more than 90 mins. of CPU time in any day, so I have a script that runs every ten minutes and checks the process table for processes that have breached the threshold and sends an alert to me when it finds one)

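EXEMPT="YOUR_EXEMPT_PROCESSES_HERE_SEPARATED_BY_|"
# e.g. EXEMPT="statdaemon|inetd|vxfsd"
RUNAWAY=VALUE_YOU_WANT_FOR_YOUR_THRESHOLD
# e.g. RUNAWAY=90 (minutes of CPU time)
...
# ps -e prints PID TTY TIME COMMAND, with TIME as minutes:seconds.
# sed drops the header and splits TIME at the colon so that read sees two
# fields; egrep -v then removes the exempt processes.
ps -e | sed -e 1d -e 's/:/ /' | egrep -v "$EXEMPT" | while read pid tty time_hh time_mm command ; do
if [ "$time_hh" -gt "$RUNAWAY" ] ; then

my_msg="$pid $tty ${time_hh}:$time_mm $command\n"
# \$ defers expansion so the messages accumulate when $TEMPVARS is sourced
echo "my_errormsg=\"\${my_errormsg}${my_msg}\"" 1>&6
# send an e-mail if you want
# with mailx -s "RUNAWAY PROCESS" someone_who_cares...
fi
done 6>$TEMPVARS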



nothing wrong with me that a few lines of code cannot fix!
Cheryl Griffin
Honored Contributor

Re: Detecting runaway processes.

At 11.0, there is also the option of using:
# ps -o pcpu

This will display %CPU for the processes.

It requires UNIX95=XPG4 to be set and, at minimum, patch PHCO_18446:
Patch Description: s700_800 11.00: ps(1) Cumulative patch
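
As a rough sketch of how the pcpu column could feed an alert, the helper below keeps only processes at or above a given percentage. It assumes input lines of the form "pcpu pid command"; the name `busy_procs` and the 50 percent threshold are illustrative only.

```shell
# busy_procs MIN_PCPU
# Reads lines of "pcpu pid command..." on stdin and prints those at or above
# MIN_PCPU percent, busiest first. The "%CPU" header line is dropped because
# its first field is not a number.
busy_procs() {
    awk -v min="$1" '$1 ~ /^[0-9.]+$/ && $1 + 0 >= min + 0' | sort -rn
}

# Example, assuming the XPG4 behaviour of ps is available:
#   UNIX95=XPG4 ps -e -o pcpu,pid,args | busy_procs 50
```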
"Downtime is a Crime."
Jerry Pinnell
New Member

Re: Detecting runaway processes.

Variation on a theme:

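CPUHOG=$1
# Refuse to run without a pattern; an empty pattern would match every process.
[ -n "$CPUHOG" ] || { echo "usage: $0 pattern" >&2; exit 1; }
ps -ef | grep -e "$CPUHOG" | while read LINE
do
# PPID is reserved by the shell, so the parent PID is kept in PARENT instead.
PID=`echo $LINE | awk '{print $2}'`
PARENT=`echo $LINE | awk '{print $3}'`
if [ "$PARENT" = '1' ]
then
# echo $LINE
kill -9 $PID
fi
done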
CHRIS_ANORUO
Honored Contributor

Re: Detecting runaway processes.

This is a small script that I use in killing some run away processes:

run_procs=`ps -e | grep processname | grep "?" | cut -c2-6`
echo $run_procs | mailx -s "Runaway processes" username

You can kill the processes instead by replacing the mailx line with "kill -9 $run_procs".
Be careful not to kill UNIX system processes; identify the exact process that you want to kill. This script can be run from cron.

Cheers!
When We Seek To Discover The Best In Others, We Somehow Bring Out The Best In Ourselves.
Marcelo De Florio
Frequent Advisor

Re: Detecting runaway processes.

If the problem is with the client connection, you should check the application server. Another possibility is to set a timeout for idle connections with the ndd parameter tcp_keepalive_interval, configured in /etc/rc.config.d/nddconf:
NDD_NAME[0]=tcp_keepalive_interval
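
For illustration, an nddconf entry normally consists of three lines sharing the same index. The value below (900000 milliseconds, i.e. 15 minutes) is only an assumed example, not a recommendation, and the index 0 assumes no other entries are present in the file.

```shell
# /etc/rc.config.d/nddconf (fragment; value is an example only)
TRANSPORT_NAME[0]=tcp
NDD_NAME[0]=tcp_keepalive_interval
NDD_VALUE[0]=900000
```

The current setting can be checked with `ndd -get /dev/tcp tcp_keepalive_interval`. Keepalive probes only apply to connections whose application has enabled keepalive on the socket.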

regards
Tim Malnati
Honored Contributor

Re: Detecting runaway processes.

As some others have stated, runaways show up in a variety of ways based on the environment. So, if you could, give us a little more info on what the primary function of the environment is and we may be able to help a lot more. Webserver, database engine (Oracle, Sybase, etc), utility server, what?

Re: Detecting runaway processes.

Hi,
I find this thread very helpful. Firstly, I found out that the patch PHCO_18446 wasn't installed on our HP-UX 11.00 development system. This gave rise to some confusing discrepancies between the output of "top" and "ps -o comm,pcpu".
Secondly, I realized that there seem to be problems with runaway processes when database connections are cancelled.
Here is my problem: I have a multithreaded server process under HP-UX 11.00 that gets requests from an Oracle 8.1.5 database. The database connection is made with the Oracle Call Interface, using the shared library libclntsh.sl.
On startup, dedicated build threads read data from the database and then complete.
When the build threads have completed and only request threads are left, I get a %CPU ranging from 50% to 90%. This seems to be the same problem as Jonathan's, if he is using Oracle.
Another strange thing I don't understand: when I observe the process with
"export UNIX95=XPG4; ps -C satellite -o time,etime" I see ELAPSED increasing regularly, but TIME hardly increasing at all. This is what I would expect from a well-behaved server process that doesn't have any work to do. But how does that fit with a %CPU of 50%?

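One way to cross-check a pcpu figure is to sample the cumulative TIME value twice and work out the usage over that interval by hand. The sketch below assumes TIME is printed as [dd-]hh:mm:ss, as in the XPG4 ps format; the function names `to_seconds` and `cpu_percent` are hypothetical.

```shell
# to_seconds [[DD-]HH:]MM:SS
# Converts a ps TIME value to seconds.
to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        # multipliers counted from the right: seconds, minutes, hours, days
        mult[0] = 1; mult[1] = 60; mult[2] = 3600; mult[3] = 86400
        s = 0
        for (i = NF; i >= 1; i--) s += $i * mult[NF - i]
        print s
    }'
}

# cpu_percent TIME_BEFORE TIME_AFTER WALL_SECONDS
# Prints the CPU used between two samples as a percentage of the interval.
cpu_percent() {
    a=`to_seconds "$1"`
    b=`to_seconds "$2"`
    awk -v a="$a" -v b="$b" -v w="$3" 'BEGIN { printf "%.1f\n", (b - a) * 100 / w }'
}

# Example: two TIME samples taken 60 seconds apart
#   cpu_percent 00:12:30 00:12:33 60
```

If the figure computed this way stays near zero while pcpu reports 50% or more, the two numbers are evidently being measured over different periods or in different ways.
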
CU, Bernhard.Schmalhofer@fazi.de
Better roughly right than exactly wrong.