Operating System - HP-UX

Re: Killing high CPU Appln. Processes.

 
SOLVED
ben_43
Frequent Advisor

Killing high CPU Appln. Processes.

Team:
We are running HP-UX 11.0 on an N-Class with 6 processors. We frequently have runaway processes that consume a lot of CPU. We decided to deploy a script (using top) that captures the top processes, monitors them for 15 minutes, and kills any that stay above 90% CPU for the whole period. When we test this script (using self-created runaways) on a 3-CPU N-Class box and start 6 runaway processes, each one only reaches about 50% CPU. The problem is that development has a 6-CPU N-Class, testing has a 3-CPU N-Class, and production has a 6-CPU N-Class. The testing team cannot identify the runaways because they never reach 90% (6 processes on 3 CPUs). What would be the solution to this? How do I convince them to run it? I know it is not a good idea to automate the killing, but we are only killing specific user processes and the script takes care to do so. How do these 6 runaway processes behave on a 3-CPU box versus a 6-CPU box? Please explain.

Thanks
Ben.
10 REPLIES
Vincent Fleming
Honored Contributor

Re: Killing high CPU Appln. Processes.

You could go by total cpu time used by the process (see the ps(1) man page for details - the heading is "TIME")

You can calculate the elapsed (wall-clock) time by subtracting the start time of the process (STIME from ps) from the current time. Then compare this to TIME (the CPU time the process has consumed). If they are close (TIME > 80% of elapsed time, for example), then kill it.
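A minimal sketch of that comparison, assuming the CPU time and elapsed time are already available as strings in the usual ps `[[dd-]hh:]mm:ss` layout. The function names `to_seconds` and `cpu_ratio` are made up for illustration; they are not part of ps or HP-UX.

```shell
# Convert a ps-style time field of the form [[dd-]hh:]mm:ss to seconds.
to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $1
        # Split off an optional leading "dd-" day count.
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}

# Print CPU time as a whole-number percentage of elapsed time.
# $1 = CPU time (TIME), $2 = elapsed time, both in ps format.
cpu_ratio() {
    cpu=$(to_seconds "$1")
    wall=$(to_seconds "$2")
    if [ "$wall" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( cpu * 100 / wall ))
}
```

For instance, `cpu_ratio 45:00 01:00:00` reports 75, which would fall just under an 80% cut-off. Where the XPG4 form of ps is available, `-o time,etime` can supply both fields directly; whether that output matches the layout assumed here should be checked on the target system first.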

But, IMHO, I think you should find out why there are runaway processes in the first place.

Good luck!
No matter where you go, there you are.
ben_43
Frequent Advisor

Re: Killing high CPU Appln. Processes.

Thanks Vincent. This is only a temporary solution, and we currently feel that top is the best tool to identify the runaway. Is there any other useful approach? I mean, how would you identify a runaway process with the ps output that you mentioned?

Thanks
Ben.
Martin Johnson
Honored Contributor

Re: Killing high CPU Appln. Processes.

You need to understand what the normal profile is for these processes, then determine how much of a deviation from this normal profile makes a process a potential candidate for termination. This would be a trial-and-error process to eliminate false positives.

For example, if normal processes average < 10% CPU utilization with an occasional spike to 60%, then a process that averages > 40% with spikes to 90% would be a candidate for termination. But what about a process that uses > 10% but < 20%?
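One way to express such a profile rule is a small filter over a series of CPU% samples for one process. This is only a sketch: the function name `classify_profile`, the one-sample-per-line input, and the two limits passed as arguments are assumptions, and the in-between zone (10% to 20%) is deliberately left as "normal" here.

```shell
# classify_profile AVG_LIMIT PEAK_LIMIT
# Reads one CPU% sample per line on stdin and prints:
#   candidate  if the average exceeds AVG_LIMIT and the peak reaches PEAK_LIMIT
#   normal     otherwise
#   no-data    if no samples were read
classify_profile() {
    awk -v avg_limit="$1" -v peak_limit="$2" '
        BEGIN { peak = 0 }
        NF > 0 { sum += $1; n++; if ($1 + 0 > peak) peak = $1 + 0 }
        END {
            if (n == 0) { print "no-data"; exit }
            avg = sum / n
            if (avg > avg_limit && peak >= peak_limit)
                print "candidate"
            else
                print "normal"
        }'
}
```

Called as `classify_profile 40 90` with samples collected over the monitoring window, it applies the "averages > 40% with spikes to 90%" rule. The limits themselves still have to come from observing the real workload.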

HTH
Marty
ben_43
Frequent Advisor

Re: Killing high CPU Appln. Processes.

Thanks much for the replies. I wish to state that we have done a thorough study of all the application processes over the last 6 months and came to this conclusion. It may happen because the users are coming from PCs through an X emulator. We need to do it. The question is about the best way to identify the runaway.

Thanks
Ben
Vincent Fleming
Honored Contributor

Re: Killing high CPU Appln. Processes.

ps(1) will print out a list of all running processes, and with the correct flags, all sorts of useful information about them, such as the time they were started, the amount of CPU time they've used, etc.

Try running "ps -ef" at a prompt.

Runaway processes will consume large quantities of CPU time, where most apps will consume very little, even if they run 24/7.

As an example, if you see a user process that was started an hour ago and it has used 45 CPU minutes, then it's probably a runaway (it has been on a CPU for 75% of its lifetime). You could probably do something as simple as checking whether it has more than 5 minutes of CPU time in total... xterms normally use few CPU resources.

I'm guessing that you're using xterm and displaying it on a PC with X emulation... I've seen xterms run away when the display station (the PC) reboots. I never did find a way of stopping that. My solution was to use telnet instead of xterm.

Has anyone out there ever figured out how to stop an xterm from running away when the X server goes away?
No matter where you go, there you are.
Martin Johnson
Honored Contributor

Re: Killing high CPU Appln. Processes.

I do it using MeasureWare:

symptom PROC_loop type=CPU
  rule PROC_CPU_TOTAL_UTIL > 50 prob PROC_CPU_TOTAL_UTIL

alarm PROC_loop > 90 for 15 minutes
  type = "CPU"
  start
    if PROC_loop > 95 then
      red alert "Process looping probability=", PROC_loop, "%"
    else
      yellow alert "Process looping probability=", PROC_loop, "%"
  repeat every 15 minutes
    if PROC_loop > 98 then
      red alert "Process looping probability=", PROC_loop, "%"
    else
      yellow alert "Process looping probability=", PROC_loop, "%"
  end
    reset alert "End of process looping alert"


I don't automatically kill the process; I have an SA verify it first.

HTH
Marty
Bill Hassell
Honored Contributor

Re: Killing high CPU Appln. Processes.

top is quite painful to parse and extract the needed data. Do this instead (on HP-UX the -o option of ps needs the UNIX95 variable set):

UNIX95= ps -e -o pcpu,pid,ppid,ruser,args | sort -rn

Then pick the top processes or those that exceed some percentage of total CPU time.
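As a rough sketch, picking out the processes above some percentage can be a one-line awk filter on that listing. The function name `over_threshold` is invented for illustration, and it assumes the column order pcpu, pid, ppid, ruser, args from the command above. Restricting the match to one user is an added assumption, in line with only killing specific user processes.

```shell
# over_threshold LIMIT USER
# Reads "pcpu pid ppid ruser args..." lines on stdin and prints the PID of
# every process owned by USER whose %CPU is above LIMIT.
# A header line is skipped naturally because its first field is not a number.
over_threshold() {
    awk -v limit="$1" -v user="$2" '
        $1 + 0 > limit && $4 == user { print $2 }'
}
```

Piping the ps listing into `over_threshold 90 appuser` (where `appuser` is a placeholder account name) prints only the candidate PIDs. It is probably wise to log them for a while before wiring the output into kill.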

The problem you are seeing is the most common reason for runaway processes in Unix today: exporting X Window applications to an unstable display device (i.e., the PC). Typical X Window programs do not expect the display to disappear, nor are they coded to check for a display device after connecting.

If you check, you'll find that these orphaned applications are not only consuming CPU cycles uselessly, they are also dumping a bunch of junk on the LAN. If you can change the application, add a keep-alive test to see if the display still exists. If the test times out, you can terminate gracefully. Of course, apps you can't change will always be a problem.
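For applications that cannot be changed, a similar check can be approximated from outside with a small watchdog. This is a swapped-in technique, not the in-application keep-alive described above. The sketch below assumes the application's PID is known and that some command can probe the display; `watch_display` is a hypothetical name, and using `xdpyinfo` as the probe assumes that utility is installed on the server.

```shell
# watch_display APP_PID INTERVAL PROBE_COMMAND [ARGS...]
# While APP_PID is alive, run the probe every INTERVAL seconds.
# If the probe fails, send SIGTERM to the application and return 1.
# Returns 0 once the application has exited on its own.
watch_display() {
    app_pid=$1
    interval=$2
    shift 2
    while kill -0 "$app_pid" 2>/dev/null; do
        if "$@" >/dev/null 2>&1; then
            sleep "$interval"
        else
            # Display probe failed: ask the application to shut down.
            kill -TERM "$app_pid" 2>/dev/null
            return 1
        fi
    done
    return 0
}
```

A possible use is to start the application in the background and then run `watch_display $! 60 xdpyinfo -display "$DISPLAY"`. How long the probe takes to fail when the PC simply vanishes depends on network timeouts, so the interval is only a lower bound on detection time.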


Bill Hassell, sysadmin
ben_43
Frequent Advisor

Re: Killing high CPU Appln. Processes.

Thanks. Great answers. I did not quite understand Bill's answer about the keep-alive test. What we have is an awk script that parses the top output and picks out the top six processes. This works well on a 6-CPU box, but on a 3-CPU box the 6 processes only get up to a maximum of 50%. I just wanted to know how I would test this on a 3-CPU box with 6 runaway processes.

Thanks
Ben.
Todd Lehr
Frequent Advisor

Re: Killing high CPU Appln. Processes.

Another good way to determine whether a process is a runaway is to check whether it is still attached to a terminal (the TTY column), and whether it now has a PPID of 1 when normally its parent should be another user process.

Neither of these by itself will identify a runaway, but they help find suspect processes. Generally, a process with high CPU and a PPID of 1 that does not normally have a PPID of 1 is a bad sign and most likely a runaway.
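These two hints can be combined with a CPU limit in a short awk filter. This is a sketch only: `orphan_suspects` is a made-up name, the input is assumed to be ps-style columns in the order pcpu, pid, ppid, tty, ruser, args, and a detached process is assumed to show `?` in the tty column. It cannot tell which processes legitimately have a PPID of 1 (daemons, for example), so those would need to be excluded separately.

```shell
# orphan_suspects LIMIT
# Reads "pcpu pid ppid tty ruser args..." lines on stdin and prints
# "PID note" for every process above LIMIT %CPU whose parent is PID 1.
# The note records whether a controlling terminal is still attached.
orphan_suspects() {
    awk -v limit="$1" '
        $1 + 0 > limit && $3 == 1 {
            note = ($4 == "?") ? "no-tty" : "tty=" $4
            print $2, note
        }'
}
```

A line reported as `no-tty` matches both hints at once. A line that still shows a terminal matches only the PPID hint and probably deserves a closer look before any action.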

Good Luck

Todd
Giri Sekar.
Trusted Contributor
Solution

Re: Killing high CPU Appln. Processes.

As far as top is concerned you would be OK, but make sure you have the latest top patch installed. I agree with Bill about the keep-alive test. What it means is that your application has to monitor the remote DISPLAY constantly and shut itself down in an orderly way whenever the connection to the DISPLAY is lost. That way you will never have an orphan process for that application.

To answer your 3-CPU machine question: that is exactly how the process scheduling policy works. With six CPU-bound processes of equal priority competing for three CPUs, each one gets roughly half of a CPU. Please note that process priorities are assigned in groups, and you may nice/renice them to change priorities.
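The arithmetic behind the 50% figure can be sketched as follows, assuming every runaway is a single-threaded CPU-bound loop at the same priority and nothing else is competing for the CPUs. The names `expected_share` and `scaled_threshold` are hypothetical helpers, not existing commands.

```shell
# expected_share NCPU NRUNAWAY
# Approximate %CPU each runaway can reach when NRUNAWAY equal-priority
# CPU-bound processes share NCPU processors (capped at 100).
expected_share() {
    ncpu=$1
    nrun=$2
    if [ "$nrun" -le "$ncpu" ]; then
        echo 100
    else
        echo $(( 100 * ncpu / nrun ))
    fi
}

# scaled_threshold THRESHOLD NCPU NRUNAWAY
# Scale a kill threshold chosen for an uncontended box down to what the
# same test load can reach on a smaller one.
scaled_threshold() {
    share=$(expected_share "$2" "$3")
    echo $(( $1 * share / 100 ))
}
```

With these assumptions, `expected_share 3 6` gives 50 and `expected_share 6 6` gives 100, so the 90% rule can be exercised on the 3-CPU test box either by starting at most three runaways or by testing against `scaled_threshold 90 3 6`, which is 45.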

Giri.
"USL" Unix as Second Language