All,
Here is the alarmdef I use to identify runaway processes on the system.
==============================================
proctime=3600
PROCESS LOOP
{
  if (PROC_USER_NAME != "root") then
  {
    if ((PROC_RUN_TIME > proctime) and (PROC_CPU_TOTAL_TIME_CUM >= (PROC_RUN_TIME * 0.75))) and
       ((PROC_DISK_PHYS_IO_RATE_CUM < 10) or (PROC_DISK_LOGL_IO_RATE_CUM < 10)) then
    {
      exec "/usr/local/bin/somescript PROC_PROC_ID"
    }
  }
}
==============================================
The definition above flags a process as a runaway when it meets all of the following criteria:
1) the process owner is not root, and
2) the process has been running for more than 3600 seconds, with its total CPU time at least 75% of the elapsed time, and
3) the process is doing effectively no I/O (its cumulative physical or logical I/O rate is below 10).
So any non-root process that has been running for more than 1 hour, has spent at least 75% of that time on CPU, and is doing almost no physical/logical I/O will be considered a runaway process.
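To sanity-check the thresholds outside the agent, the same condition can be sketched in Python. This is only an illustration: the `Proc` class, the `is_runaway` function and the sample values are hypothetical stand-ins for the PROC_* metrics, not anything provided by the performance agent.

```python
from dataclasses import dataclass

PROCTIME = 3600  # seconds, same value as proctime in the alarmdef


@dataclass
class Proc:
    # Hypothetical stand-in for the PROC_* metrics used in the alarmdef.
    user_name: str             # PROC_USER_NAME
    run_time: float            # PROC_RUN_TIME, seconds
    cpu_total_time_cum: float  # PROC_CPU_TOTAL_TIME_CUM, seconds
    phys_io_rate_cum: float    # PROC_DISK_PHYS_IO_RATE_CUM
    logl_io_rate_cum: float    # PROC_DISK_LOGL_IO_RATE_CUM


def is_runaway(p: Proc) -> bool:
    """Return True when the process matches the alarmdef condition."""
    if p.user_name == "root":
        return False
    long_running = p.run_time > PROCTIME
    cpu_bound = p.cpu_total_time_cum >= p.run_time * 0.75
    # "or" as written in the alarmdef: one low I/O rate is enough to match.
    low_io = p.phys_io_rate_cum < 10 or p.logl_io_rate_cum < 10
    return long_running and cpu_bound and low_io


if __name__ == "__main__":
    # 4000 s elapsed, so the CPU threshold is 4000 * 0.75 = 3000 s.
    spinner = Proc("appuser", 4000, 3100, 0.0, 0.0)
    mixed = Proc("appuser", 4000, 3100, 2.0, 500.0)
    print(is_runaway(spinner))
    print(is_runaway(mixed))
```

Both sample processes match. The second one matches despite heavy logical I/O, because the condition joins the two I/O tests with `or`; if the intent is that both rates must be low, `and` may be the closer fit.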
I would appreciate any feedback or suggestions from the forum.
TIA
Sundar.
Learn What to do, How to do and more importantly When to do?