Run-away process

 
Sundar_7

All,

Here is the alarmdef I have for identifying run-away procs in the system.

==============================================
proctime = 3600      # minimum elapsed time, in seconds

PROCESS LOOP
{
  if (PROC_USER_NAME != "root") then
  {
    # CPU-bound for over an hour, with effectively no physical and no logical IO
    if ((PROC_RUN_TIME > proctime) and
        (PROC_CPU_TOTAL_TIME_CUM >= (PROC_RUN_TIME * 0.75))) and
       ((PROC_DISK_PHYS_IO_RATE_CUM < 10) and
        (PROC_DISK_LOGL_IO_RATE_CUM < 10)) then
    {
      # EXEC takes a print list, so the PID goes outside the quoted string
      exec "/usr/local/bin/somescript ", PROC_PROC_ID
    }
  }
}
==============================================

The definition above identifies run-away processes using the following criteria:

1) the process owner is not root,
2) the process has been running for more than 3600 secs of elapsed time, and its total CPU time is at least 75% of that elapsed time, and
3) the process is effectively doing no physical or logical IO.

So any non-root process that has been spinning on the CPU for more than 1 hr without doing any physical/logical IO will be considered a run-away process.
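The three criteria can also be checked offline as a plain predicate, which makes it easy to try the thresholds against sample numbers before tuning the alarmdef. This is a minimal Python sketch, not Glance code: the `ProcSample` class and `is_runaway` function are hypothetical, the fields simply mirror the Glance metrics named in the alarmdef, times are assumed to be in seconds and IO rates in IOs per second, and criterion 3 is read as "no physical and no logical IO".

```python
from dataclasses import dataclass

# Hypothetical snapshot of the per-process metrics used in the alarmdef.
@dataclass
class ProcSample:
    user_name: str       # PROC_USER_NAME
    run_time: float      # PROC_RUN_TIME, assumed seconds
    cpu_time_cum: float  # PROC_CPU_TOTAL_TIME_CUM, assumed seconds
    phys_io_rate: float  # PROC_DISK_PHYS_IO_RATE_CUM, assumed IOs/sec
    logl_io_rate: float  # PROC_DISK_LOGL_IO_RATE_CUM, assumed IOs/sec

PROCTIME = 3600       # minimum elapsed time, seconds
CPU_FRACTION = 0.75   # CPU time must be at least 75% of elapsed time
IO_RATE_LIMIT = 10    # below this rate, IO counts as "effectively none"

def is_runaway(p: ProcSample) -> bool:
    """Return True if the sample meets all three run-away criteria."""
    # Criterion 1: root processes are never flagged.
    if p.user_name == "root":
        return False
    # Criterion 2: running over an hour and mostly on the CPU.
    cpu_bound = p.run_time > PROCTIME and p.cpu_time_cum >= p.run_time * CPU_FRACTION
    # Criterion 3: neither physical nor logical IO above the limit.
    no_io = p.phys_io_rate < IO_RATE_LIMIT and p.logl_io_rate < IO_RATE_LIMIT
    return cpu_bound and no_io

# A 2-hour-old process with 6000 s of CPU and almost no IO is flagged.
print(is_runaway(ProcSample("sundar", 7200, 6000, 0.5, 2.0)))
```

For example, a process that has run for 7200 s with 6000 s of CPU time passes the CPU test, because 6000 >= 0.75 * 7200 = 5400; with only 5000 s of CPU time it would not be flagged.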

I would like some feedback/suggestions from you forum folks.

TIA

Sundar.
Learn What to do, How to do and more importantly When to do?
A. Clay Stephenson

Re: Run-away process

This sort of thing is very difficult to define, and it will depend very much on your environment. For example, a number-crunching application such as computational fluid dynamics or finite element analysis would fit your criteria perfectly. These programs can run for days and do I/O only at the beginning and end. Typically, you also need to filter on process names.
If it ain't broke, I can fix that.
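The process-name filtering suggested in the reply can be layered on top of whatever metric test is used. This is a minimal Python sketch, assuming the metric test has already produced a True/False result for each process: `EXPECTED_CPU_BOUND`, `pids_to_report` and the names `cfd_solver` and `fea_run` are hypothetical placeholders for the legitimate long-running, CPU-bound jobs on a given system.

```python
from typing import Iterable, List, Tuple

# Hypothetical names of jobs expected to be CPU-bound with little I/O
# (e.g. CFD or finite-element solvers); replace with your own.
EXPECTED_CPU_BOUND = frozenset({"cfd_solver", "fea_run"})

def pids_to_report(procs: Iterable[Tuple[int, str, bool]]) -> List[int]:
    """Return PIDs flagged by the metric test, skipping whitelisted names.

    Each entry is (pid, process name, result of the run-away metric test).
    """
    return [pid for pid, name, flagged in procs
            if flagged and name not in EXPECTED_CPU_BOUND]

# The solver is skipped even though it matches the metric test.
print(pids_to_report([(101, "cfd_solver", True), (102, "myapp", True)]))
```

Keeping the whitelist separate from the metric test means the thresholds and the list of known number-crunching jobs can be tuned independently.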