
Re: Killing Hung Process

 
brian_31
Super Advisor

Killing Hung Process

Team:

We have a monitoring script that runs every 30 seconds and checks for a java process.
If the process is present, the script takes its elapsed time from the "ps -ef" output. The logic is that if the process's elapsed time exceeds a specified limit, the script kills the process. We observed that the elapsed time of a hanging process does not increment as expected (this differs from what we saw when testing with a dummy hanging process). As a result, the process gets killed only after many hours (3 hours last time instead of 5 minutes).

Could someone let me know the best method to identify a hanging process and kill it? Using ps -ef and picking out the elapsed time is not working. Any other ideas?

Thanks in advance,

Best Regards

Brian.
Dave Olker
Neighborhood Moderator

Re: Killing Hung Process

Hi Brian,

You could use the pstat_getproc() call and examine the pst_start field. This field contains the time the process started displayed in seconds since epoch. You could monitor this field and when it gets above a certain value you could send a signal to the process to terminate it.

The man page for pstat(2) contains several examples of calling pstat_getproc().

Regards,

Dave



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
brian_31
Super Advisor

Re: Killing Hung Process

Hi:

We are using a shell script at the moment. Do you mean I can use those calls inside the script? Could you please provide an example?

Thanks

Brian.
Dave Olker
Neighborhood Moderator

Re: Killing Hung Process

Hi Brian,

No, these are C programming functions, not K-shell functions. The approach I'm suggesting is to use a C program to call pstat_getproc() and look at one of the fields.

Are you restricted to using shell programming for this task or is C a possibility?

Regards,

Dave


Suresh Pai
Advisor

Re: Killing Hung Process

ps -efl gives the starting time in STIME (gives the date if more than a day old).
Ralph Grothe
Honored Contributor

Re: Killing Hung Process

Brian,

If you don't want to use the pstat* syscall functions from libc, there is another display format of the ps command that you can select by setting the so-called XPG4 environment through UNIX95.
The etime option displays the elapsed time of a process, rather than a date as stime does for long-running processes.

e.g.

$ UNIX95= ps -e -o comm= -o etime=

grep from the comm field the process you need to monitor and parse the 2nd field.
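
To illustrate the parsing step, here is a minimal sketch that converts an etime value to seconds. It assumes the usual XPG4 layout [[dd-]hh:]mm:ss; the function name etime_to_seconds is made up for this example, and the code has not been tried on HP-UX itself.

```shell
# Hypothetical helper: convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    et=$1
    days=0
    # Optional "dd-" prefix for processes older than a day.
    case $et in
        *-*) days=${et%%-*}; et=${et#*-} ;;
    esac
    # Optional "hh:" field; short-lived processes show only mm:ss.
    case $et in
        *:*:*) h=${et%%:*}; et=${et#*:} ;;
        *)     h=0 ;;
    esac
    m=${et%%:*}
    s=${et#*:}
    # Strip one leading zero so values such as 08 are not read as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${days:-0} * 86400 + ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}
```

The result can then be compared against a limit in seconds (300 for the 5 minutes mentioned in the original question) with an ordinary numeric test such as [ "$secs" -gt 300 ].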

Somewhere in between C system programming and scripting falls Perl.
If you can live with Perl, there is an excellent module on CPAN that gives full access to the pstat* syscalls.
Look here for Proc::ProcessTable

http://www.cpan.org/modules/by-module/Proc/
Madness, thy name is system administration
Andrew Merritt_2
Honored Contributor

Re: Killing Hung Process

Brian,
A couple of observations that haven't been made yet.

The TIME field in the 'ps -ef' output is the cumulative amount of CPU used, not the elapsed time of the process, so it's not a surprise that it's not incremented for a hanging process (one that's not using any CPU).

Ralph's suggestion of using the UNIX95 options looks like the simplest way to get what you want, but you'll need to include the PID field too if you want to be able to kill the process:

UNIX95= ps -e -o comm= -o etime= -o pid=
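
As a rough sketch of how that output could be filtered, the function below reads "comm etime pid" lines on standard input and prints the PIDs of processes with a given name whose elapsed time exceeds a limit in seconds. The name pids_over_limit and the 300-second limit are assumptions made for illustration, and the etime layout is assumed to be [[dd-]hh:]mm:ss.

```shell
# Hypothetical filter: print PIDs of processes named $1 that have been
# running for more than $2 seconds. Expects "comm etime pid" lines on stdin.
pids_over_limit() {
    awk -v name="$1" -v limit="$2" '
        # Convert [[dd-]hh:]mm:ss to seconds; returns -1 if unparseable.
        function secs(et,    n, d, t, m) {
            n = split(et, d, "-")        # optional "dd-" prefix
            m = split(d[n], t, ":")      # hh:mm:ss or mm:ss
            if (m == 3)
                return (n == 2 ? d[1] : 0) * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
            if (m == 2)
                return t[1] * 60 + t[2]
            return -1
        }
        # etime and pid are taken from the last two fields of each line.
        $1 == name && secs($(NF - 1)) > limit { print $NF }
    '
}

# Possible use inside the monitoring script (not run here):
#   for pid in $(UNIX95= ps -e -o comm= -o etime= -o pid= | pids_over_limit java 300)
#   do
#       kill "$pid"
#   done
```

Keeping the selection separate from the kill makes it easy to run the filter by hand first and check which PIDs it would pick.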

Writing a C program and looking at the pst_start field would work, but what you'd be doing is monitoring the difference between the value for that process and the current time; the start time itself is not going to change.

Andrew
Bill Hassell
Honored Contributor

Re: Killing Hung Process

To expand on the UNIX95 option in ps, you might want to change comm to args and add the -x option to get the entire command line, something like this:

UNIX95= ps -e -x -o pid,ppid,flags,state,etime,args

NOTE: -x is only functional on a patched system, and it is mandatory for JavaJunk(tm) because of the massively long command lines and pathnames. The flags and state columns can be useful to toss at the programmers to fix the hanging code. You can also tell ps to list all processes with a specific name without ever using grep. Change -e to -C name, as in:

UNIX95= ps -C java -x -o pid,ppid,flags,state,etime,args


Bill Hassell, sysadmin
Sarjerao
Frequent Advisor

Re: Killing Hung Process

Hi,
I am facing the same problem with a hung ioscan. If I rerun ioscan, it hangs again.
root 17667 1 0 May 30 ? 0:00 ioscan -fnC lan
root 17834 1 0 May 30 ? 0:00 ioscan -fn
root 17016 1 0 May 30 ? 0:00 ioscan -fnC lan
root 20464 1 0 May 30 ? 0:00 ioscan -fn
Andrew Merritt_2
Honored Contributor

Re: Killing Hung Process

Sarjerao, I think you have a different problem from that in the base note. With ioscan, it's most likely there's a hardware problem that's causing the process to hang.

When you say 'hang', do you mean the process just doesn't complete, or that you can't kill it?

I think the best thing would be to open a new thread since this is a different topic.

Andrew
Cem Tugrul
Esteemed Contributor

Re: Killing Hung Process

In addition to Andrew's points:
why don't you want to kill these processes?
I have never seen anything like this:
root 17667 1 0 May 30 ? 0:00 ioscan -fnC lan
root 17834 1 0 May 30 ? 0:00 ioscan -fn
root 17016 1 0 May 30 ? 0:00 ioscan -fnC lan
root 20464 1 0 May 30 ? 0:00 ioscan -fn
I would also suggest that you check your syslog and dmesg output, because it seems something is going wrong on your system :-(

Good Luck,
Our greatest duty in this life is to help others. And please, if you can't