Operating System - HP-UX

Blame the Sysadmin!!! Processes won't die

 
Adam Noble
Super Advisor


Hi,

We have a situation whereby our SAP system got itself into a state and was not functioning as it should. They have tried to kill the application, but two processes simply would not die. If I run lsof on the processes, there are numerous open files and network connections in the ESTABLISHED state. Apparently, because the processes will not die, this is an OS issue. I am reluctant to agree and would assume it's the application code. Anyone got any thoughts on this?

Adam
Patrick Wallek
Honored Contributor

Re: Blame the Sysadmin!!! Processes won't die

If processes will not die, then generally it is because they are in a state in which they are not accepting any signals.

This can happen if the process is waiting on I/O that will not complete (like a bad disk).
John Waller
Esteemed Contributor
Solution

Re: Blame the Sysadmin!!! Processes won't die

Hi Adam,

I presume you have tried to kill the processes with different signals, i.e. kill -15, then kill -9.
Can I guess that the parent process ID is 1? As said in the previous comment, these processes will not die because they are normally in a sleep state, and until they wake they will not act on the kill signal. Whether this is an OS issue or an application issue is, I find, impossible to tell; you will always have application people blaming the OS and sysadmins blaming the application. As I am in the sysadmin camp, I always blame the application, as it's the application that is hung and the rest of the OS is working fine.
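John's checks (is the parent PID 1, and is the process sleeping?) can be scripted. The sketch below is a minimal Python helper, assuming an XPG4-style ps(1) that accepts `-o ppid= -o state=`. On HP-UX that mode is normally enabled by setting the `UNIX95` environment variable; whether HP-UX accepts the `state` keyword is an assumption here, and the function name `ps_fields` is hypothetical. State letters are platform-specific (Linux, for example, reports `D` for uninterruptible sleep), so check your own ps(1) man page before interpreting them.

```python
import os
import subprocess


def ps_fields(pid: int) -> dict:
    """Return the parent PID and state letter(s) that ps(1) reports for `pid`.

    The trailing '=' on each -o keyword suppresses the column header.
    UNIX95 is set because HP-UX ps honours -o only in its XPG4 mode; other
    systems tolerate the variable. `state` is accepted by Linux procps and
    BSD ps; that HP-UX accepts it is an assumption.
    """
    env = dict(os.environ, UNIX95="")
    result = subprocess.run(
        ["ps", "-o", "ppid=", "-o", "state=", "-p", str(pid)],
        capture_output=True, text=True, env=env, check=True,
    )
    ppid, state = result.stdout.split()
    return {"ppid": int(ppid), "state": state}


if __name__ == "__main__":
    info = ps_fields(os.getpid())
    print(info)
    if info["ppid"] == 1:
        print("orphaned: the original parent is gone and init adopted it")
```

Pass the PID of the hung process instead of `os.getpid()`; a PPID of 1 together with a sleep state matches the situation John describes.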
A. Clay Stephenson
Acclaimed Contributor

Re: Blame the Sysadmin!!! Processes won't die

You have to understand that "kill" is probably the most misnamed system call and command in UNIXdom. It should really be called something like "sendsignal" or "raiseyourhand", because all it really does is deliver a message to the process's state table. When the process is actually running, this table is examined, and if any flags are present the signals are acted upon. However, if the process is waiting for a higher-priority event, such as an I/O operation, to complete, it isn't actually running, and the signal, though delivered by the system to the process, is never acted upon until that higher-priority event completes. So imagine that you have a failed disk or other I/O device. The process issues a read() system call and then waits for the read() to complete. Meanwhile, you send kill after kill (including 14 kill -9's) and this stupid process still won't die. Each of your signals has been delivered, but none has been acted upon because the process isn't running.
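The difference between a signal being posted and being acted upon can be seen from user space. The following Python sketch (Unix only; `start_stubborn` and `signal_and_wait` are hypothetical names) starts a child that blocks SIGTERM, so the `kill -15` equivalent succeeds but the child keeps running, and then escalates to SIGKILL. It cannot reproduce the kernel-level case described above: SIGKILL cannot be blocked, and a process sleeping inside a system call on a failed device is not something user code can simulate portably.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a hung process: it blocks SIGTERM, so a TERM sent
# to it stays pending instead of being acted upon.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def start_stubborn() -> subprocess.Popen:
    proc = subprocess.Popen([sys.executable, "-c", STUBBORN_CHILD],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until the child has installed its signal mask
    proc.stdout.close()
    return proc


def signal_and_wait(proc: subprocess.Popen, sig: int, timeout: float) -> bool:
    """Send `sig`; return True only if the process actually exited in time.

    os.kill() wraps kill(2): a normal return means the signal was posted,
    not that the target has acted on it.
    """
    os.kill(proc.pid, sig)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    child = start_stubborn()
    print("exited after SIGTERM:", signal_and_wait(child, signal.SIGTERM, 1.0))
    print("exited after SIGKILL:", signal_and_wait(child, signal.SIGKILL, 5.0))
```

The escalation order (TERM with a grace period, then KILL) mirrors the kill -15 / kill -9 sequence mentioned earlier in the thread.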

The OS is handling signals exactly as advertised, so there is nothing to fix, but you need to monitor your system for failed I/O devices. You may be missing a few critical patches as well.
If it ain't broke, I can fix that.
Brian DelPizzo
Frequent Advisor

Re: Blame the Sysadmin!!! Processes won't die

I have sometimes seen processes waiting for network I/O. You can check netstat -an for connections, possibly in a CLOSE_WAIT state. These do usually die as asked with a kill or kill -9, but sometimes they do not. You can narrow it down with lsof or crashinfo, which are great tools for identifying ports and socket identifiers.

If you think you've found an open socket that belongs to this hung process, there are ways to release that socket, which will often free the process from its unresponsive state.
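CLOSE_WAIT means the remote end has closed its side (its FIN has arrived) but the local application has not yet called close() on the socket, so only that application can clear it. A minimal loopback sketch in Python, where `demo_close_wait` is a hypothetical name and both ends of the connection live in one process purely for illustration:

```python
import socket


def demo_close_wait() -> bytes:
    """Reproduce CLOSE_WAIT on loopback and return what recv() saw."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))    # port 0: let the OS choose a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    remote, _ = server.accept()       # stands in for the remote application

    remote.close()                    # remote end closes: a FIN reaches `client`
    data = client.recv(1024)          # b"" means end-of-stream: the FIN arrived
    # From here until client.close(), netstat -an would list the client side
    # of this connection in CLOSE_WAIT: TCP is waiting for *this* process.
    client.close()                    # the application, not the OS, ends CLOSE_WAIT
    server.close()
    return data


if __name__ == "__main__":
    print(repr(demo_close_wait()))    # prints b''
```

A process that reads the empty result but never closes the socket leaves it in CLOSE_WAIT for as long as the process lives.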
Bill Hassell
Honored Contributor

Re: Blame the Sysadmin!!! Processes won't die

As mentioned, the processes that won't stop are waiting on some kernel service that will never complete. The most common problem is with network connections that have no timeout associated with the transactions. This seems to be more common when large-scale applications like Oracle and SAP interact with other applications and systems over the network. If you have any control over the code, look at all connections and recode the application to assume that any task can fail, and to take appropriate action: properly report the condition, then attempt to recover or shut down gracefully. If you have no control over the code, then you must ensure that network connections and remote systems will never go down.
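One way to follow this advice at the socket level is to put a deadline on every blocking read. A minimal Python sketch, assuming plain TCP sockets; `recv_with_deadline` and `silent_peer` are hypothetical helpers, and this illustrates the pattern only, not how SAP or Oracle configure their own timeouts:

```python
import socket
import time
from typing import Optional


def recv_with_deadline(sock: socket.socket, nbytes: int, timeout: float) -> Optional[bytes]:
    """Receive up to `nbytes`, giving up after `timeout` seconds.

    Returns None on timeout so the caller can report the condition and then
    retry, fail over, or shut down cleanly instead of hanging indefinitely.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


def silent_peer() -> tuple:
    """Hypothetical helper: a loopback connection whose remote end never sends."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    remote, _ = server.accept()
    return client, remote, server


if __name__ == "__main__":
    client, remote, server = silent_peer()
    start = time.monotonic()
    reply = recv_with_deadline(client, 1024, timeout=0.5)
    print(reply, f"after {time.monotonic() - start:.1f}s")
    for s in (client, remote, server):
        s.close()
```

Without the `settimeout()` call, the same `recv()` would block until the remote end sent data or closed, which is exactly the hang described above.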

As Clay said, this is not an OS issue but a design failure, possibly in the architecture rather than in specific code. For instance, running SAP or a big database over the open Internet is asking for program hangs like this, not to mention putting all the data at risk.


Bill Hassell, sysadmin
Torsten.
Acclaimed Contributor

Re: Blame the Sysadmin!!! Processes won't die

You may see a status like "wait to be killed" as mentioned, or even a zombie process. As you know, you cannot kill a zombie; it's already dead.
Just in case of a "zombie" the only solution is called "reboot".

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

hpuxrox
Respected Contributor

Re: Blame the Sysadmin!!! Processes won't die

Check to see if your process is in an I/O wait. If it is, you will need to resolve the issue with the disk or reboot.
Dennis Handly
Acclaimed Contributor

Re: Blame the Sysadmin!!! Processes won't die

>Clay: you send kill after kill (including 14 kill -9's) and this stupid process still won't die.

I've recently had several of those kill -9s finally kill the process. Fortunately there was a ClearCase patch for the issue.

>Torsten: Just in case of a "zombie" the only solution is called "reboot".

You kill zombies by teaching their parent a lesson and killing it; the orphaned zombie is then inherited by init, which reaps it. Of course, you have to judge whether this is worse than waiting to reboot.
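A zombie has already exited; only its process-table entry and exit status remain until its parent collects them with wait()/waitpid(). The Python sketch below (Unix only; `make_zombie`, `kill_zombie` and `reap` are hypothetical names) creates a zombie on purpose, shows that SIGKILL leaves its recorded exit status untouched, and then removes it the way a well-behaved parent would. It assumes `os.waitid` with `WNOWAIT` is available on the platform, as it is on Linux.

```python
import os
import signal


def make_zombie(exit_code: int) -> int:
    """Fork a child that exits immediately; the parent deliberately does not reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate at once
    # Block until the child has exited, but WNOWAIT leaves it unreaped,
    # so it now sits in the process table as a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    return pid


def kill_zombie(pid: int) -> None:
    """kill -9 equivalent; a zombie has no code left to run, so nothing changes."""
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # tolerate platforms that report a zombie as already gone


def reap(pid: int) -> int:
    """What the parent is supposed to do: collect the exit status with waitpid()."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # negative: terminated by that signal


if __name__ == "__main__":
    zombie = make_zombie(7)
    kill_zombie(zombie)
    print("status when reaped:", reap(zombie))  # the child's own exit code, 7
```

Killing the parent has the same effect as calling `reap()`, only indirectly: init adopts the orphan and performs the wait itself.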