killing a process

 
Rajeev  Shukla
Honored Contributor

killing a process

Any idea how to kill a process that doesn't die even with kill -9?
It's a C program doing a SELECT against the database, and for some reason it needs to be killed, but kill -9 doesn't seem to work.

Any idea how to stop that process without rebooting the server?

Thanks
rajeev
Jeff Schussele
Honored Contributor

Re: killing a process

Hi rajeev,

If -9 doesn't do it, the process is probably waiting on I/O. Find the process that's holding up that I/O, kill it, and the original process will die with it.
If you're lucky, you won't have to shut down the DB.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Rajeev  Shukla
Honored Contributor

Re: killing a process

Hi Jeff,
The problem is that the parent PID is 1; there is no other process associated with it. And the database logs show that it hasn't even started a query yet.

Steven E. Protter
Exalted Contributor

Re: killing a process

If the process has a parent ID of 1, its parent is init, and killing init would take down the whole box.

Killing the parent in this case is bad and illegal in all 50 states. Hee hee.

There are other signals you can send with the kill command.

kill -25 suspends the process, which is almost as good as killing it. A sleeping process can't hurt anybody.

kill -9 is usually the last resort. For me, when that doesn't work, it's usually reboot time.
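Signal numbers are not the same on every Unix, so the names (kill -s STOP pid, kill -s KILL pid) are safer than numbers like -25. Here is a minimal C sketch of suspending and then killing a process, assuming the target is the caller's own child (waitpid() only reports on children); stop_then_kill is a hypothetical helper name. SIGSTOP and SIGKILL can't be caught or ignored, but a process blocked in the kernel on I/O may not act on either one until that I/O returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: suspend a child with SIGSTOP, then terminate it with
 * SIGKILL. Only works on the caller's own child, because waitpid() can only
 * wait for children. Returns 0 on success, -1 on failure. */
int stop_then_kill(pid_t child)
{
    int status;

    /* SIGSTOP cannot be caught or ignored: the child is suspended. */
    if (kill(child, SIGSTOP) == -1)
        return -1;
    /* WUNTRACED makes waitpid() report a stopped child, not just an exited one. */
    if (waitpid(child, &status, WUNTRACED) != child || !WIFSTOPPED(status))
        return -1;

    /* SIGKILL cannot be caught either, and it terminates even a stopped process. */
    if (kill(child, SIGKILL) == -1)
        return -1;
    /* Reap the child so it does not linger as a zombie. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : -1;
}
```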

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Rajeev  Shukla
Honored Contributor

Re: killing a process

Don't tell me that! There should be something, somewhere, to remove a process from the process table.
By the way, I've tried all the kill signals and none of them seem to work.
Michael Steele_2
Honored Contributor

Re: killing a process

What does:

lsof -p pid

...say?

Support Fatherhood - Stop Family Law
Rajeev  Shukla
Honored Contributor

Re: killing a process

This is the output:
# /opt/lsof/bin/lsof -p 4179
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
au_wwwpu 4179 auwww cwd VDIR 255,0x2 2048 4307 /tmp_mnt/home/auwww
au_wwwpu 4179 auwww txt VREG 255,0x2 32768 17587 /tmp_mnt/home -- au_wwwpu
au_wwwpu 4179 auwww mem VREG 64,0x7 40960 109 /usr/lib/libnss_files.1
au_wwwpu 4179 auwww mem VREG 64,0x7 126976 24333 /usr/lib/libxti.2
au_wwwpu 4179 auwww mem VREG 64,0x7 679936 26326 /usr/lib/libnsl.1
au_wwwpu 4179 auwww mem VREG 64,0x7 40960 26890 /usr/lib/libnss_nis.1
au_wwwpu 4179 auwww mem VREG 64,0x2003b 2842624 3161 /mars_informix/lib/tools/lib4gsh730.sl
au_wwwpu 4179 auwww mem VREG 64,0x7 122880 1835 /usr/lib/libnsl_s.2
au_wwwpu 4179 auwww mem VREG 64,0x7 12288 698 /usr/lib/libisamstub.1
au_wwwpu 4179 auwww mem VREG 64,0x7 1241088 684 /usr/lib/libcl.2
au_wwwpu 4179 auwww mem VREG 64,0x7 282624 22679 /usr/lib/libm.2
au_wwwpu 4179 auwww mem VREG 64,0x7 143360 111 /usr/lib/libsec.2
au_wwwpu 4179 auwww mem VREG 64,0x7 335872 8818 /usr/lib/libcur_colr.1
au_wwwpu 4179 auwww mem VREG 64,0x7 24576 21322 /usr/lib/libdld.2
au_wwwpu 4179 auwww mem VREG 64,0x7 1552384 25162 /usr/lib/libc.2
au_wwwpu 4179 auwww mem VREG 64,0x7 159744 21320 /usr/lib/dld.sl
au_wwwpu 4179 auwww mem VREG 64,0x8 532 15947 /var/spool/pwgr/status
au_wwwpu 4179 auwww 0r VCHR 3,0x2 0t0 66 /dev/null
au_wwwpu 4179 auwww 1w VREG 64,0x6 30903 242 /tmp/auwww.cron
au_wwwpu 4179 auwww 2w VREG 64,0x6 30903 242 /tmp/auwww.cron
au_wwwpu 4179 auwww 3u VREG 255,0x2 0 17987 /tmp_mnt/home -- mutex
au_wwwpu 4179 auwww 4u unix 0x4c092800 0t0 /var/spool/sockets/pwgr/client4179
au_wwwpu 4179 auwww 5u inet 0x4f7d8668 0t0 TCP mars:58901->mars:csm_service (CLOSE_WAIT)
#
Michael Steele_2
Honored Contributor

Re: killing a process

The /var/spool pids can probably be killed. The /usr/lib pids should be left alone.
The real killer, though, and the one that's probably the problem, is that /tmp_mnt "fella". Is there an NFS mount or other network association there?

I'm concerned about the dld.sl entry. Those usually aren't good. (* Dynamic Link Loader. *)
Rajeev  Shukla
Honored Contributor

Re: killing a process

Yes, home is NFS-mounted, and that's where the program was run from.
You said to kill the /var/spool pids, but how? All of these entries have the same PID.

Michael Steele_2
Honored Contributor

Re: killing a process

Oops. I meant remove.
Rajeev  Shukla
Honored Contributor

Re: killing a process

Of all these, I can see only one file that can be removed, and that's
/var/spool/sockets/pwgr/client4179
but after I remove it, it gets re-created.
All the rest are application files that can't be removed (that is, the program file and its home directory, which is NFS-mounted).

Michael Steele_2
Honored Contributor

Re: killing a process

OK, let's concentrate on NFS, then.

nfsstat -s
(* If readlink calls are of the same magnitude as lookup calls, symbolic links are frequently traversed - remove symbolic links *)

nfsstat -rc
(* If timeout and retrans values are high and badxid is near zero, packets are being dropped - increase timeout *)

nfsstat -rc
(* If timeouts and retrans are similar, client RPC requests are timing out - increase timeout in /etc/rc.config.d/nfsconf *)

netstat -m
(* If the number of requests for memory denied is high, the server doesn't have enough memory *)

netstat -s -p udp
(* If socket overflows are high, more nfsd daemons are needed *)

sar -u 5 5
(* %idle = 0? CPU bottleneck *)
Christian Gebhardt
Honored Contributor

Re: killing a process

Hi

take a look at this line
au_wwwpu 4179 auwww 5u inet 0x4f7d8668 0t0 TCP mars:58901->mars:csm_service (CLOSE_WAIT)

This process has a connection from port 58901 on mars to the csm_service port on mars (you can look up the port number in /etc/services).

Try:
lsof | grep csm_service

Maybe you'll get something like this:
mars:csm_service->mars:58901

with another PID

Chris
Adam J Markiewicz
Trusted Contributor

Re: killing a process

I don't know if this is your case.

I had a problem with a process hanging on a socket. SIGKILL didn't help, but SIGALRM did.
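One way SIGALRM can matter: if the program installs a SIGALRM handler without SA_RESTART, a read() blocked on a socket returns -1 with errno set to EINTR when the signal arrives, and the program gets control back. The same mechanism lets a program put its own timeout on a blocking call. A minimal C sketch; read_with_alarm is a hypothetical helper name, and it only helps if the wait is interruptible:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt the blocking call. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Hypothetical helper: read from fd, giving up after `seconds` seconds.
 * Returns bytes read, 0 at EOF, or -1 with errno == EINTR on timeout. */
ssize_t read_with_alarm(int fd, void *buf, size_t len, unsigned seconds)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* no SA_RESTART, so read() is not restarted */
    if (sigaction(SIGALRM, &sa, &old) == -1)
        return -1;

    alarm(seconds);
    n = read(fd, buf, len);         /* blocks until data, EOF, or SIGALRM */
    saved_errno = errno;
    alarm(0);                       /* cancel the timer if read() finished first */
    sigaction(SIGALRM, &old, NULL); /* restore the previous disposition */
    errno = saved_errno;
    return n;
}
```

This is the classic technique, but it has a small race if the alarm fires before read() starts, which is one reason poll()/select() timeouts are generally preferred.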

Good luck

Adam
I do everything perfectly, except from my mistakes
Pete Randall
Outstanding Contributor

Re: killing a process

A process like this with a parent PID of 1 is a true orphaned zombie. The only way to kill a zombie is to kill off the parent. Since you really don't want to kill off PID 1, there's no way to do it. You can't kill it. The only way to get rid of it is reboot.
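For comparison, this is how a zombie comes about in POSIX terms: a child that has exited keeps only a process-table entry with its exit status until its parent collects it with wait()/waitpid(). A zombie has already closed its files, and when a zombie's parent exits, init (PID 1) inherits and reaps it, so a process that still lists open files in lsof, as in the output above, is more likely blocked than a zombie. A minimal C sketch; spawn_and_reap is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once with `code`, then reap it.
 * Between the child's _exit() and the parent's waitpid(), the child is a zombie:
 * its memory and open files are already released, and only a process-table
 * entry holding the exit status remains. waitpid() collects the status and
 * frees the entry. Returns the child's exit status, or -1 on error. */
int spawn_and_reap(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                      /* child: a zombie until the parent reaps it */

    if (waitpid(pid, &status, 0) != pid)  /* parent: reap the zombie */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```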

Pete
Bill Hassell
Honored Contributor

Re: killing a process

The problem is with networking. You'll need to look at how the program is handling socket connections as well as NFS usage. Assume that all network connections are extremely unreliable and must be explicitly monitored for proper completion. Your program is hung on network activity that will never complete, thus making kill -9 (and all other kill signals, for that matter) appear to be ineffective. The reality is that once the network connection completes the current transaction (pass or fail, it doesn't matter), your program will disappear. The kill command is actually a process signal transmitter, but it is the program's responsibility to 'see' the signal. Your program can't see anything because it's waiting on I/O. HP-UX lacks an abort-I/O command.

But until the transaction completes, the program will simply sit there, since the kernel has done all it can to tell it to go away. Datacomm and network connections are the most common sources of processes that hang and can't be killed with kill -9.
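On the program side, monitoring every network connection usually means never calling read() on a socket without a timeout. A minimal C sketch using poll() (select() with a struct timeval works the same way); read_with_timeout and its -2 timeout return value are hypothetical conventions for this example. It doesn't help when the process is blocked inside the kernel on an NFS operation, because poll() always reports regular files as readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable, then read.
 * Returns bytes read, 0 at EOF, -1 on error (errno set), or -2 on timeout. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc == -1)
        return -1;                  /* poll() itself failed */
    if (rc == 0)
        return -2;                  /* nothing arrived in time: the caller decides what to do */
    return read(fd, buf, len);      /* readable or hung up, so read() will not block */
}
```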


Bill Hassell, sysadmin
Michael Steele_2
Honored Contributor

Re: killing a process

Pull your LAN cable.