10-28-2005 03:05 AM
multiplying defunct processes
We patched the system (HP-UX B.11.00 U 9000/800) last weekend, for the first time in several years, and the procedure we have always used to kill users who are still logged in is suddenly causing rapidly multiplying defunct processes.
The shell script that runs nightly basically does this:
1. Find all "ksh" processes and kill the ppid and pid (kill -9 ppid pid)
2. Find any remaining processes locally attached to the Oracle db and kill the ppid and pid.
3. Find any remaining processes non-locally attached to the Oracle db and kill the ppid and pid.
In June, we stopped executing the first step and only looked for processes actually attached to the db and killed them. No problems. After hitting this problem this week, the first step was reinstated and then altered to kill only the ppid (not the pid) of the "ksh" process, but that did not help. Here is a summary of what happens:
"Normal" user session:
root 4095 955 0 08:59:28 pts/tad 0:00 telnetd
upmay 4096 4095 0 08:59:30 pts/tad 0:05 -ksh
upmay 17366 4096 0 12:08:27 pts/tad 0:07 quick subdict=search auto=/uk_home/jervis/v63yoln/MENUGO.qkg
upmay 17382 17366 0 12:08:29 ? 0:00 oracleUK (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
Now the kill script kicks in:
upmay 4096 1 1 08:59:30 ? 0:05 -ksh
upmay 1918 1 0 19:01:44 ? 0:00 <defunct>
upmay 2362 4096 0 19:01:45 ? 0:00 -ksh
And 10 seconds later:
upmay 4096 1 1 08:59:30 ? 0:05 -ksh
upmay 7649 1 0 19:02:06 ? 0:00 <defunct>
And about 20 seconds later:
upmay 4096 1 0 08:59:30 ? 0:05 -ksh
upmay 7651 1 1 19:02:06 ? 0:00 <defunct>
upmay 7649 1 0 19:02:06 ? 0:00 <defunct>
upmay 9016 4096 0 19:02:11 ? 0:00 <defunct>
upmay 9493 1 0 19:02:13 ? 0:00 <defunct>
upmay 9492 1 0 19:02:13 ? 0:00 <defunct>
upmay 11412 1 1 19:02:21 ? 0:00 <defunct>
upmay 11411 1 0 19:02:21 ? 0:00 <defunct>
upmay 13228 1 1 19:02:28 ? 0:00 <defunct>
upmay 16543 1 0 19:02:41 ? 0:00 <defunct>
upmay 13156 1 0 19:02:27 ? 0:00 <defunct>
upmay 13624 1 2 19:02:29 ? 0:00 <defunct>
upmay 13623 1 0 19:02:29 ? 0:00 <defunct>
upmay 17020 1 1 19:02:42 ? 0:00 <defunct>
upmay 15600 1 2 19:02:37 ? 0:00 <defunct>
upmay 17935 1 1 19:02:46 ? 0:00 <defunct>
upmay 15527 1 1 19:02:37 ? 0:00 <defunct>
upmay 16946 1 0 19:02:42 ? 0:00 <defunct>
upmay 18311 1 0 19:02:47 ? 0:00 <defunct>
upmay 18415 1 3 19:02:48 ? 0:00 <defunct>
upmay 16616 1 0 19:02:41 ? 0:00 <defunct>
upmay 20387 1 0 19:02:55 ? 0:00 <defunct>
upmay 20472 1 1 19:02:56 ? 0:00 <defunct>
upmay 23532 1 0 19:03:07 ? 0:00 <defunct>
upmay 23531 1 1 19:03:07 ? 0:00 <defunct>
These processes continue to multiply until the process table is full. This seems to happen only when the kill script runs at 7:00 pm each night; when I kill a user session from the Unix prompt in exactly the same manner, I cannot reproduce the problem. Any ideas, anyone?
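For anyone following along: on HP-UX, `ps -ef` shows `<defunct>` in the command column for a zombie, so the growth described above can be tracked by counting those entries per parent. The function below is only a sketch; its name is made up for illustration, and it assumes the usual `ps -ef` column order (UID, PID, PPID, ...).

```shell
# Summarise defunct (zombie) processes per parent.
# Reads `ps -ef`-style text on stdin and prints "PPID COUNT" lines,
# sorted by PPID. Assumes the PPID is the third column.
defunct_by_parent() {
    awk '/<defunct>/ { n[$3]++ }
         END { for (p in n) print p, n[p] }' | sort -n
}

# Typical use on a live system (run it a few times while the script works):
#   ps -ef | defunct_by_parent
```

A count that keeps rising under one particular PPID points at the process that is forking children and not reaping them.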
10-28-2005 03:30 AM
Re: multiplying defunct processes
It will help if you can post the kill script.
-Sundar.
10-28-2005 03:42 AM
Re: multiplying defunct processes
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=969878
Pete
10-28-2005 03:44 AM
Re: multiplying defunct processes
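----------------------------------
# Pass 1: for each ksh process that does not match the exclusion list and
# whose owner does not begin with "c", kill its parent (PPID is field 3).
ps -ef | \
tail +2 | \
grep ksh | \
grep -v "`cat /fh_scripts/isusers.list`" | \
awk '$1 !~ /^c/ {print "kill -9 " $3}' | \
sh
# Pass 2: for each database, kill the parent (field 3) and the PID (field 2)
# of every LOCAL=YES Oracle process, skipping those matching oracle<db>0899.
for db in `echo ${dbarray[*]}`
do
    echo "Now killing LOCAL users in: $db"
    ps -ef | \
    tail +2 | \
    grep oracle$db | \
    grep -v oracle"$db"0899 | \
    grep "LOCAL=YES" | \
    grep -v "^ oracle" | \
    awk '{print "kill -9 " $3," ", $2}' | \
    sh
done
# Pass 3: for each database, kill the PID (field 2) of every LOCAL=NO
# Oracle process, again skipping those matching oracle<db>0899.
for db in `echo ${dbarray[*]}`
do
    echo "Now killing NON-LOCAL users in: $db"
    ps -ef | \
    tail +2 | \
    grep oracle$db | \
    grep -v oracle"$db"0899 | \
    grep "LOCAL=NO" | \
    awk '{print "kill -9 " $2}' | \
    sh
done
----------------------------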
Thanks.....
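One way to see what the first pass is about to do, without killing anything, is to print the target list instead of piping it to `sh`. The sketch below is a hypothetical helper, not part of the original script. It applies the same filters, and as two assumptions of mine it also skips PPID 1 (a ksh that has already been orphaned has init as its parent) and removes duplicate PPIDs.

```shell
# Print the PPIDs the first pass would kill, one per line, without killing.
# stdin: `ps -ef` output, including its header line.
# $1:    file of patterns to exclude, one per line
#        (plays the role of /fh_scripts/isusers.list).
ksh_parent_targets() {
    awk 'NR > 1' |                         # drop the ps header line
    grep -v -f "$1" |                      # drop lines matching the exclusion list
    awk '/ksh/ && $1 !~ /^c/ && $3 > 1 { print $3 }' |
    sort -un                               # each parent only once
}

# Typical use:
#   ps -ef | ksh_parent_targets /fh_scripts/isusers.list
```

Saving this list to a log just before the 7:00 pm run would show exactly which parents were hit on the nights the defunct processes appear.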
10-28-2005 03:52 AM
Re: multiplying defunct processes
I've closed that one.
10-28-2005 04:00 AM
Re: multiplying defunct processes
I am not too sure about killing the PPID. If the child ksh is killed, the parent (telnetd, rlogind, or sshd) typically exits gracefully.
I am also not too comfortable with just grepping for ksh. That will kill processes that you don't want killed.
Try this:
kill -9 $(ps -ef | egrep -i "^-ksh$|^ksh$" | grep -f /fh_scripts/isusers.list | egrep -v "^c" | awk '{print $2}')
Sundar.
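To illustrate the concern about `grep ksh` matching too much: a plain substring match also hits lines such as an editor working on a `.ksh` file, or any program with "ksh" in its name. A sketch of a stricter match follows. The function name is hypothetical, and it assumes the shell appears as the last field of the `ps -ef` line, which holds for `-ksh` and `ksh` without arguments but not for `ksh some_script`.

```shell
# Print the PIDs of processes whose last ps field is exactly "ksh" or "-ksh".
# stdin: `ps -ef` output, including its header line.
ksh_pids() {
    awk 'NR > 1 && $NF ~ /^-?ksh$/ { print $2 }'
}

# Typical use:
#   ps -ef | ksh_pids
```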
10-28-2005 04:04 AM
Re: multiplying defunct processes
upmay 4096 1 0 08:59:30 ? 0:05 -ksh
My guess is that it wakes up, sees that a child is not responding, and starts another child process, and so on, as a good daemon might.
Can you kill JUST the parent daemon first, so that all its children die at the same time? Have you tried that?
10-28-2005 04:06 AM
Re: multiplying defunct processes
kill -9 $(ps -ef | awk '$NF ~ /^-?ksh$/ {print}' | grep -f /fh_scripts/isusers.list | egrep -v "^c" | awk '{print $2}')
10-28-2005 06:04 AM
Re: multiplying defunct processes
Exactly which "parent daemon" are you saying I should kill? telnetd? I don't think I want to do that. telnetd is the parent of ksh, which is the parent of the rest of the session processes, so I've been killing ksh. In this example, the "family" is:
child                parent
955 (telnetd)        1 (inetd)
4095 (ksh)           955 (telnetd)
4096 (quick...)      4095 (ksh)
17366 (oracleUK...)  4096 (quick...)
At this point, the first pass is killing just 4095 (ksh). So, is telnetd spawning replacement ksh processes?
Sundar,
It looks like your command does the same thing as mine, although more efficiently. I noticed, however, that you are killing $2 rather than $3. So you would kill 4096 rather than 4095?
BTW, I have run this script on our development system with no problems. That system was patched a couple of months ago (same patch set). Obviously, I cannot run it on the production system except once a day at 7:00 pm.
10-28-2005 06:20 AM
Re: multiplying defunct processes
child             parent          command
955 (inetd)       1 (init)        inetd
4095 (telnetd)    955 (inetd)     telnetd
4096 (ksh)        4095 (telnetd)  ksh
17366 (quick)     4096 (ksh)      quick
17382 (oracleUK)  17366 (quick)   oracleUK
I've been killing 4095, the parent of ksh.
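Since the parent/child chain is easy to get wrong by eye, it can also be derived mechanically from `ps -ef` output. This is only a sketch with a made-up function name; it assumes the PID and PPID are columns 2 and 3 and that the command starts in column 8, which holds when STIME is a single field (processes started the same day).

```shell
# Print a process and its ancestors, nearest first, as "PID COMMAND" lines.
# stdin: `ps -ef` output, including its header line.
# $1:    PID to start from.
ancestry() {
    awk -v pid="$1" '
        NR > 1 { parent[$2] = $3; cmd[$2] = $8 }
        END {
            while (pid in parent) {
                print pid, cmd[pid]
                if (pid == parent[pid]) break   # guard against a self-parented entry
                pid = parent[pid]
            }
        }'
}

# Typical use, starting from the Oracle shadow process:
#   ps -ef | ancestry 17382
```

The walk stops at the first PID that has no row of its own in the listing.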
10-28-2005 06:43 AM
Re: multiplying defunct processes
If you ask me, I would kill the PID instead of the PPID.
10-28-2005 06:52 AM
Re: multiplying defunct processes
10-28-2005 07:10 AM
Re: multiplying defunct processes
Thanks for all the input -- I appreciate the support!
10-28-2005 07:29 AM
Re: multiplying defunct processes
10-28-2005 07:34 AM
Re: multiplying defunct processes
10-31-2005 02:01 AM
Re: multiplying defunct processes
Clay, your comments are well taken. The pea-shooter/cannon analogy is a good one.
I have changed the script to kill the PIDs rather than the PPIDs, and I am still getting the defunct processes. On Sunday I ran the script again to kill user sessions, and the defunct processes multiplied again. Once I had cleaned them up so that nobody was logged in except me, I logged in 4 times, ran the script, and got NO defunct processes. I then logged in 6 times, and then 20 times, ran the script, and got NO defunct processes. So I am unable to reproduce this at will. I am logging in as an ordinary user (not a priv'd account), and the processes associated with each login are the same as in the scenario I described last week.
From what I can tell, when I kill the PPID, it is the "ksh" process that spawns the defunct processes, but when I kill the PID, it is the "quick" process that spawns them.
Could I be exceeding some threshold? Any other ideas?
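Clay's post is not reproduced in this thread, so the following is only an assumption about what the pea-shooter/cannon analogy refers to: trying gentler signals first and keeping `kill -9` as the last resort, because a process killed with SIGKILL gets no chance to clean up after itself. The function name, the signal order (TERM, then HUP, then KILL), and the grace period are my own choices for illustration.

```shell
# Stop a process with progressively stronger signals.
# $1: PID to stop.
# $2: seconds to wait after each signal (default 5).
# Returns 0 once the PID is gone, 1 if it is still present after SIGKILL.
# Note: pid, grace, sig, and waited are ordinary (global) shell variables.
stop_gently() {
    pid=$1
    grace=${2:-5}
    for sig in TERM HUP KILL; do
        kill -s "$sig" "$pid" 2>/dev/null || return 0   # already gone
        waited=0
        while [ "$waited" -lt "$grace" ]; do
            sleep 1
            waited=$((waited + 1))
            kill -0 "$pid" 2>/dev/null || return 0      # exited after this signal
        done
    done
    return 1   # entry still exists, for example a zombie awaiting its parent
}

# Typical use in place of a bare "kill -9 PID":
#   stop_gently 4096 5
```

A defunct entry cannot be removed by any signal, including `kill -9`; it disappears only when its parent (or init, once the parent is gone) collects its exit status.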
11-07-2005 01:18 AM
Re: multiplying defunct processes
11-07-2005 01:21 AM