Operating System - HP-UX

John Ben Urban
Occasional Advisor

How can one determine what hung / frozen process is waiting on

Periodically a process or group of processes hangs on my machine. It does not happen all the time and we have not figured out how to reproduce the hang at will. Often the process freezes for anywhere from 5 minutes to a few hours, then un-freezes and terminates. Sometimes, however, it never recovers and stays hung until we reboot.

Running tusc we get:
$ tusc -p 3907
tusc: process 3907 ("vi CBRT_ITG.CB_IP01.log CBRT_ITG.CB_IP02.log CBRT_ITG.CB_IP03.l"): Cannot currently attach to deactivated processes.
tusc: no process to attach to
$ ps -ef|grep vi | grep -v grep
aiuser 3907 3835 0 11:35:25 pts/tg 00:00 vi CBRT_ITG.CB_IP01.log CBRT_ITG.CB_IP02.log CBRT_ITG.CB_IP03.l
$

Running kill -9 (or any other signal) on the pid does not kill the process. You can run kill -9 over and over again and nothing happens.

Even slapd has gotten into the hung / frozen state. ps -l does not report any WCHAN.

$ tusc -p 1980
tusc: process 1980 ("slapd -f ./slapd-master.conf -p 32200 -d 1"): Cannot currently attach to deactivated processes.
tusc: no process to attach to
$ ps -l -p 1980
F S  UID  PID PPID C PRI NI     ADDR    SZ WCHAN TTY  TIME CMD
0 R 1021 1980    1 0 154 20 50b74c40 12839     - ?   20:20 slapd
$

uname -a reports:
HP-UX my_machine B.11.11 U 9000/800 153404696 unlimited-user license

On some versions of UNIX (Sun, Compaq StarServer) you can run crash(1M) and get a kernel stack trace to see what the process is waiting on.

HP seems to have no crash(1M) command. There is a q4 command, however. Does it work on a running system, and if so, how can I see what resource my process is waiting on? Or can I force a system dump/crash to analyze the hung process?

Has anyone seen this freeze / unfreeze condition, or does anyone have experience figuring out what kernel resource or module a user process is hanging on, in an HP-UX 11i environment?

thanks. john
14 REPLIES
Michael Steele_2
Honored Contributor

Re: How can one determine what hung / frozen process is waiting on

ipcs -mob (* display shared memory *)

ipcrm -m <shmid> (* remove a segment if NATTCH = 0 and the owner is not root *)

Got 'lsof'?

lsof -p pid

Glance? Glance advisor? It is provided free on the HP CDs.
Support Fatherhood - Stop Family Law
Steven E. Protter
Exalted Contributor

Re: How can one determine what hung / frozen process is waiting on

UNIX95=1
export UNIX95

ps -efH | grep process_name

Pick a name that picks up the process. The -H will show parents and children with indentation and might give you a clue to what's going on.

tusc isn't going to work unless there is an active process. It won't gather much data from a sleeper.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Jeff Schussele
Honored Contributor

Re: How can one determine what hung / frozen process is waiting on

Hi John,

The key is the priority of the process when it's hung. Look at the PRI field in that output. It falls into the kernel range (128 to 178).
And worse yet, it's in the unsignalable portion of the kernel range, hence the useless kill -9. The kernel itself is the *only* thing that can assign that priority. The process has to be blocked there, waiting on a resource that obviously doesn't show up. In this case I'd bet on poor coding... but I never say never.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
John Ben Urban
Occasional Advisor

Re: How can one determine what hung / frozen process is waiting on

Thanks everyone for the variety of help / pointers. However it might be possible that I am out of memory and I am swap bound.

My ipcs -mob always shows 120 shared memory items. No semaphores or message queues.

When my processes appear 'hung' (or frozen), ps -fadel shows many of them (up to 22) with an "F" (flags) value of "0" and an "S" (state) value of "R", which means they are swapped and running.

sar -q frequently shows swpq-sz with a value greater than 80 and %swpocc of 100.

top shows:
Load averages: 0.28, 0.32, 0.36
366 processes: 218 sleeping, 148 running
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    0.07   0.4%   0.0%   0.4%  99.2%   0.0%   0.0%   0.0%   0.0%
 1    0.76   0.6%   0.0%  13.8%  85.6%   0.0%   0.0%   0.0%   0.0%
 2    0.27   0.0%   0.0%   0.4%  99.6%   0.0%   0.0%   0.0%   0.0%
 3    0.04  25.6%   0.0%  23.1%  51.3%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.28   6.7%   0.0%   9.5%  83.8%   0.0%   0.0%   0.0%   0.0%

Memory: 2517952K (2397856K) real, 17649000K (17116200K) virtual, 26576K free

Does this suggest that I might be out of swap space and/or physical memory?
Steven E. Protter
Exalted Contributor

Re: How can one determine what hung / frozen process is waiting on

swapinfo
swapinfo -m

That will let you know right away what your swap situation is.

As a general rule of thumb, swap should be between 1.5 and 2.0 times physical memory.

Crash dumps are stored here by default:

/var/adm/crash

That is, if saving them is configured in this file:

/etc/rc.config.d/savecrash

The file is intuitive and well documented.

If you have a crash dump, the attached procedure describes how to analyze it.

SEP
Mike Stroyan
Honored Contributor

Re: How can one determine what hung / frozen process is waiting on

The process was runnable but deactivated. That looks like the problem may be memory pressure. The kernel may have deactivated several processes to make room. You could look at the output of vmstat to see how system memory is doing.
You might want to look into patch PHKL_28529, which fixes a problem that could cause memory shortages.
John Ben Urban
Occasional Advisor

Re: How can one determine what hung / frozen process is waiting on

vmstat shows:

procs        memory                 page          faults      cpu
r b w     avm   free re at pi po fr de sr   in   sy  cs us sy id
2 0 0 2148757 187634 95 40 17  0  0  0 26 1046 5153 485  5  4 92

swapinfo -m shows:
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        4096    1090    3006   27%       0       -    1  /dev/vg00/lvol2
reserve       -    3006   -3006
memory     2250     883    1367   39%


Does this -3006 look right?
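
One way to read the reserve line, offered as a rough sketch and not an authoritative answer: the reserve row reports swap that has been promised to processes but not yet written, and swapinfo shows it as a negative FREE so the totals balance. The awk below subtracts that reservation from the free device swap in swapinfo -m style output. The function name swap_headroom and its output labels are made up for illustration.

```shell
# swap_headroom: read `swapinfo -m` style output on stdin and report, in MB,
# how much device swap is still unreserved and how much pseudo-swap is free.
# The function name and output labels are illustrative, not an HP-UX tool.
swap_headroom() {
    awk '
        $1 == "dev"     { dev_free += $4 }   # FREE column of each device row
        $1 == "reserve" { reserved += $3 }   # USED column of the reserve row
        $1 == "memory"  { mem_free += $4 }   # FREE column of the memory row
        END {
            printf "device_unreserved_mb=%d\n", dev_free - reserved
            printf "memory_free_mb=%d\n", mem_free
        }
    '
}

# Usage on HP-UX (not run here): swapinfo -m | swap_headroom
```

Fed the figures above, this prints device_unreserved_mb=0 and memory_free_mb=1367, which would mean every megabyte of device swap is either used or already reserved, and only the memory line has headroom left.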
Michael Steele_2
Honored Contributor

Re: How can one determine what hung / frozen process is waiting on

Starting to take on the symptoms of either an I/O problem, which seems more likely since your hung process clears itself, or system memory fragmentation, which exhibits degrading performance until freeze or reboot. Please attach:

uptime
sar -d 5 5
sar -v 5 5

...and again when hung. (* especially sar -d 5 5 *)
Support Fatherhood - Stop Family Law
Dietmar Konermann
Honored Contributor

Re: How can one determine what hung / frozen process is waiting on

The process is deactivated, just like Mike explained above. This is usually a result of memory pressure... the kernel tries to reduce active virtual memory by excluding processes completely from scheduling.

The key is that such deactivated processes need to reactivate automatically when enough memory is available again. There is currently an open issue with 11.11 and process reactivation. A patch is ready and currently undergoing testing.

You should open a call with your local response center. Point them to our internal reference 4000053459.

Best regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Bill McNAMARA_1
Honored Contributor

Re: How can one determine what hung / frozen process is waiting on

You can kill the process with a signal that generates a core dump (I think it's 6), then debug the core dump with gdb (http://www.hp.com/go/developers) to get a stack trace, etc.

Later,
Bill
It works for me (tm)
Bill Hassell
Honored Contributor

Re: How can one determine what hung / frozen process is waiting on

A process cannot be killed if it is waiting on I/O. A kill signal is just that: a signal to the process that is stored in the process table. When the process starts running again (the I/O has completed), the appropriate action will occur. As you surmised, the issue is very likely due to I/O and could be in the form of a strange serial or LAN communication problem that never completes, or, as you've seen, the process has been deactivated and all or a portion of it has been moved to the swap area. And while it is waiting on I/O, a process is a good candidate for deactivation due to inactivity.

The kill signals are listed with kill -l, and kill -3 (by name, kill -QUIT or kill -SIGQUIT) will terminate a process and create a core dump. This assumes that ulimit allows core dumps to be taken in the current environment (hint: /usr/bin/ulimit -a).
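
As a minimal sketch of the check-then-signal idea above: the functions below refuse to send SIGQUIT when the core-size limit is 0. Both function names are hypothetical, and the sketch assumes the shell's ulimit accepts -c. Note that the limit that really matters is the one the target process inherited when it was started, so checking the current shell is only a proxy. Per the other replies in this thread, a deactivated process will not act on the signal until it is reactivated.

```shell
# core_allowed: succeed unless the given core-size limit is 0.
# Pass it the output of `ulimit -c` (a block count or "unlimited").
core_allowed() {
    [ "$1" != "0" ]
}

# request_core: send SIGQUIT to a PID only if the current shell's limits
# would let a core file be written. Both function names are illustrative.
request_core() {
    if core_allowed "$(ulimit -c)"; then
        kill -QUIT "$1"
    else
        echo "core dumps are disabled (ulimit -c is 0)" >&2
        return 1
    fi
}

# Usage (not run here): request_core 3907
```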


Bill Hassell, sysadmin
Dietmar Konermann
Honored Contributor

Re: How can one determine what hung / frozen process is waiting on

Deactivated processes (ps -el "flags" value is even) are non-signalable by definition. No need to try any kill.
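
A small sketch of that even-flags test, assuming the usual ps -el column order (F first, PID fourth, command last). The function name list_deactivated is made up for illustration.

```shell
# list_deactivated: read `ps -el` style output on stdin and print the PID and
# command of every row whose F (flags) value is even. The function name is
# illustrative. Parity is read from the last digit, which works whether the
# flags are printed in octal or decimal.
list_deactivated() {
    awk 'NR > 1 && $1 ~ /[02468]$/ { print $4, $NF }'
}

# Usage on HP-UX (not run here): ps -el | list_deactivated
```

Run against the slapd row posted earlier in the thread (F = 0), it would report "1980 slapd".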
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
John Ben Urban
Occasional Advisor

Re: How can one determine what hung / frozen process is waiting on

Just an update for those who may stumble into this in the future. To get past this problem we did the following:
1 - Set the kernel tunable swapmem_on=0. By doing this, the kernel will not swap to a memory device (pseudo-swap) but only to the swap device. This gives the kernel and applications additional memory to run in.
2 - Added swap space.
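
For anyone repeating the two steps, here is a hedged dry-run sketch of what they might look like on 11.11. It assumes kmtune is the tunable command on this release and that the new swap goes into vg00. The logical volume name and size are hypothetical, as are the function names. On 11.11 a tunable change like this typically also needs a kernel rebuild and a reboot, which the sketch does not perform, and the new swap device would need an /etc/fstab entry to survive reboots. Check the man pages before running any of it.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root, on HP-UX) to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# apply_swap_fix LVNAME SIZE_MB
# LVNAME and SIZE_MB are hypothetical values chosen by the administrator.
apply_swap_fix() {
    swap_lv=$1
    size_mb=$2

    run kmtune -q swapmem_on                             # show the current value
    run kmtune -s swapmem_on=0                           # step 1: turn pseudo-swap off
    run lvcreate -L "$size_mb" -n "$swap_lv" /dev/vg00   # step 2: new logical volume
    run swapon "/dev/vg00/$swap_lv"                      # enable it as device swap
}

# Usage (prints the commands only): apply_swap_fix lvswap2 4096
```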