Operating System - HP-UX

Process hangs with 100% SYS

Hi,

I have a problem that is confusing me...

Sometimes (i.e., not always) one of the processes (a program developed by us) that I start hangs, consuming 100% system usage. gdb can't attach to it and truss is refused entry. When two of these hang simultaneously, I can't even log on to the machine (it being a 2 CPU one)...

How can I figure out what is happening?
It's HP-UX 11.23.

Thanks in advance,
Muthukumar_5
Honored Contributor

Re: Process hangs with 100% SYS

See process usage as,

# vmstat
# UNIX95= ps -ef -o cpu,pcpu,pid,comm

It will show which process is using the most CPU.

# top

will also show the processes with their CPU% rating (not always accurate).
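
One way to rank processes by CPU share is to sort the ps output numerically. This is only a minimal sketch: top_cpu is a hypothetical helper name, not something from this post, and it assumes your ps accepts -o with these column names (on HP-UX that needs the UNIX95 variable set, as above).

```shell
# top_cpu: hypothetical helper; keeps the N lines with the largest first column.
top_cpu() {
    sort -k1,1nr | head -n "${1:-5}"
}

# pcpu comes first so the sort key is the CPU percentage.
# The header line is not numeric, so it sorts as 0 and normally drops out.
UNIX95= ps -e -o pcpu,pid,comm | top_cpu 5
```

The argument to top_cpu is the number of lines to keep; it defaults to 5.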

--
Muthu
Easy to suggest when don't know about the problem!
Arunvijai_4
Honored Contributor

Re: Process hangs with 100% SYS

Hello,

A better way to debug the problem is to enable logging in your application and try running it.

Also, did you change any kernel parameters ?

-Arun
"A ship in the harbor is safe, but that is not what ships are built for"
Matti_Kurkela
Honored Contributor

Re: Process hangs with 100% SYS

Sounds like the process might be doing something like an infinite loop, calling some system function repeatedly or with insane parameters that make the system function take ages to complete. Because most of the time is spent inside the system function, you get 100% SYS usage.

You could freeze the process for investigation with "kill -STOP <pid>". A stopped process will not respond to most signals, but you can use tools like lsof to get more information on what the process was doing. You can then unfreeze the process with "kill -CONT <pid>", and terminate it as necessary.

With "kill -ABRT <pid>" or "kill -QUIT <pid>" you should be able to force the process to exit and create a core dump file. You might then be able to get more information by analyzing the core dump.

Of course you must have enabled the core dump creation (if "ulimit -c" is set to zero, core file creation is disabled) before the process is started.
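
The sequence above could be tried out on a harmless process first. This is a rough sketch, not a recipe for the real program: "sleep 300" is only a stand-in (an assumption), and lsof is used only if it happens to be installed.

```shell
#!/bin/sh
# Sketch only: "sleep 300" stands in for the real program (assumption).

# Core files are only written if the limit is non-zero when the process starts.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core size limit"

sleep 300 &
pid=$!

kill -STOP "$pid"      # freeze the process
# Inspect it while frozen; lsof is optional and may not be installed.
if command -v lsof >/dev/null 2>&1; then
    lsof -p "$pid"
fi
kill -CONT "$pid"      # let it run again

kill -ABRT "$pid"      # ask it to abort, dumping core if the limit allows
status=0
wait "$pid" || status=$?
echo "exit status after SIGABRT: $status"
```

A status above 128 from wait means the process was ended by a signal rather than exiting on its own.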
MK

Re: Process hangs with 100% SYS

Thanks for the answers, but:

Muthukumar: I already know what process is doing this.

Arunvijai: This is during startup of the process, and the source code is a few hundred kloc. It would take a while to do that, so I would like to find some shortcut first...

Matti Kurkela: The process doesn't respond to _any_ kills, including kill -9, kill -STOP and kill -ABRT. It just continues happily eating 100% system time.

Mind you, this problem only happens sometimes, so it's some kind of timing issue; inserting debug code may change the behavior of the program.

Any other ideas? *looks hopeful*

Re: Process hangs with 100% SYS

I just found out that I can't kill the process because it has pri of 134... in fact, _all_ my processes have a PRI above 128....but what could possibly make that happen?

Appreciate any hint... faulty kernel parameters, maybe?
Victor BERRIDGE
Honored Contributor

Re: Process hangs with 100% SYS

Hi Fredrik,

I don't see what PRI has to do with not being able to do a kill...
Under which UID is the program running?
If root, why?
If not root, did you try as root to kill -9 the PID and PPID at the same time?
Now, as root, did you use glance/gpm to monitor the system's activity before and then during your program's execution? You should be able to follow it and see, when it is consuming resources, what it is doing or why it is waiting...

It is difficult to give you advice since you didn't explain what the program does, so we don't have many clues...


All the best
Victor
Hein van den Heuvel
Honored Contributor

Re: Process hangs with 100% SYS

What is the (virtual) memory situation?
If you are low on / out of free pages, and the swapper kicks in aggressively, then you may see behaviour similar to this: lots of sys time, unresponsive, ...

fwiw,
Hein.

Re: Process hangs with 100% SYS

Hi Victor,

I saw a message at http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=939681
saying that kernel prio processes can't be signalled. A message in thread http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=190861
says that PRI 128-153 are unkillable. Attempting to kill it along with its PPID, there's no change.

The process is owned by me, not root. I'm not allowed root access. The process cannot be attached to with a debugger, nor trussed. GlancePlus didn't give much... Not a single system call is made, only four files are open, and there is 0% wait state for disk IO, streams, semaphores, network or anything else. Disk IO rate is 0. When I attempt to look at the process memory regions in GlancePlus, it hangs totally and can't be killed.

As for what the program does... well, given that it encompasses a few hundred thousand lines of code, that would take some explaining. Thing is, I don't even know what stage of startup it is in, so I don't know what's going on either. Lots of file I/O is usually involved at startup, though. All I know is that it's early on; its mirror processes running the same binary use 53 MB, whereas this copy only got to 1.3 MB RAM before it "stuck" at 100% system, and it has been stuck for the last six hours.

I'm not sure what you would like to know?
Kent Ostby
Honored Contributor

Re: Process hangs with 100% SYS

Fredrik --

If you have q4 on your system, you might be able to get a stack trace by running:

q4 -p /stand/vmunix /dev/kmem

then at the q4 prompt:

trace processor 0
trace processor 1

One of those will be your q4 process probably, but the other should be your hung process.

Another way would be to get a TOC of your machine when this is happening and then use
the document OZBEKBRC00000611 to pull stack traces of all the processes (the part using analyze.pl and the Analyze command).

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=OZBEKBRC00000611
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Victor BERRIDGE
Honored Contributor

Re: Process hangs with 100% SYS

Just a little one...

What is the parent PID ?


Courage!
Victor

Re: Process hangs with 100% SYS

Hi Hein,

These numbers are from gpm memory report:
-----------------------------------------
Phys Mem: 4.0 gb
Sys Mem: 694 mb
Buf Cache: 1.2 gb
User Mem: 1.2 gb
Free Mem: 940 mb

Total VM: 1.6 gb
Active VM: 1.4 gb
------------------------------------------
On the other hand, the CPU report says that it spends an average of over 50% on V (page) faults? How can this add up?

Re: Process hangs with 100% SYS

Hi Victor,
PPID is 3919, and its PPID in turn is 1, which it should be since that process, our monitor so to speak, is supposed to run as a daemon.

BR,
Fredrik

Re: Process hangs with 100% SYS

Hi Kent,

I couldn't find q4 on the machine, unfortunately. Besides, it sounds to me like you have to be root to run it? I don't have root access....
Victor BERRIDGE
Honored Contributor

Re: Process hangs with 100% SYS

q4 should be in /usr/contrib/bin, but wait and follow Kent's advice, for he is a q4 guru...

All the best
Victor

Re: Process hangs with 100% SYS

Hi Victor,

well, in that case, I definitely don't have q4. :)

BR,
Fredrik
Victor BERRIDGE
Honored Contributor

Re: Process hangs with 100% SYS

Are you on an Itanium box?
When you type model, what do you get?

Normally (well, on PA-RISC anyway) it comes with the core OS...

All the best
Victor

Re: Process hangs with 100% SYS

Hi Victor,

no, actually, it's not an Itanium. Unless someone has fooled me greatly....

"model" gives: 9000/800/A500-7X

BR,
Fredrik

Victor BERRIDGE
Honored Contributor

Re: Process hangs with 100% SYS

What about :
ant:/usr/contrib/Q4/bin $ ll
total 12394
-r-xr-xr-x 1 bin bin 250792 Nov 5 2003 getasm
-r-xr-xr-x 1 bin bin 460837 Nov 5 2003 kmeminfo
-r-xr-xr-x 1 bin bin 495616 Nov 5 2003 nm.elf
-r-xr-xr-x 1 bin bin 932 Nov 5 2003 nm.q4.sw
-r-xr-xr-x 1 bin bin 32768 Nov 5 2003 nm.som
-r-xr-xr-x 1 bin bin 1546916 Nov 5 2003 perl
-r-xr-xr-x 1 bin bin 61 Nov 5 2003 q4
-r-xr-xr-x 1 bin bin 139615 Nov 5 2003 q4.pxdb
-r-xr-xr-x 1 bin bin 3119736 Nov 5 2003 q4exe
-r-xr-xr-x 1 bin bin 2292 Nov 5 2003 q4pxdb
-r-xr-xr-x 1 bin bin 290816 Nov 5 2003 q4pxdb64
-r-xr-xr-x 1 bin bin 559 Nov 5 2003 set_env
ant:/usr/contrib/Q4/bin $


All the best
Victor

Re: Process hangs with 100% SYS

Hi Victor,

ah, _there_ it was. I now tried running what Kent wrote, but all I got was:

@(#) q4 $Revision: 11.X B.11.23l Wed Jun 23 18:05:11 PDT 2004$ 0
q4: (error) failed to open kmem, errno = d

I guess that's q4 lingo for "DENIED" ?
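
The "d" reads like a hexadecimal errno value. As a quick check, assuming q4 prints errno in hex (the output does not say so explicitly), the shell can convert it; 13 is EACCES, "Permission denied", in the standard errno numbering.

```shell
# Assumption: the value q4 printed ("d") is hexadecimal.
errno_hex=d
errno_dec=$(printf '%d' "0x$errno_hex")
echo "errno 0x$errno_hex = $errno_dec"
```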

BR,
Fredrik
Victor BERRIDGE
Honored Contributor

Re: Process hangs with 100% SYS

It's more because of the permissions on /dev/kmem.

I can run Kent's command, but I'm also in the sys group...
If you can't get a minimum of privileges, I don't see how you can sort yourself out of this...

Start negotiating with your sysadmin for at least some rights via sudo.

All the best
Victor
V. Nyga
Honored Contributor

Re: Process hangs with 100% SYS

Hi,

do you really *need* the PPID 1?
Why don't you run it from a dtterm?
That way you could add some 'echo' statements and narrow down the area of the code that is hanging.

Just some thoughts
Volkmar
*** Say 'Thanks' with Kudos ***
Victor BERRIDGE
Honored Contributor

Re: Process hangs with 100% SYS

Hi again,
I agree with Volkmar, why should the PPID be 1?
Especially if that were not the case (since it is not started by init...). I also have daemons like object spawners or many httpd processes that don't have a PPID of 1...
I believe your process is turning into some sort of zombie, but not quite, since it is not defunct...


All the best
Victor
A. Clay Stephenson
Acclaimed Contributor

Re: Process hangs with 100% SYS

A properly daemonized process should have a PPID of 1 -- so ignore all the comments about that "error". Your process is obviously in a very tight loop. I would put some assert calls in the suspicious areas -- which are defined as those which can loop. Putting asserts in your code is good practice because you can leave them in, and if you compile with NDEBUG defined (e.g. -DNDEBUG=1 on the compiler command line, or #define NDEBUG 1 before including <assert.h>), the preprocessor leaves them out. See "man assert" for details.

I would look carefully for uninitialized auto storage class variables. You may have a situation where, because the variable is not initialized, the behavior becomes random (or, more accurately, depends upon the contents of the stack).

I have found that turning off all optimization sometimes helps because the optimizer might be producing bad code -- in any event, turning off optimization better ensures that the actual code is what you intended.
If it ain't broke, I can fix that.

Re: Process hangs with 100% SYS

Hi all,

thanks for your replies... weekend intervened, which is why I haven't responded yet.

Mr Stephenson, the things you say are sound but unfortunately I've already done most of them. Turning on debug with NDEBUG flags, turning off optimization and turning off the threaded memory handler.

This particular program has been around in different shapes for over ten years without it ever exhibiting this kind of behavior, and it still doesn't on various versions of Solaris or Itanium. So this "suspicious area" is like...all of the startup phase. Ah well, I'll just insert cerrs here and there, I suppose. :-)

Thanks for all the help, folks.

/Fredrik