Re: Analyzing hanging process

 
SOLVED
Kirsten Knüttel
Frequent Advisor

Analyzing hanging process

Hello,

I need a bit of help analyzing a hanging process. The process hangs sporadically.
It has a subprocess, and the subprocess seems to hang on a mailbox:
SDA>show proc/chann

0020 7FF30020 00000000 Busy MBA5953:

The mailbox is empty, so we think the subprocess is waiting for something the main process should send it. Can I see exactly where the main process is hanging at the moment? A show proc/chann on the main process showed nothing busy, but it must be waiting for something, and we want to know what.
Our problem is that we do not know enough about the product we use (SAS) to analyze this more deeply.
Perhaps I've explained it badly, but it is really difficult to explain.

The attachment contains a show sys of the main process and the child process. Perhaps this helps you.

Kind regards,

Kirsten
11 REPLIES
Ian Miller.
Honored Contributor

Re: Analyzing hanging process

The EFWM (event flag wait mask) in SDA will show you which event flag the process is waiting for.
SHOW STACK for the processes may show you PC addresses that can be matched to a linker MAP for the image.
____________________
Purely Personal Opinion
Wim Van den Wyngaert
Honored Contributor

Re: Analyzing hanging process

Also try SDA show proc to check the quotas of both processes.
Wim
Martin P.J. Zinser
Honored Contributor

Re: Analyzing hanging process

Hello Kirsten,

if this is SAS as in SAS Institute software, then what exactly are you doing with mailboxes?

IIRC these transports were deprecated years ago. Also, if you do have a reproducer, do make sure this gets escalated to the HQ in Cary; SAS Germany in Heidelberg does not have the resources to investigate a problem like this.

Greetings, Martin

P.S. Who used to have business cards from SI in a former life ;-)
Brian Reiter
Valued Contributor

Re: Analyzing hanging process

Hi there,

In my experience an outstanding QIO on a mailbox is shown as busy using SHO PROC/CHAN.

It's a pity, really, that it's third-party software. My approach to software hangs and tight loops (mutex and processes in COM) is to extract a number of PC values. You can use SHO PROC/CONT, and from ANAL/SYS use SHO CALL and SHO CALL/NEXT (look at the return address on the stack).

With a good number of values, run the image up under debug (it doesn't need to start running or even be a debug image). Then from the debug prompt just

SET RADIX HEX
SET MODU/ALL
EX

Depending on how it's been compiled, you may get useful information (modules and approximate line numbers) which can be used to escalate the problem.

cheers

Brian
Hein van den Heuvel
Honored Contributor
Solution

Re: Analyzing hanging process


Ideally, but unlikely, the application has an interface to activate the debugger (lib$signal ss$_debug).

SHOW PROC/CONT is a crude, but OK, first step to get an impression of what the process might be doing. Watch the 'state' and PC; the snapshot you posted shows an x80xxxxxx: system space! So in that instance it is not running user code. But watch it for a while and you'll catch user-space addresses, which can be translated back by using "run/debu ", set mode hex, exa/inst . By watching for a while, with an eye on the IO counters, state and PC, you'll soon get an impression of what the process might be doing (or not doing).
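
As an illustration only (not part of the original reply), the system-space versus user-space check on sampled PC values can be sketched in Python. It assumes the 32-bit view of addresses used in this thread, where anything at or above 80000000 hex is treated as system space; the function names and all sample values are hypothetical.

```python
# Assumption: 32-bit view of the address, system space starts at 80000000 hex.
SYSTEM_SPACE_BASE = 0x80000000


def classify_pc(pc_text):
    """Return 'system' or 'user' for a hex PC string such as '8014DC54'."""
    # Keep only the low 32 bits so a sign-extended 64-bit value is handled too.
    pc = int(pc_text, 16) & 0xFFFFFFFF
    return "system" if pc >= SYSTEM_SPACE_BASE else "user"


def tally(samples):
    """Count how many sampled PCs fall in system space and in user space."""
    counts = {"system": 0, "user": 0}
    for sample in samples:
        counts[classify_pc(sample)] += 1
    return counts


# Hypothetical PC values noted by hand while watching the process.
samples = ["8014DC54", "00030A10", "FFFFFFFF8014DC54", "0002F3C8"]
print(tally(samples))
```

Only the user-space values are worth feeding to the debugger against the image; a run of nothing but system-space values suggests the process is sitting in a system service or wait.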

The next good tool is ANAL/SYST. As replied before, set proc, show proc/stack and so on.
System addresses, like the one you captured, are valid in all contexts, so you can just do, for example:

SDA> exa/inst 8014DC54
F11BXQP_NPRO+01C54: LDQ R27,#X0058(R2)

This was done on my box, running 7.1, so it is unlikely to match yours.
To get a more exact picture try:

SDA> read/exec
SDA> exa/inst 8014DC54
MAKE_DIRINDX_C+00360: LDQ R27,#X0058(R2)
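
When collecting several of these lines to pass on to support, it may help to split each one into symbol, offset and instruction. A rough Python sketch, assuming the `NAME+OFFSET: instruction` layout seen in the two outputs above; the function name is hypothetical and real SDA output may contain other line shapes that this does not handle.

```python
import re

# Assumed layout: SYMBOL+HEXOFFSET: INSTRUCTION (as in the examples above).
LINE_RE = re.compile(
    r"^\s*(?P<symbol>[A-Za-z0-9_$]+)\+(?P<offset>[0-9A-Fa-f]+):\s*(?P<instr>.+?)\s*$"
)


def parse_sda_line(line):
    """Return (symbol, offset, instruction), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group("symbol"), int(match.group("offset"), 16), match.group("instr")


for line in [
    "F11BXQP_NPRO+01C54: LDQ R27,#X0058(R2)",
    "MAKE_DIRINDX_C+00360: LDQ R27,#X0058(R2)",
]:
    print(parse_sda_line(line))
```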

There is a series of SDA extensions, 'clue' and 'trc', that open up a world of debugging help, but that's too much for here now.

Any chance you can join the OpenVMS bootcamp?
http://h71000.www7.hp.com/symposium/index.html
There is a whole session and then some dedicated to this subject! "Troubleshooting Hung and Looping Processes, course M405".

http://h71000.www7.hp.com/symposium/may_2004/M405.html


See you there?!

Hein.



Willem Grooters
Honored Contributor

Re: Analyzing hanging process

(off topic)

Any chance you can join the OpenVMS bootcamp?

I'm still looking for a sponsor....

Willem
Willem Grooters
OpenVMS Developer & System Manager
Martin P.J. Zinser
Honored Contributor

Re: Analyzing hanging process

Hello Hein,

while generally speaking your analysis is very valuable (as usual ;-), in this particular case I do not think it should be the customer's problem to perform this analysis. SAS is a very, very complex application and comes with a considerable yearly license fee to cover new releases and support. So all there should be to do, really, is to get the pertinent information to the right people and let the engineers in Cary do their work.

Greetings, Martin
Hein van den Heuvel
Honored Contributor

Re: Analyzing hanging process

I fully agree, Martin. In fact, that is usually my line! I don't know why I did not stick that one in this time :-).

Actually, looking back at the topic, I do know. The customer was already going in that direction, as shown by the screenshots that were attached. I was outlining possible further steps, losing sight of the fact that this is a major, supported app, not homegrown.

And of course there is nothing like 'being there', and a little pre-work by the customer, with active observations while the problem is happening, will allow them to bring a much clearer problem case to report.

Instead of 'it's stuck', they can hopefully indicate that it is stuck in module xyz, calling service abc, with file such-and-so open.

That makes the suggestion for that particular training session more relevant. It'll be too late for an immediate problem, but it would be a good skill to have in the back pocket should this happen again.

Kirsten, you did call in the SAS troops for support, right?!
(hmmm, I know an HP person or two (Susan, Carl) who work on site at SAS in North Carolina. In fact, up until half a year ago they were in the same group I am in! :-)

Sorry, I'm rambling, Have a great weekend you all !
(already well under way for half the world :-)

Regards,
Hein.
Kirsten Knüttel
Frequent Advisor

Re: Analyzing hanging process

Hello,

I hope you had a nice weekend, and many thanks for all of your help.

Normally we contact SAS for help. But the problem this time is that the error is not reproducible. The procedure runs every 2 days and normally everything is O.K. But from time to time it hangs, and we don't know the circumstances that make the procedure hang. So it is very difficult to open a call for it.
So we wanted to find these circumstances. But I think we have now collected enough to call SAS (or rather, a colleague has the contacts).

@Martin:
Yes, it is this SAS. And we really don't know what it is doing with these mailboxes. It was the first time we have seen SAS use mailboxes.

@Hein:
An exa/inst brings:
EXE_STD$SYNCH_LOOP_C+00084: BLBC R0,#X000050

Perhaps it helps SAS a bit more.
And I really have no chance to join the OpenVMS bootcamp. I have no sponsor for it. I don't even get UNIX training courses, although we want to migrate to UNIX (sorry, we must migrate to UNIX).

Kind regards,

Kirsten