Operating System - OpenVMS

the curious case of the two waiting sub-processes

 
Ian Miller.
Honored Contributor


I have a process waiting in RWAST. I think it is waiting for its two subprocesses to die. Quotas are fine and there is no I/O outstanding. Each subprocess is in LEF waiting for event flag 29. The call stack appears to show them in $PUTMSG. They are running RMU. There are no busy channels. Various channels are assigned to images (Rdb, ACMS etc.), two channels to mailboxes and one channel to NLA0:. There are no ungranted lock requests. I tried reading and writing the mailboxes but nothing changed. The process and subprocesses are marked for delete (DELPEN flag set) from earlier attempts. The database has been closed (RMU/CLOSE/ABORT=FORCEX) but is still open, waiting for the processes to die.

Any thoughts?
____________________
Purely Personal Opinion
Robert Gezelter
Honored Contributor

Re: the curious case of the two waiting sub-processes

Ian,

I would suggest using SDA to identify the exact nature of the RWAST. Rather than speculate, I would take the time to track down the actual details.

Often, RWAST (like similar conditions in other operating systems) is a secondary symptom. The actual problem often lies elsewhere (I first learned this fact of life on OS/360, a LONG LONG time ago).

Luckily, on OpenVMS, you can do an SDA on a live system -- it helps to maintain the uptime record. (smile)

- Bob Gezelter, http://www.rlgsc.com
Volker Halle
Honored Contributor

Re: the curious case of the two waiting sub-processes

Ian,

finding out about the RWAST process should be easy:

http://h18000.www1.hp.com/support/asktima/operating_systems/0094A663-57DAB060-1C0069.html

But it will be much more challenging to find out why the sub-processes don't disappear...

I would start by trying to find out what the $PUTMSG call is trying to do. Which channel is it writing to?

Could you post (as a .TXT attachment) the output of SDA> SHOW PROC, SHOW PROC/CHAN and CLUE CALL for the hanging subprocesses?

If you need your database back and you can't solve the problem immediately, but still want a chance to investigate the 'mystery', force a crash.

Volker.
John Gillings
Honored Contributor

Re: the curious case of the two waiting sub-processes

Ian,

Unfortunately the DELPEN says someone has attempted to STOP/ID (DELPRC) the process. This is bad because it usually covers the tracks of what really happened. All we can tell you about this type of RWAST is that it's the result of someone issuing a STOP/ID against a process that was stuck!

Most likely the only way out is a reboot :-( The moral is to avoid using STOP/ID as the FIRST step in diagnosing an apparently stuck process. Always have a REALLY good look with SDA first.

We *tried* to get STOP/ID changed from invoking $DELPRC to invoking $FORCEX instead, but various things prevented it. The best we got are the new STOP/IMAGE and STOP/EXIT qualifiers as of V7.3-2. Use STOP/IMAGE/ID in preference.

All you need to do is to train all your operators that STOP/ID has a fair chance of dropping you deeper into trouble than where you started, AND burning any bridges behind you.
A crucible of informative mistakes
Wim Van den Wyngaert
Honored Contributor

Re: the curious case of the two waiting sub-processes

I just wish that a real STOP/ID would finally get implemented. IMHO a reboot should never be needed to kill a process.

Wim
Wim
Uwe Zessin
Honored Contributor

Re: the curious case of the two waiting sub-processes

Given how OpenVMS works, this is simply not possible. You cannot easily "remove" a process from the system if it has any resources allocated. Doing so would leave the system in an inconsistent state.
.
Mobeen_1
Esteemed Contributor

Re: the curious case of the two waiting sub-processes

Ian,
I have always found this useful when I don't have access to DSNLink articles:

http://www.yrl.co.uk/phil/vms/rwast.html

regards
Mobeen
Ian Miller.
Honored Contributor

Re: the curious case of the two waiting sub-processes

I understand the RWAST: it's waiting for the subprocesses. I also know what DELPEN is.
It's the subprocesses I'm puzzled by. I expected to see a busy channel.

Attached are some results of SHOW CALL in SDA.

I'm probably going to have to reboot soon (it's a test system and the users are revolting :-) but I will get a crash dump to enjoy later.
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Re: the curious case of the two waiting sub-processes

Ian,

so RMS (SYS$PUT) is waiting (SYS$SYNCH) on some operation to complete. From the registers saved on the stack, only the saved R4 (000182DA) looks 'suspicious', it may be a %RMS-E-RSA error status.
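
As a rough cross-check of the "%RMS-E-" reading, the saved R4 value can be split into the standard OpenVMS condition value fields (severity in bits 0-2, message number in bits 3-15, facility in bits 16-27, control in bits 28-31). This is only a sketch in Python: the function name decode_condition is made up for illustration, and mapping the message number to the RSA text still needs the RMS message definitions, which are not reproduced here.

```python
# Sketch: split a 32-bit OpenVMS condition value into its fields.
# decode_condition is a hypothetical helper name, not an OpenVMS routine.

SEVERITY_LETTERS = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}

def decode_condition(value: int) -> dict:
    """Return the severity, message number, facility and control fields."""
    return {
        "severity": value & 0x7,                  # bits 0-2
        "message_number": (value >> 3) & 0x1FFF,  # bits 3-15
        "facility": (value >> 16) & 0xFFF,        # bits 16-27
        "control": (value >> 28) & 0xF,           # bits 28-31
    }

fields = decode_condition(0x000182DA)  # saved R4 from the call stack
severity_letter = SEVERITY_LETTERS.get(fields["severity"], "?")
print(fields, severity_letter)
```

For 000182DA this gives facility 1 (the RMS facility number) and severity 2 (E, error), which fits an RMS error status. It does not prove that R4 held a live status at the time of the hang, only that the bit pattern is plausible as one.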

The call to $PUT should have a RAB address as the first parameter. Some address on the stack may format as a RAB. SDA> SHOW PROC/RMS=RAB should let you find all open record streams...

But this is a long way to go ;-)

Volker.
Ian Miller.
Honored Contributor

Re: the curious case of the two waiting sub-processes

I'm doing a crash now and will follow the suggestion about looking for RAB structures.
The system did have zero free space for a short time, which may or may not be relevant.

I still think it strange that no channels were shown as busy. The process had no log files or database files open.
____________________
Purely Personal Opinion
Jan van den Ende
Honored Contributor

Re: the curious case of the two waiting sub-processes

Ian,


"The system did have zero free space for a short time"


I assume that to mean: "free diskspace on a relevant disk"?

In that case, which relevant file(s) are on THAT disk?
Could that be where your I/O got messed up?

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Ian Miller.
Honored Contributor

Re: the curious case of the two waiting sub-processes

I meant zero free space on the system disk for a short time.

Would SHOW PROC/RMS show something?
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Re: the curious case of the two waiting sub-processes

Ian,

Sure, SDA> SHOW PROC/RMS shows a lot of information if the process has files opened by RMS, but don't expect any clues without more detailed analysis.

Trying to find the RAB address for the $PUT seems to be the next logical step in trying to get an idea about the problem. SDA> SHOW PROC/RMS=RAB will show you all open record streams.

Volker.
Ian Miller.
Honored Contributor

Re: the curious case of the two waiting sub-processes

Nothing from SDA> SHOW PROC/RMS=RAB, so I guess it's not opened with RMS.
____________________
Purely Personal Opinion
Garry Fruth
Trusted Contributor

Re: the curious case of the two waiting sub-processes

While looking through the dump, you may want to check the remaining BYTLM and BIOLM.
Ian Miller.
Honored Contributor

Re: the curious case of the two waiting sub-processes

Remaining quotas are fine. The process in RWAST I understand: it's waiting for its sub-processes, then it will end. It's the subprocesses that puzzled me.
____________________
Purely Personal Opinion