Operating System - OpenVMS

Re: Process termination mailbox

 
Malleka Ramachandran
Frequent Advisor

Process termination mailbox

I noticed some apparent synchronization issue in our production code.
The scenario is like this:

A detached process is created to execute a program. After successful creation of this detached process, the application reads the process termination mailbox asynchronously (QIO with an IO$_READVBLK, no function modifiers).
After issuing the QIO, the application (current process) hibernates. The AST routine defined for the QIO read is supposed to wake up the hibernating process.
We received complaints from some of our customers indicating that the application occasionally goes into an indefinite wait state. I investigated this and found that the AST routine does not seem to fire. I am wondering whether the process termination message never gets written, or whether the channel gets deassigned so that the read can never complete. Where can I find more information about the details of the process termination mailbox?

Thanks,
Malleka
17 REPLIES 17
David Jones_21
Trusted Contributor
Solution

Re: Process termination mailbox

Make sure you start the QIO before you create the process. I believe process rundown writes to the mailbox with a nowait modifier, so you can lose the termination message if you don't have a read pending and the detached process dies quickly.

Make sure the IOSB for the asynchronous read is static, or is still valid at completion time if it is allocated on the stack.
I'm looking for marbles all day long.
Arch_Muthiah
Honored Contributor

Re: Process termination mailbox

Malleka,

Do you mean an indefinite hibernation (HIB) state? Did you verify that the AST was never invoked at all?


Archunan
Regards
Archie
Volker Halle
Honored Contributor

Re: Process termination mailbox

Malleka,

consider checking the subprocess accounting record to see whether, and for how long, it was active.

The process termination mailbox message is written from the DELETE kernel mode AST in the context of the process being deleted, if PCB$L_TMBU (termination mailbox unit number) is non-zero. It's an asynchronous $QIO with the IO$M_NOW modifier, so it won't wait for the reader to read the message.

If the subprocess has been deleted before the read-QIO AST was set up, or if the subprocess dies very early, before PCB$L_TMBU is even set up, or if there is an error sending the termination mailbox message, your main process might get stuck.

When the main process is hung, look at the termination mailbox device (MBAxxx:) and at its operation count with SDA:

$ ANAL/SYS
SDA> SET PROC/ID=
SDA> SHOW PROC/CHAN
SDA> SHOW DEV MBAx:

An operation count of 0 would indicate that no message has been written. An operation count of 2 would indicate that the message was written and read.

Volker.
Ian Miller.
Honored Contributor

Re: Process termination mailbox

Note that SHOW DEVICE/FULL MBAxxx will display the operation count (no need for SDA :-)

Do start the read on the termination mailbox before creating the process.

Do you know anything more about the state of the process hibernating waiting for the termination message? Are there any outstanding I/O requests?
____________________
Purely Personal Opinion
Malleka Ramachandran
Frequent Advisor

Re: Process termination mailbox

Thanks for your prompt responses, they have been very useful in understanding what was going on in the application.

When the AST does fire, I get the condition in IOSB status as 44, and the count and dev_info fields are both 0s.
What does status 44 (SS$_ABORT) mean?

Thanks,
Malleka
Volker Halle
Honored Contributor

Re: Process termination mailbox

Malleka,

a mailbox read QIO may be terminated with SS$_ABORT (instead of SS$_CANCEL) if the channel to the mailbox is deassigned.

Did you check what happened to the subprocess? Did it get created? Did it terminate with an error?

Volker.
Richard J Maher
Trusted Contributor

Re: Process termination mailbox

Hi Malleka

When does the AST fire? At rundown or do you have any $cancels in the code?

Regards Richard Maher
Willem Grooters
Honored Contributor

Re: Process termination mailbox

One reason I can think of for the connection being aborted: the detached process runs into a fatal condition that causes the OS to intervene, so writing to a termination mailbox is out of the question. An ACCVIO (access violation) might be a reason for the program to abort abruptly.
Your detached process should keep its own logging (a logfile, for instance) so you can find out the cause.
Another way to find out is to check accounting for the termination of that process. I think it will show the actual final state, which should (IIRC) not be SS$_ABORT.

BTW: Good VMS programming practice prescribes checking the IOSB, and not just in the case of asynchronous access ;)

Willem
Willem Grooters
OpenVMS Developer & System Manager
Robert Gezelter
Honored Contributor

Re: Process termination mailbox

Malleka,

To amplify what Willem mentioned in his earlier posting.

Proper OpenVMS programming practice requires two checks:

- When doing the SYS$QIO[W] call, the checking of the RETURN code (R0) from the system call
- When I/O completion occurs, the checking of the completion status in the IOSB (and not before the completion is indicated by either the AST or the event flag; see my comments about interfaces in my architecture and AST-related speeches available at http://www.rlgsc.com/presentations.html ; to put it simply, until the completion is indicated by the kernel, the contents of the IOSB ARE UNDEFINED).

Although you appear to be getting a completion code in the IOSB, you may actually be seeing pre-existing junk data. The IOSB contents are valid IFF (If and Only If, as the mathematicians say) you invoked one of the QIO (or QIO-like) services AND got a successful return status. That success status refers to the queueing of the operation, not its ultimate success (which is indicated by the IOSB contents upon signaled completion).

You also haven't mentioned a variety of other factors (e.g., whether this is a multi-processor, the relative process priorities) which could also produce erratic results depending upon system load, among other things.

- Bob Gezelter, http://www.rlgsc.com