Operating System - OpenVMS > Re: Process termination mailbox
04-12-2006 11:00 AM
The scenario is like this:
A detached process is created to execute a program. After the detached process has been created successfully, the application reads the process termination mailbox asynchronously ($QIO with IO$_READVBLK and no function modifiers).
After issuing the $QIO, the application (the current process) hibernates. The AST routine specified for the read is supposed to wake up the hibernating process.
Some of our customers have reported that the application occasionally goes into an indefinite wait. I investigated this and found that the AST routine does not seem to fire. I am wondering whether the termination message never gets written, or whether the channel gets deassigned so that the read can never complete. Where can I find more detailed information about the process termination mailbox?
Thanks,
Malleka
04-12-2006 11:22 AM
Solution: Make sure the IOSB for the asynchronous read is static or, if it is allocated on the stack, still valid when the read completes.
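A minimal model of why this matters, written in portable C so it runs anywhere: completion of an asynchronous read fills in the IOSB (and then delivers the AST) some time after the routine that queued the read has returned. The names `model_queue_read`, `model_complete_read`, `start_termination_read` and the IOSB struct layout below are hypothetical stand-ins for illustration, not OpenVMS APIs; on a real system the IOSB address is passed to SYS$QIO and filled in at I/O completion.

```c
#include <stdint.h>

/* Assumed IOSB layout for this model: status word, byte count word,
   device-dependent longword (8 bytes in total). */
struct iosb {
    uint16_t status;
    uint16_t count;
    uint32_t dev_info;
};

typedef void (*ast_routine)(void *astprm);

/* Hypothetical stand-in for the system's record of one queued read. */
static struct {
    struct iosb *iosb;
    ast_routine ast;
    void *astprm;
} pending;

/* Hypothetical "queue the read": clears the IOSB and remembers where it is.
   Returns an odd value, meaning "request queued". */
int model_queue_read(struct iosb *iosb, ast_routine ast, void *astprm)
{
    iosb->status = 0;            /* 0: I/O still pending */
    iosb->count = 0;
    iosb->dev_info = 0;
    pending.iosb = iosb;
    pending.ast = ast;
    pending.astprm = astprm;
    return 1;
}

/* Hypothetical completion, arriving later: fill in the IOSB, then call the AST. */
void model_complete_read(uint16_t status, uint16_t count, uint32_t dev_info)
{
    pending.iosb->status = status;
    pending.iosb->count = count;
    pending.iosb->dev_info = dev_info;
    pending.ast(pending.astprm);
}

/* Static IOSB: it still exists when the completion arrives,
   long after start_termination_read() has returned. */
static struct iosb term_iosb;
static volatile int term_done;   /* 0 = waiting, 1 = success, -1 = failure */

void term_ast(void *astprm)
{
    const struct iosb *iosb = astprm;
    term_done = (iosb->status & 1) ? 1 : -1;   /* odd status = success */
}

int start_termination_read(void)
{
    /* Passing the address of a local 'struct iosb' here instead would leave
       model_complete_read() writing into a stack frame that no longer exists. */
    return model_queue_read(&term_iosb, term_ast, &term_iosb);
}
```

In the model, `start_termination_read()` returns before `model_complete_read()` runs; that gap is exactly the window in which a stack-allocated IOSB would already have been reused for something else.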
04-12-2006 11:26 AM
Do you mean an indefinite HIB (hibernate) state? Did you make sure that the AST is not invoked at all?
Archunan
Archie
04-12-2006 07:16 PM
Consider checking the subprocess accounting record to see whether, and for how long, it was active.
The process termination mailbox message is written from the DELETE kernel-mode AST in the context of the process being deleted, if PCB$L_TMBU (termination mailbox unit number) is non-zero. It is an asynchronous $QIO with the IO$M_NOW modifier, so it does not wait for the reader to read the message.
If the subprocess has been deleted before the read $QIO and its AST were set up, if the subprocess dies very early (before PCB$L_TMBU is even set up), or if there is an error sending the termination mailbox message, your main process might get stuck.
When the main process is hung, look at the termination mailbox device (MBAxxx:) and at its operation count with SDA:
$ ANAL/SYS
SDA> SET PROC/ID=
SDA> SHOW PROC/CHAN
SDA> SHOW DEV MBAx:
An operation count of 0 would indicate that no message has been written. An operation count of 2 would indicate that the message was written and read.
Volker.
04-13-2006 02:50 AM
the operation count (no need for SDA :-)
Do start the read on the termination mailbox before creating the process.
Do you know anything more about the state of the process hibernating waiting for the termination message? Are there any outstanding I/O requests?
Purely Personal Opinion
04-17-2006 05:58 AM
When the AST does fire, the IOSB status is 44, and the count and dev_info fields are both 0.
What does status 44 (SS$_ABORT) mean?
Thanks,
Malleka
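Status 44 is an OpenVMS condition value, and its bit fields show why it counts as a failure. A minimal sketch in portable C, assuming the standard condition-value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27); the struct and function names are made up for this example, and on a real system SYS$GETMSG is the way to turn a condition value into its message text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 32-bit OpenVMS condition value. */
struct cond_fields {
    unsigned severity;   /* bits 0-2 */
    unsigned msg_no;     /* bits 3-15: message number */
    unsigned facility;   /* bits 16-27: facility number (0 = SYSTEM) */
    bool success;        /* low bit set: success or informational */
};

struct cond_fields decode_condition(uint32_t cond)
{
    struct cond_fields f;
    f.severity = cond & 0x7u;
    f.msg_no = (cond >> 3) & 0x1FFFu;
    f.facility = (cond >> 16) & 0xFFFu;
    f.success = (cond & 1u) != 0;
    return f;
}

/* Severity letter as it appears in messages such as %SYSTEM-F-ABORT. */
char severity_letter(unsigned severity)
{
    switch (severity) {
    case 0: return 'W';   /* warning */
    case 1: return 'S';   /* success */
    case 2: return 'E';   /* error */
    case 3: return 'I';   /* informational */
    case 4: return 'F';   /* severe (fatal) */
    default: return '?';  /* 5-7 are reserved */
    }
}
```

Decoding 44 gives an even value (low bit clear, so failure), severity 4 ('F'), message number 5 in facility 0, which matches the %SYSTEM-F-ABORT form of SS$_ABORT.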
04-17-2006 06:29 PM
A mailbox read $QIO may be terminated with SS$_ABORT (instead of SS$_CANCEL) if the channel to the mailbox is deassigned.
Did you check what happened to the subprocess? Did it get created? Did it terminate with an error?
Volker.
04-17-2006 09:58 PM
When does the AST fire? At rundown or do you have any $cancels in the code?
Regards Richard Maher
04-18-2006 05:35 AM
Your detached process should keep its own log (a log file, for instance) so you can find out the cause.
Another way to find out is to check the accounting record for the termination of that process. I think it will show the actual final status, which should (IIRC) not be SS$_ABORT.
BTW: Good VMS programming practice prescribes checking the IOSB, and not just for asynchronous access ;)
Willem
OpenVMS Developer & System Manager
04-18-2006 09:13 AM
To amplify what Willem mentioned in his earlier posting.
Proper OpenVMS programming practice requires two checks:
- When doing the SYS$QIO[W] call, the checking of the RETURN code (R0) from the system call
- When IO completion occurs (and not before the completion is indicated by either the AST or the event flag; see my comments about interfaces in my architecture and AST-related speeches available at http://www.rlgsc.com/presentations.html ; to put it simply, until the completion is indicated by the kernel, the contents of the IOSB ARE UNDEFINED).
Although you appear to be getting a completion code in the IOSB, you may actually be seeing pre-existing junk data. The IOSB contents are only valid IFF (If and Only If, as the mathematicians say) you invoked one of the QIO (or QIO-like) services AND got a successful completion code. The success completion refers to the queueing of the operation, not its ultimate success (which is indicated by the IOSB contents upon signalled completion).
You also haven't mentioned a variety of other factors (e.g., whether this is a multi-processor, the relative process priorities) which could also produce erratic results depending upon system load, among other things.
- Bob Gezelter, http://www.rlgsc.com
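The two checks can be sketched as one classification function in portable C. This is a hedged model, not OpenVMS code: `classify_io`, the `io_outcome` values and the IOSB struct are hypothetical, and the `completed` flag stands in for "the AST has run or the event flag is set". It relies only on the VMS convention that an odd status means success.

```c
#include <stdint.h>

/* Assumed IOSB layout for this sketch: status, byte count, device-dependent data. */
struct iosb {
    uint16_t status;
    uint16_t count;
    uint32_t dev_info;
};

enum io_outcome {
    IO_NOT_QUEUED,    /* check 1 failed: the service call itself returned an error */
    IO_IN_PROGRESS,   /* queued, but completion not yet signalled: IOSB undefined */
    IO_FAILED,        /* check 2: completed with an even (failure) IOSB status */
    IO_SUCCEEDED      /* check 2: completed with an odd (success) IOSB status */
};

/* qio_status: value returned by the $QIO call (R0).
   completed:  nonzero only after the AST has run or the event flag is set. */
enum io_outcome classify_io(uint32_t qio_status, const struct iosb *iosb, int completed)
{
    if (!(qio_status & 1u))
        return IO_NOT_QUEUED;          /* nothing was queued; never wait for it */
    if (!completed)
        return IO_IN_PROGRESS;         /* do not look at the IOSB yet */
    return (iosb->status & 1u) ? IO_SUCCEEDED : IO_FAILED;
}
```

The ordering is the point: the IOSB is consulted only after the call succeeded and completion was signalled, so a status such as 44 in the IOSB is reported as an I/O failure rather than being mistaken for a failure to queue.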
04-18-2006 10:52 PM
I thought the IOSB status field was set to 0 (SS$_PENDING) when the I/O request was started, i.e. when SYS$QIO had returned success.
Purely Personal Opinion
04-19-2006 01:35 AM
Yes, I believe that you are correct about the pending status. However, I am being conservative (I suppose you could ignore event flags AND ASTs and just poll the IOSB, but I would not recommend it).
However, in light of some of the practices that I have seen, it pays to be cautious. I will admit that I do not have access to a source listing where I am (and I do not have my IDSM book handy), but I suspect that the guarantees about atomic update of the IOSB are only good within the requesting process, and possibly with some other qualifications (I will defer to somebody who can check the code easily; yes, I have seen some very interesting code over the years).
I can state with certainty that when the AST is queued and/or the event flag is set, the IOSB contents are completely valid (and thus I have never relied upon the pending status check).
- Bob Gezelter, http://www.rlgsc.com
04-19-2006 01:56 AM
The IOSB is probed and cleared during $QIO processing, before the actual I/O operation is even started.
As you are not supposed to specify the same IOSB for multiple concurrent outstanding operations, atomic updates are irrelevant.
Volker.
04-19-2006 02:38 AM
Actually, atomic updates ARE an issue, but not in that way.
I have seen far too many cases of code that presumes that a data structure is atomically updated by a different thread, when in reality there is no such guarantee.
In this case, my concern is the potential for misaligned data structures, multiprocessors, and other such situations (the event flag and/or AST guarantees that the IOSB is completely valid).
When I teach AST programming, I try to warn people to expect things that they might not expect to happen. For example, a common COBOL practice (I said COMMON, not good) is to use character variables as switches (e.g. strings containing "YES" or "NO ") rather than binary integers.
Such use ignores the fact that character string copies are non-atomic on virtually ALL computing architectures (including the three relevant ones for OpenVMS: VAX, Alpha, and IA64). When such strings are used for synchronization between AST-level code and mainline code, it is possible to encounter values other than the expected ones, to wit: "YO ", "NES", and "NOS". Similar behaviors can be seen on multiprocessors with improperly aligned data and with complex data (thus my extremely cautious recommendations for coding practices). When they occur, these problems can be devilish to identify and correct.
Hence, my comments.
- Bob Gezelter, http://www.rlgsc.com
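The torn "YES"/"NO " values can be reproduced with a small portable C simulation. It assumes the string copy proceeds one byte at a time from left to right and is interrupted after `bytes_done` bytes; `observe_switch`, `set_switch` and `switch_is_on` are hypothetical names for this sketch. The integer switch uses standard C's `volatile sig_atomic_t`, offered only as an analogy for AST-versus-mainline access within one process, not as a multiprocessor synchronization primitive.

```c
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Returns what an AST would observe if the mainline was replacing 'old_value'
   with 'new_value' (both 3 characters) and was interrupted after 'bytes_done'
   bytes. Assumes a byte-at-a-time, left-to-right copy. */
const char *observe_switch(char buf[4], const char *old_value,
                           const char *new_value, size_t bytes_done)
{
    memcpy(buf, old_value, 4);            /* 3 characters plus the terminator */
    for (size_t i = 0; i < bytes_done; i++)
        buf[i] = new_value[i];            /* copy cut short by the "AST" */
    return buf;
}

/* Safer switch: a single integer written with one store. In standard C,
   volatile sig_atomic_t is the type meant for flags shared with
   asynchronous handlers. */
static volatile sig_atomic_t switch_on;

void set_switch(int on)
{
    switch_on = on ? 1 : 0;
}

int switch_is_on(void)
{
    return switch_on != 0;
}
```

Interrupting "YES" to "NO " after one or two bytes yields "NES" and "NOS"; interrupting "NO " to "YES" after one byte yields "YO ", the same values quoted above.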
04-20-2006 09:43 AM
Thanks for all your help. I'm sorry I could not get back to you earlier.
Robert, the process priority and processor environment are not issues. The application environment creates multiple sessions, each of which creates detached processes at the same priority of 4. I used a 4100 Alphastation with no multiprocessor options to reproduce the problem.
While trying to put together a reproducer to post to this forum, I noticed something strange in the code. The mailbox unit number passed to $CREPRC, which is obtained from earlier calls to $CREMBX and $GETDVI (termbox_num in the attached code), was declared as an int, while what is expected is an unsigned short. Our application uses the same sequence. So what was happening was this: two mailboxes were created, and an asynchronous read request was issued to the second one. When $CREPRC was invoked, the mailbox unit number contained garbage instead of the actual unit number returned by detach_mbx_create. After I changed the declaration to short, both this reproducer and my application consistently produce correct results. In the reproducer I can only verify that the AST fires; in the actual application I looked at the IOSB values when the AST fired, and everything seems to be OK.
The SS$_ABORT condition in the IOSB is a totally different story. Again, I apologize for not being responsive. As an alternative to the termination mailbox read, there is a portion of code that is executed optionally. It checks the newly created detached process every second and, as soon as a nonexistent-process status is returned, wakes up the current process. Also (not shown in the reproducer), immediately after the $HIBER there are two calls to $DASSIGN to deassign the channels to the two mailboxes above. I think this was causing the cancellation for which I was getting SS$_ABORT.
I do have another question. I am not sure whether $WAKE with a 0 argument, indicating the current process, is guaranteed to work. I am now working on passing the PID of the current process to the AST. I see some strange code there, which makes me think that it was originally intended to use the PID in the wakeup call but was abandoned for some reason. I don't know whether they changed their mind because the 0 option is good or because there were other issues.
Thanks,
Malleka
04-20-2006 09:45 AM
04-22-2006 09:51 PM
termbox_num is declared as an int (longword) and is allocated on the stack. It is not initialized or cleared before use, so its initial value is whatever was previously on the stack at that address, 80(FP) (in my test case it is 7AE315B0).
detach_mbx_create declares *mbx_unit as a short (word) and will therefore only update the contents of the low-order word on the stack where termbox_num is supposed to be stored (in my case it is 0x381F = unit number of MBA14367).
After the call to detach_mbx_create, termbox_num is used as an int (longword) again, so you get the correct value in the low-order word, but the previous contents remain in the high-order word of that longword at 80(FP); in my test case it becomes 7AE3381F.
This value of termbox_num is then passed to $CREPRC by value, and it (the WHOLE longword!) is actually stored in the PCB$L_TMBU field of the subprocess's PCB; I've verified this! When the subprocess terminates and the system tries to write the termination mailbox message to that unit number (0x7AE3381F), the operation will of course fail, because the code in SYSDELPRC also uses a MOVL.
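The arithmetic of the corruption can be checked with a small portable C model, assuming a little-endian word store as described above (VAX and Alpha are little-endian). `store_low_word` is a hypothetical helper, not part of the original code.

```c
#include <stdint.h>

/* Models what happened to termbox_num: a word-sized store (through a short *)
   into a longword on a little-endian machine replaces only the low-order
   word; the high-order word keeps its previous stack contents. */
uint32_t store_low_word(uint32_t longword, uint16_t word)
{
    return (longword & 0xFFFF0000u) | word;
}
```

Feeding in the values from the test case, `store_low_word(0x7AE315B0, 0x381F)` reproduces 0x7AE3381F. Starting from a zeroed longword (or declaring termbox_num as unsigned short, the fix described earlier in the thread) leaves just 0x381F, which is 14367 decimal, i.e. MBA14367.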
But you have apparently discovered an OpenVMS bug !!!
$ HELP SYS $CREPRC ARGUMENT clearly states:
...
mbxunt
OpenVMS usage:word_unsigned
type: word (unsigned)
access: read only
mechanism: by value
BUT the code in [SYS]syscreprc handles this parameter as a LONGWORD, thus causing your problem to surface.
The VAX version of SYSCREPRC is 'o.k'.
MOVW MBXUNT(AP),PCB$W_TMBU(R10)
but the Alpha version is 'wrong':
MOVL MBXUNT(AP),PCB$L_TMBU(R10)
or the documentation (HELP) is wrong and it's your fault ;-)
The C prototype in the V8.2 System Services Reference Manual shows this:
C Prototype
int sys$creprc ( ..., unsigned short int mbxunt, ...);
which matches the documented usage of mbxunt as a WORD.
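To see why the same caller bug is harmless on VAX but breaks the termination message on Alpha, the two quoted instructions can be modeled as C functions. This is a sketch: the function names are hypothetical, and the model assumes the argument arrives as a full longword, as Volker observed.

```c
#include <stdint.h>

/* VAX SYSCREPRC, as quoted: MOVW MBXUNT(AP),PCB$W_TMBU(R10)
   copies only the low-order word of the argument. */
uint16_t tmbu_vax(uint32_t mbxunt_arg)
{
    return (uint16_t)(mbxunt_arg & 0xFFFFu);
}

/* Alpha SYSCREPRC, as quoted: MOVL MBXUNT(AP),PCB$L_TMBU(R10)
   copies the whole longword, including any garbage in the high-order word. */
uint32_t tmbu_alpha(uint32_t mbxunt_arg)
{
    return mbxunt_arg;
}
```

With the corrupted argument 0x7AE3381F, the VAX model stores 0x381F while the Alpha model stores 0x7AE3381F. When the caller passes a clean value (high-order word zero), both versions agree, which is why correcting the declaration was sufficient.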
Using $WAKE without a PID should be o.k., if you want to wake yourself.
Volker.
04-22-2006 10:18 PM
the second paragraph in my previous reply should read:
detach_mbx_create declares *mbx_unit as short (word) and will therefore only update the contents of the low-order WORD on the stack where termbox_num is supposed to be stored (in my case it's 0x381f = unit number of MBA14367)
Volker.