Operating System - OpenVMS

Re: Process termination mailbox

 
SOLVED
Ian Miller.
Honored Contributor

Re: Process termination mailbox

Bob,
I thought the IOSB status field was set to 0 (SS$_PENDING) when the IO request was started, i.e. when SYS$QIO had returned success.

____________________
Purely Personal Opinion
Robert Gezelter
Honored Contributor

Re: Process termination mailbox

Ian,

Yes, I believe that you are correct about the pending status. However, I am being conservative (I suppose you could ignore event flags AND ASTs and just poll the IOSB, but I would not recommend it).

However, in light of some of the practices that I have seen, it pays to be cautious. I will admit that I do not have access to a source listing where I am (and I do not have my IDSM book handy), but I suspect that the guarantees about atomic update of the IOSB are only good within the requesting process, and possibly only with some other qualifications (I will defer to somebody who can check the code easily). Yes, I have seen some very interesting code over the years.

I can, with certainty, state that when the AST is queued and/or the Event Flag is set, the IOSB contents are completely valid (and thus, I have never relied upon the pending status check).
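For readers following along, the conservative rule above can be sketched in portable C. This is only a model, not OpenVMS code: the `iosb_t` field names, the `request_t` wrapper, and the `completion_seen` flag (standing in for "the event flag was set or the AST was delivered") are all assumptions made for illustration. The only VMS convention relied on is that a condition value with the low bit set indicates success.

```c
#include <stdint.h>

/* Illustrative IOSB layout: status word, byte count, device-dependent
 * longword. Field names are hypothetical. */
typedef struct {
    uint16_t status;   /* 0 while the request is still outstanding */
    uint16_t count;    /* bytes transferred */
    uint32_t devdep;   /* device-dependent information */
} iosb_t;

/* Hypothetical wrapper: completion_seen models "event flag set or AST
 * delivered" for this request. */
typedef struct {
    iosb_t iosb;
    int    completion_seen;
} request_t;

/* Conservative check: do not look at the IOSB until completion has been
 * signalled. Returns -1 if still pending, 1 on success, 0 on failure. */
static int request_succeeded(const request_t *r)
{
    if (!r->completion_seen)
        return -1;                        /* IOSB deliberately not examined */
    return (r->iosb.status & 1) ? 1 : 0;  /* low bit set means success */
}
```

The point of the design is that the decision to read the IOSB hangs on the completion signal, not on the status word itself being nonzero.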

- Bob Gezelter, http://www.rlgsc.com
Volker Halle
Honored Contributor

Re: Process termination mailbox

Bob, Ian,

the IOSB is being probed and cleared during QIO processing, before the actual IO operation is even being started.

As you are not supposed to specify the same IOSB for multiple concurrent outstanding operations, atomic updates are irrelevant.

Volker.
Robert Gezelter
Honored Contributor

Re: Process termination mailbox

Volker,

Actually, atomic updates ARE an issue, but not in that way.

I have seen far too many cases of code that presumes that a data structure is atomically updated by a different thread, when in reality there is no such guarantee.

In this case, my concern is the potential for misaligned data structures, multiprocessors, and other such situations (whereas the Event Flag and/or AST guarantee that the IOSB is completely valid).

When I taught AST programming, I tried to warn people to expect some things that they might not expect to happen. For example, a common COBOL practice (I said COMMON, not good) is to use character variables as switches (e.g. strings containing "YES" or "NO ") rather than binary integers.

Such use fails to take note of the fact that character string copies are non-atomic on virtually ALL computing architectures (including the three relevant ones for OpenVMS: VAX, Alpha, and IA64). When using such strings for synchronization between AST level code and mainline code, it is possible to encounter string values other than the expected, to wit: "YO ", "NES", and "NOS". Similar behaviors can be seen on multiprocessors with improperly aligned data, and complex data (thus my extremely cautious recommendations for coding practices). When they occur, these problems can be devilish to identify and correct.
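The torn values mentioned above can be reproduced deterministically with a small C sketch. It does not use ASTs at all; it simply copies the switch one byte at a time and records what an interrupting reader would see after each byte. The function name `copy_with_snapshots` and the `SWITCH_LEN` constant are made up for this illustration, and a real string move may copy in a different order or in larger units.

```c
#include <stddef.h>
#include <string.h>

#define SWITCH_LEN 3   /* "YES" / "NO " style three-character switch */

/* Copy src into flag one byte at a time. After each byte, save a copy of
 * flag as an interrupting reader (e.g. an AST routine) would observe it.
 * snapshots receives SWITCH_LEN NUL-terminated strings. */
static void copy_with_snapshots(char flag[SWITCH_LEN + 1],
                                const char src[SWITCH_LEN + 1],
                                char snapshots[SWITCH_LEN][SWITCH_LEN + 1])
{
    for (size_t i = 0; i < SWITCH_LEN; i++) {
        flag[i] = src[i];                      /* one step of the string move */
        memcpy(snapshots[i], flag, SWITCH_LEN);
        snapshots[i][SWITCH_LEN] = '\0';
    }
}
```

Starting from "YES" and copying in "NO ", the intermediate snapshots are "NES" and "NOS"; starting from "NO " and copying in "YES", the first snapshot is "YO ". A single aligned integer switch avoids these intermediate states.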

Hence, my comments.

- Bob Gezelter, http://www.rlgsc.com

Malleka Ramachandran
Frequent Advisor

Re: Process termination mailbox

Hello all,

Thanks for all your help. I'm sorry I could not get back to you earlier.

Robert, the process priority and processor environment are not issues. The application environment creates multiple sessions, each of which creates detached processes at the same priority of 4. I used a 4100 Alphastation with no multiprocessor options to reproduce the problem.

While trying to get a reproducer to post to this forum, I noticed something strange in the code: the mailbox device number argument to the $creprc call, which is obtained from earlier calls to $crembx and $getdvi (termbox_num in the attached code), used an int for the device number, while what is expected is an unsigned short. This is the same sequence as in our application. So what was happening was this: two mailboxes were created, and an asynchronous read request was issued to the second one. When $CREPRC was invoked, the mailbox unit number contained some garbage instead of the actual unit number returned from detach_mbx_create. After I changed it to short, this reproducer, as well as my application, consistently provides correct results. In this reproducer I can only verify that the AST is fired; in the actual application I looked at the IOSB values when the AST fired, and everything seems to be OK.
The SS$_ABORT condition in the IOSB is a totally different story. Again, I apologize for not being responsive. As an alternative to the termination mailbox read, there is a portion of code which gets executed optionally. It checks the newly created detached process every second and, as soon as a nonexistent-process status is received, wakes up the current process. Also (not shown in the reproducer), immediately after the $hiber there are two calls to $DASSIGN to deassign the channels to the above two mailboxes. I think this was causing the CANCEL, which is where I was getting the SS$_ABORT.
I do have another question. I am not sure whether $WAKE with a 0 argument, indicating the current process, is guaranteed to work. I am working on that now, to pass the PID of the current process to the AST. I see some strange code there, which makes me think that it was originally intended to use the PID in the wakeup call, but that this was abandoned for some reason. I don't know if they changed their mind because the 0 option is good or because there were other issues.

Thanks,
Malleka


Malleka Ramachandran
Frequent Advisor

Re: Process termination mailbox

I am not sure how to include multiple attachments. The actual C code did not seem to get into my earlier reply, so here it is.
Volker Halle
Honored Contributor

Re: Process termination mailbox

Malleka,

termbox_num is declared as int (longword) and is allocated on the stack. It's not being initialized/cleared before usage, so the initial value becomes whatever was on the stack at that address 80(FP) before (for my test case it's 7AE315B0).

detach_mbx_create declares *mbx_unit as short (word) and will therefore only update the contents of the low-order longword on the stack where termbox_num is supposed to be stored (in my case it's 0x381f = unit number of MBA14367)

After the call to detach_mbx_create, the contents of termbox_num is used as an int (longword) again, so you get the correct value in the low-order word, but the previous contents of the high-order word of that longword at 80(FP) - in my test case it becomes 7AE3381F.
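The effect described above can be modelled in portable C. This sketch is not the attached reproducer; it uses a hypothetical four-byte `longword_slot` to stand in for the little-endian stack longword holding termbox_num, with made-up helper names, so that the result does not depend on the host's byte order. The values are the ones quoted from the test case.

```c
#include <stdint.h>

/* Four bytes of little-endian memory standing in for the stack longword
 * that holds termbox_num (hypothetical model, not real stack access). */
typedef struct { uint8_t b[4]; } longword_slot;

/* Store a full longword, least significant byte first. */
static void store_long(longword_slot *s, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        s->b[i] = (uint8_t)(v >> (8 * i));
}

/* Store only a word: this is what writing through a short pointer does.
 * Bytes 2 and 3 (the high-order word) are left untouched. */
static void store_word(longword_slot *s, uint16_t v)
{
    s->b[0] = (uint8_t)(v & 0xFF);
    s->b[1] = (uint8_t)(v >> 8);
}

/* Read the slot back as a longword, as the caller's int declaration does. */
static uint32_t load_long(const longword_slot *s)
{
    return (uint32_t)s->b[0]
         | ((uint32_t)s->b[1] << 8)
         | ((uint32_t)s->b[2] << 16)
         | ((uint32_t)s->b[3] << 24);
}
```

With the slot holding 0x7AE315B0 beforehand, `store_word(&slot, 0x381F)` followed by `load_long(&slot)` yields 0x7AE3381F. Clearing the slot first, or declaring the variable with the same width the callee writes, gives 0x381F.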

Now this value for termbox_num is being passed to $CREPRC by value and it (the WHOLE longword!) will actually be stored in the PCB$L_TMBU field of the subprocess's PCB - I've verified this! When terminating the sub-process and trying to write the termination mailbox message to that unit number (0x7AE3381F), this operation will - of course - fail; the code in SYSDELPRC is also using a MOVL.

But you have apparently discovered an OpenVMS bug !!!

$ HELP SYS $CREPRC ARGUMENT clearly states:
...
mbxunt

OpenVMS usage: word_unsigned
type: word (unsigned)
access: read only
mechanism: by value

BUT the code in [SYS]syscreprc handles this parameter as a LONGWORD, thus causing your problem to surface.

The VAX version of SYSCREPRC is 'o.k.':

MOVW MBXUNT(AP),PCB$W_TMBU(R10)

but the Alpha version is 'wrong':

MOVL MBXUNT(AP),PCB$L_TMBU(R10)

or the documentation (HELP) is wrong and it's your fault ;-)

The C prototype in the V8.2 system service reference manual shows this:

C Prototype
int sys$creprc ( ..., unsigned short int mbxunt, ...);

which matches the documented usage of mbxunt as a WORD.
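The difference between the word store and the longword store can be modelled in a few lines of C. This is a sketch only: the function names are invented, they merely imitate what a MOVW or MOVL of the argument would leave in the TMBU field, and they are not taken from the OpenVMS sources. The last helper shows one caller-side precaution, masking the value to 16 bits before it is passed, under the assumption that the unit number is meant to be a word as documented.

```c
#include <stdint.h>

/* Model of a word store (MOVW style): only the low 16 bits of the
 * argument end up in the field. Hypothetical helper. */
static uint32_t tmbu_after_word_store(uint32_t arg)
{
    return (uint16_t)arg;
}

/* Model of a longword store (MOVL style): all 32 bits of the argument
 * end up in the field, including any stale high-order word. */
static uint32_t tmbu_after_long_store(uint32_t arg)
{
    return arg;
}

/* Caller-side precaution: reduce the value to an unsigned word before
 * passing it, so both store styles see the same unit number. */
static uint32_t clean_unit(uint32_t raw)
{
    return raw & 0xFFFFu;
}
```

With the polluted value 0x7AE3381F, the word-store model yields 0x381F and the longword-store model yields 0x7AE3381F; after `clean_unit`, both yield 0x381F. Declaring the variable as unsigned short in the first place, as in the reply above, has the same effect without the mask.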


Using $WAKE without a PID should be o.k., if you want to wake yourself.

Volker.
Volker Halle
Honored Contributor

Re: Process termination mailbox

Malleka,

the second paragraph in my previous reply should read:

detach_mbx_create declares *mbx_unit as short (word) and will therefore only update the contents of the low-order WORD on the stack where termbox_num is supposed to be stored (in my case it's 0x381f = unit number of MBA14367)

Volker.