Operating System - OpenVMS

mailbox issue

 
Gregg Parmentier
Frequent Advisor

mailbox issue


I recently installed VMS 7.2-2 as an upgrade to VMS 7.1 on an AS 4100.

I have a suite of programs that use mailboxes to exchange data. On 7.1 it doesn't matter how much data I attempt to push through; everything eventually gets passed to the next program through the mailbox. On 7.2-2, when I increase the flow of data, some of the packets are dropped.

It has all the earmarks of a flow control problem, but I haven't been able to track it down. I'm going back through the release notes for all the versions between 7.1 and 7.2-2 to see if anything changed that might cause this problem, but I haven't found anything yet.

Anyone have any idea how I should proceed?


6 REPLIES
Hoff
Honored Contributor

Re: mailbox issue

That looks like a fairly normal race condition in the code, potentially with resource waiting disabled or with a failure to check return status values. Version upgrades are known to expose these sorts of latent bugs, too.

Various applications that don't explicitly set the mailbox buffer size and quota depend on the system parameter, which then tends to lead to platform-specific weirdness. Process quotas for $creprc can contribute the same weirdnesses, too; the system parameters (MBX* and PQL*) can vary from release to release and from box to box, and most folks that depend on the parameters don't check the values.
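To make the status-checking point concrete, here is a minimal sketch in Python rather than VMS code. A bounded `queue.Queue` stands in for a mailbox with a finite buffer quota, and a non-blocking put stands in for a write issued with resource waiting disabled. All names (`send_ignoring_status`, `send_checking_status`, `run`) are hypothetical and exist only for this illustration.

```python
import queue


def send_ignoring_status(mbx, messages):
    """Buggy pattern: a non-blocking write whose failure is swallowed."""
    for msg in messages:
        try:
            mbx.put_nowait(msg)
        except queue.Full:
            pass  # status never examined, so the message is silently lost


def send_checking_status(mbx, messages, drain):
    """Checked pattern: on 'full', let the reader catch up, then retry."""
    for msg in messages:
        while True:
            try:
                mbx.put_nowait(msg)
                break
            except queue.Full:
                drain()  # stand-in for waiting until the reader makes room


def run(sender_checks, capacity=4, count=10):
    """Push `count` messages through a mailbox-like queue of `capacity`."""
    mbx = queue.Queue(maxsize=capacity)  # assumption: models the buffer quota
    received = []

    def drain():
        # The "reader": empties whatever is currently queued.
        while not mbx.empty():
            received.append(mbx.get_nowait())

    msgs = list(range(count))
    if sender_checks:
        send_checking_status(mbx, msgs, drain)
    else:
        send_ignoring_status(mbx, msgs)
    drain()
    return received
```

With the default arguments, `run(False)` delivers only the first four messages, while `run(True)` delivers all ten. The smaller the capacity (for example, after a default quota changes between releases), the sooner the unchecked writer starts losing data.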

Here's a classic list of coding bugs that you can look for:
http://h71000.www7.hp.com/wizard/wiz_1661.html

Here's an article on mailboxes and common coding flaws around mailboxes:
http://64.223.189.234/node/250

There was some communications software where I had to implement message dropping, as the alternative (quotas, backpressure) could cause a cascading catastrophic system failure when one node got slow or wedged.

V7.2-2? That's very old. Why not V7.3-2 or V8.3, both of which have more recent software support available?

Gregg Parmentier
Frequent Advisor

Re: mailbox issue


> V7.2-2? That's very old. Why not V7.3-2 or V8.3, both of which have more recent software support available?

7.2-2 is my intermediate step to get to 7.3-2. I'm still running another system on the cluster at 7.1 while I debug the 7.2-2 issues.
John Gillings
Honored Contributor

Re: mailbox issue

Gregg,

Good to see you're only on V7.2-2 as an intermediate!

I'd recommend you don't spend too much time debugging on this version. Upgrade to V7.3 and see what happens.

If you do want to debug, I'd be writing a simple test harness that simply passes a known sequence of messages of varying sizes through a mailbox. This should help convince you that mailboxes really do work as advertised ;-)

When you have the test harness working, compare it with your failing code.
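A rough sketch of such a harness, written in Python with a bounded `queue.Queue` standing in for the mailbox (it is not VMS code). Each message carries a sequence number and a length, so the reader can detect dropped, corrupted, or reordered messages. The message layout and the names `make_message`, `check_message`, and `run_harness` are assumptions made for this example.

```python
import queue
import struct
import threading

HEADER = struct.Struct("!II")  # sequence number, payload length


def expected_payload(seq, size):
    # Deterministic contents so the reader can verify every byte.
    return bytes((seq + i) % 256 for i in range(size))


def make_message(seq):
    size = (seq * 37) % 200 + 1  # varying sizes between 1 and 200 bytes
    return HEADER.pack(seq, size) + expected_payload(seq, size)


def check_message(raw):
    seq, size = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:]
    return seq, payload == expected_payload(seq, size)


def run_harness(count=500, capacity=8):
    mbx = queue.Queue(maxsize=capacity)  # stand-in for the mailbox
    seen, corrupt = [], []

    def reader():
        while True:
            raw = mbx.get()
            if raw is None:  # end-of-test marker
                return
            seq, ok = check_message(raw)
            seen.append(seq)
            if not ok:
                corrupt.append(seq)

    t = threading.Thread(target=reader)
    t.start()
    for seq in range(count):
        mbx.put(make_message(seq))  # blocks while the queue is full
    mbx.put(None)
    t.join()

    missing = sorted(set(range(count)) - set(seen))
    in_order = seen == sorted(seen)
    return missing, corrupt, in_order
```

The same idea carries over to a real mailbox: a non-empty `missing` list points at the writer or the transport, while a non-empty `corrupt` list points at buffer or length handling.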

Another option would be to build a very simple program which reads from one mailbox and writes to another (with the option of some kind of logging). You can place this process between your real processes. This gives you a way to track the message flow, and possibly isolate the problem to sender or receiver.
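As an illustration only, the relay could look something like the following Python sketch, in which two bounded queues stand in for the two mailboxes and a list stands in for the log. The `relay` and `demo` names, and the use of `None` as an end marker, are assumptions for the example.

```python
import queue
import threading


def relay(inbound, outbound, log):
    """Copy messages from inbound to outbound, recording each one."""
    count = 0
    while True:
        msg = inbound.get()
        if msg is None:         # end marker: pass it on and stop
            outbound.put(None)
            return
        count += 1
        log.append((count, len(msg)))  # arrival number and size
        outbound.put(msg)


def demo(sizes=(1, 10, 100, 1000)):
    inbound = queue.Queue(maxsize=2)
    outbound = queue.Queue(maxsize=2)
    log = []

    def sender():
        for n in sizes:
            inbound.put(b"x" * n)
        inbound.put(None)

    threads = [
        threading.Thread(target=sender),
        threading.Thread(target=relay, args=(inbound, outbound, log)),
    ]
    for t in threads:
        t.start()

    # The "real" receiver: read until the end marker arrives.
    received = [len(msg) for msg in iter(outbound.get, None)]
    for t in threads:
        t.join()
    return log, received
```

Comparing the relay's log with what the sender believes it sent, and with what the receiver actually got, shows on which side of the relay messages go missing.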

If you have time, build yourself a module that hides the detail of using mailboxes, abstracting it into a generic message passing mechanism to present to the application. You can then test it independently, and possibly replace the transport with alternative mechanisms in the future - for example ICC services to make your application cluster transparent, or a network protocol to distribute it further.
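One possible shape for that abstraction, sketched in Python with an in-memory queue as the only transport. The `Transport` interface, `QueueTransport`, `TransportError`, and `publish_readings` are all hypothetical names; a real implementation would wrap the mailbox calls (or ICC, or a network socket) behind the same two methods.

```python
import queue
from abc import ABC, abstractmethod


class TransportError(Exception):
    """Raised when a message cannot be sent or received in time."""


class Transport(ABC):
    """What the application sees: send bytes, receive bytes."""

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    @abstractmethod
    def receive(self, timeout: float = 1.0) -> bytes: ...


class QueueTransport(Transport):
    """In-memory stand-in; a mailbox-backed class would replace this."""

    def __init__(self, capacity, send_timeout=1.0):
        self._q = queue.Queue(maxsize=capacity)
        self._send_timeout = send_timeout

    def send(self, payload):
        try:
            self._q.put(payload, timeout=self._send_timeout)
        except queue.Full as exc:
            # Failure is reported to the caller, never swallowed.
            raise TransportError("send timed out: receiver not draining") from exc

    def receive(self, timeout=1.0):
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty as exc:
            raise TransportError("receive timed out") from exc


def publish_readings(transport, readings):
    """Application code: depends only on the Transport interface."""
    for reading in readings:
        transport.send(repr(reading).encode("ascii"))
```

Because `publish_readings` never touches the queue directly, the transport can be tested on its own and swapped out later without changing the application code.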
A crucible of informative mistakes
Robert Gezelter
Honored Contributor

Re: mailbox issue

Gregg,

I would concur with John and Hoff. Without looking at the source code, it is not possible to comment precisely, but that type of problem is often some piece of code failing to check or incorrectly processing status returns.

- Bob Gezelter, http://www.rlgsc.com
Ian Miller.
Honored Contributor

Re: mailbox issue

I agree with the others that you should move on to V7.3-2 then debug the problem.

It's probably a latent bug exposed by the upgrade. Perhaps the mailbox fills and the writer ignores the error.

When you are debugging the problem you may find my MBU and MBMON utilities useful.
http://www.encompasserve.org/~miller/
____________________
Purely Personal Opinion
Richard W Hunt
Valued Contributor

Re: mailbox issue

Seems to me that a combination of error handling and one of the PQL SYSGEN parameters could be conspiring against you, depending on exactly what is happening to the status after the MBX operations.

Since you recently upgraded, it is entirely possible that you did an AUTOGEN, which might just have to kvetch a bit about some of your parameters if you didn't keep them updated in your SYS$SYSTEM:MOD_PARAMS.DAT file to maintain that continuity from one system to the next.

The amount of data you can put in a mailbox is controlled by parameters when you create the box, but they are also limited by (I think) the JTQUOTA parameter from your UAF entry and the SYSGEN PQL_DJTQUOTA and PQL_MJTQUOTA. If you hit a limit and your error trapping isn't so good, you silently drop packets when the lowest of those three quotas is reached.

Further, if the errors occur on the sender side, the receiver won't EVER see the errors; the only things it sees are packets that make it into the mailbox and through to the receiver side.
Sr. Systems Janitor