Operating System - OpenVMS

Not applicable

Problem with Temporary Mail Box

Hi,

One process P1 is creating a temporary mail box using a system function SYS$CREMBX.

Second process P2 is trying to establish a connection with temporary mail box created by P1 using system function SYS$ASSIGN, but SYS$ASSIGN function is sometimes returning SS$_IVDEVNAM.

Can anyone help me understand why SYS$ASSIGN sometimes returns the IVDEVNAM error?

Note: It's an intermittent problem. Most of the time SYS$ASSIGN works properly, i.e. it returns SS$_NORMAL and assigns a channel, but sometimes (approx. 1 time out of 10) it returns IVDEVNAM.

What I know about IVDEVNAM is that SYS$ASSIGN returns it if the device, i.e. the mailbox, no longer exists. What can cause a mailbox to be deleted on its own, or what other reason could there be for SYS$ASSIGN to return IVDEVNAM?

Looking forward to a quick reply.

Regards,
ajaydec
Hein van den Heuvel
Honored Contributor

Re: Problem with Temporary Mail Box


This is 100% certain to be a programming error.
Typical causes:

- mailbox was temporary and creator is already gone.
- Logical name LNM$TEMPORARY_MAILBOX not always correct.
- uninitialized (C) variable / buffer overflow
- hardcoded length restrictions (3 or 4 char mbx name?)
- creator failed to check error (no more channels, pool, ...)

When you write 1 out of 10 is that
- once per day, 1 out of 10 days.
- several times per minute/hour from one process?
- all at the same time from multiple processes, some of which fail?

Finally, how does p2 know the name of the mailbox to use? hardcoded logical name?

hth,
Hein.
Not applicable

Re: Problem with Temporary Mail Box

Hi Hein,


>> - mailbox was temporary and creator is already gone.
The creator is not gone; it is still there.

>> - Logical name LNM$TEMPORARY_MAILBOX not always correct.
I have already made sure that the LNM$TEMPORARY_MAILBOX logical is always correct.

>> - uninitialized (C) variable / buffer overflow
Sorry, I don't understand what you mean by an uninitialized (C) variable. Can you elaborate? There is no buffer overflow.

>> - hardcoded length restrictions (3 or 4 char mbx name?)
There is no hardcoded restriction; the mbx name can be of any length.

>> - creator failed to check error (no more channels, pool, ...)
The creator checks for the SS$_NORMAL return value from SYS$CREMBX; if the return value is anything other than SS$_NORMAL, the creator logs the error.

>> how does p2 know the name of the mailbox to use? hardcoded logical name?
P2 learns the mailbox name through a logical name. It is not a hardcoded logical; its value depends on the name of the mailbox created by process P1.

>> When you write 1 out of 10 is that
>> - once per day, 1 out of 10 days.
>> - several times per minute/hour from one process?
>> - all at the same time from multiple processes, some of which fail?

I'll try to describe the situation once again in detail.
We have a product which starts 10 processes, P1-P10. When we start the product, all 10 processes are started, and when we shut it down, all 10 processes are shut down.
P1 is started first, then P2, P3 and so on.
P1 is the main process and it communicates with every other process, P2-P10. So each of P2-P10 creates a temporary mailbox (the mailbox names will be something like MBA453, MBA454, MBA455 and so on).
(Don't get confused: I am swapping the roles of P1 and P2 compared with my first post.)
Now P1 tries to assign a channel to the mailbox created by each individual process. Sometimes P1 is not able to assign a channel to the mailbox created by P2 or P3 or P4 ... or P10, and gets the IVDEVNAM error. Most of the time P1 is able to assign a channel to every mailbox.

Also note:
1) This problem occurs only on a multiprocessor system. When I run the product on a single-processor system, it does not give the IVDEVNAM error.
2) This problem occurs only during the startup of the product.

Hope I didn't confuse you too much.

Regards,
-ajaydec
Kris Clippeleyr
Honored Contributor

Re: Problem with Temporary Mail Box

Ajaydec,


>> We have a product which starts 10 processes, P1-P10. When we start the product, all 10 processes are started, and when we shut it down, all 10 processes are shut down.
>> P1 is started first, then P2, P3 and so on.
>> P1 is the main process and it communicates with every other process, P2-P10. So each of P2-P10 creates a temporary mailbox (the mailbox names will be something like MBA453, MBA454, MBA455 and so on).
>> (Don't get confused: I am swapping the roles of P1 and P2 compared with my first post.)
>> Now P1 tries to assign a channel to the mailbox created by each individual process. Sometimes P1 is not able to assign a channel to the mailbox created by P2 or P3 or P4 ... or P10, and gets the IVDEVNAM error. Most of the time P1 is able to assign a channel to every mailbox.


Now there's a possibility of mis-synchronisation.
Are you sure that all processes (P2 thru P10) have created their mailboxes before process P1 tries to assign channels to them?
On a multi-CPU system (and even on a single-CPU system), P1 might already be trying to assign channels before all the other processes have had a chance to create their mailboxes.

What are you using for synchronisation?

Regards,
Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Hein van den Heuvel
Honored Contributor

Re: Problem with Temporary Mail Box

OK, much clearer, and as Q says, clearly broken. Just too bad it seemed to work correctly for a while.

>> Now P1 will try to establish a channel with mailbox created by each and every individual process.

So how does P1 know that its slaves have gotten to the point of creating the mailbox?

>> 1) This problem occurs only on a multiprocessor system.

That is 100% proof that the application code is broken. Whenever there is a difference in behavior between a single CPU and more than one, or between running with changed and default priority, the application is broken. Guaranteed. Every time. Thousands of cases have proven that.

>> When I run the product on single-processor system, its not giving the IVDEVNAM error.

That's just bad luck.
Do a $SHOW PROC/CONT on P1 and it might break.


>> 2) This problem occurs only during the startup of the product.

Of course, because there is a race condition in the startup as described.

Realizing that the MBX is created most of the time, I would suggest a simple retry mechanism rather than adding an elaborate handshake.

Just loop over all slaves.
If channel assigned, goto next
Else try assign.
If none left, done.
Else wait 10 ms and try again up to 10 times.

todo = SLAVES;
retry = 10;
while (1) {
    for (i = 0; i < SLAVES; i++) {
        chan = &slave[i]->chan;
        if (!*chan) {                       /* not assigned yet */
            status = assign chan ...        /* SYS$ASSIGN for this slave */
            if (status & 1) {
                todo--;
            } else {
                last_status = status;
                last_slave = i;
            }
        }
    }
    if (!todo || !retry--) break;           /* all assigned, or out of retries */
    wait_a_while();                         /* e.g. 10 ms */
}
if (todo) {
    fprintf(stderr, "Failed to assign all channels" ...
    :
    return
}
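
The `status & 1` test in the loop above relies on the OpenVMS convention that a condition value with the low bit set means success, with the low three bits carrying the severity. That is a broader test than comparing against SS$_NORMAL, since some services can return success codes other than SS$_NORMAL. Here is a minimal, portable sketch of that convention. The `DEMO_` constants and the function names are made up for illustration, and the numeric values are what I recall from ssdef.h, so check them on your own system.

```c
#include <stdio.h>

/* Severity codes held in the low three bits of a condition value. */
enum severity {
    SEV_WARNING = 0,
    SEV_SUCCESS = 1,
    SEV_ERROR   = 2,
    SEV_INFO    = 3,
    SEV_SEVERE  = 4
};

/* Illustrative stand-ins; on OpenVMS use the real symbols from <ssdef.h>. */
#define DEMO_SS_NORMAL   1u     /* assumed value of SS$_NORMAL   */
#define DEMO_SS_IVDEVNAM 324u   /* assumed value of SS$_IVDEVNAM */

/* Odd status (low bit set) means the call succeeded. */
int status_ok(unsigned int status)
{
    return (status & 1u) != 0;
}

/* Extract the severity field from the low three bits. */
int status_severity(unsigned int status)
{
    return (int)(status & 7u);
}
```

With these assumed values, IVDEVNAM decodes as a severe (fatal) error, which matches the F in %SYSTEM-F-IVDEVNAM.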


Good luck!
Hein.




Hoff
Honored Contributor

Re: Problem with Temporary Mail Box

I'd use locks for process coordination. For instance, the controlling primary process takes an exclusive lock on the "FOO_I_AM_ALIVE" lock resource, and all the servers queue an incompatible lock against this resource. If a server process is ever granted the lock, it exits. (The controller is gone, and the servers should go, too.)

I'd look to IP network connections, or maybe to ICC if this is a cluster. Mailboxes are pretty old design, and -- in general -- I'd not tend to write new code that ties you into OpenVMS platform interfaces if there's an existing and standard API that avoids such. This means something like IP. (ICC is most certainly a platform interface, but it lets you operate more easily within a cluster.)

And in general, I'd probably look for and look to use existing process management tools, rather than writing these anew. For instance, even inetd and cluster aliases and RPC calls or other such can deal with various aspects of this for you. Web servers deal with server processes all the time, too, and there are various ways to use web-based servers.

Yeah, I know, you were probably told to use mailboxes by the lead designer or manager. (Oh, well.) There's a mailbox demo available on the Freeware here:
http://mvb.saic.com/freeware/freewarev80/hoffman_examples/
See the mbxdemo.* files.

Now if you're going to use these platform APIs, you'll want to use the DECw$Term_Port stuff to allow the created server processes to have a DECterm terminal device associated, or you'll want to start these processes with a WSA device. In either case, this is because it is massively easier to debug code when you can activate the created image with the debugger around, and walk through it. I'm inferring here that you're not using the debugger now, and this may or may not be a correct inference. Here's a demo:

http://mvb.saic.com/freeware/freewarev80/hoffman_examples/create_decterm.c

Stephen Hoffman
HoffmanLabs LLC
Not applicable

Re: Problem with Temporary Mail Box

One thing I can assure you: there is no synchronization error.
Actually, P1 also creates a mailbox (say MBA345) and issues a sys$qiow read on this mailbox.

When P2 has created its mailbox (say MBA346), it writes a coded value into P1's mailbox (MBA345). Since P1 is continuously reading MBA345, as soon as P2 writes into it, P1 reads the coded value and knows that it has to assign a channel to the mailbox (MBA346) created by P2.

Similarly, P1 assigns channels to the rest of the mailboxes created by the other processes.

Regards,
ajadec
Hoff
Honored Contributor

Re: Problem with Temporary Mail Box

There have been latent synchronization errors in code that has been heavily used (by all of us!) and particularly in code that has been running without incident for over twenty years. This in code that was written by some of the best engineers any of us here might know.

That the problem here occurs only on a multiprocessor system makes it almost certain that there are one or more synchronization errors latent and lurking here.

Here is a list of some of the more common synchronization bugs that can exist:

http://h71000.www7.hp.com/wizard/wiz_1661.html

SMP and SMT are among the most common triggers for and very commonly expose latent synchronization bugs. SMP and SMT can and do exercise the application synchronization and related logic to a degree that uniprocessors and single-threaded applications can not even approach.

Fire up the debugger and/or integrate some debugging (or both), and go in for a look. And do take a look at the sequencing of the mailboxes here. And yes, the debugger and the introduction of integrated debugging can easily alter the behavior of a latent synchronization bug, or can entirely mask it.

John Gillings
Honored Contributor

Re: Problem with Temporary Mail Box

ajaydec,

Well, I'm confused. I still don't understand how any of these processes know the mailbox name created by other processes. The default for LNM$TEMPORARY_MAILBOX is to place the logical name in LNM$JOB, which means it's only visible to processes in the same job tree. It's not clear in this case how the processes are related. If they're not subprocesses under the same master how are the names exchanged? Has LNM$TEMPORARY_MAILBOX been redefined?

This gets into a chicken and egg situation, you need a mailbox to communicate between processes, BUT you need to communicate the name of the mailbox.

Your symptom suggests the step of communicating the mailbox name is broken. It's not clear if you've taken the very obvious step of simply printing out the exact mailbox name you're attempting to open when $ASSIGN returns the IVDEVNAM? If it's via logical name, how do you know it's been defined? If it's a timing issue, try a short delay and retry the $ASSIGN - note this is NOT a recommended "fix", just a way to confirm diagnosis.
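
As a diagnostic aid only (not a fix, as noted above), the "print the exact name, pause, retry" idea can be sketched in portable C. This is a sketch under assumptions: `assign_fn`, `assign_with_retry` and `fake_assign` are hypothetical names, the callback stands in for whatever wrapper the application puts around SYS$ASSIGN, and 324 is used as an assumed value for SS$_IVDEVNAM.

```c
#include <stdio.h>

/* Hypothetical callback: tries to open the named mailbox and returns an
   OpenVMS-style status (odd = success). On OpenVMS this would wrap
   SYS$ASSIGN. */
typedef unsigned int (*assign_fn)(const char *name, void *ctx);

/* Try up to max_tries times. On every failure, log the exact name that was
   used and the status that came back. Returns the last status seen;
   *tries_used (if not NULL) receives the number of attempts made. */
unsigned int assign_with_retry(assign_fn try_assign, void *ctx,
                               const char *name, int max_tries,
                               void (*pause_fn)(void), int *tries_used)
{
    unsigned int status = 0;
    int tries = 0;

    while (tries < max_tries) {
        tries++;
        status = try_assign(name, ctx);
        if (status & 1u)
            break;                          /* success: stop retrying */
        fprintf(stderr, "assign attempt %d failed: name=\"%s\" status=%u\n",
                tries, name, status);
        if (tries < max_tries && pause_fn)
            pause_fn();                     /* short delay before next try */
    }
    if (tries_used)
        *tries_used = tries;
    return status;
}

/* Demo stand-in for SYS$ASSIGN: fails with 324 (assumed SS$_IVDEVNAM)
   until the counter behind ctx reaches zero, then returns 1. */
unsigned int fake_assign(const char *name, void *ctx)
{
    int *remaining_failures = ctx;
    (void)name;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return 324u;
    }
    return 1u;
}
```

If a later attempt succeeds with the same name, that points at timing; if the logged name itself looks wrong or empty, the name exchange is the thing to look at.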

One common model for this type of situation is to have the master process (your P1) create a PERMANENT mailbox, with a system wide, well known logical name. The clients create a temporary mailbox then send a message to the master via the permanent mailbox, including the name of their temporary mailbox in the message. The master then opens a channel to the client mailbox and two way communication is established.
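
A rough sketch of what such an announcement message might look like, in portable C. The struct layout, the 64-byte limit and all the names here are assumptions for illustration, not anything from the poster's application. On OpenVMS the client would typically get its mailbox's device name with SYS$GETDVI and write the message to the master's mailbox with SYS$QIOW.

```c
#include <stdio.h>
#include <string.h>

#define MBX_NAME_MAX 64   /* assumed upper bound for a device name string */

/* Hypothetical message a client writes to the master's well-known mailbox. */
struct mbx_announce {
    unsigned int  pid;                  /* sender's process ID */
    unsigned char name_len;             /* bytes used in name[] */
    char          name[MBX_NAME_MAX];   /* e.g. "MBA346:", not NUL-terminated */
};

/* Fill in an announcement.
   Returns 0 on success, -1 if the name is empty or too long. */
int announce_build(struct mbx_announce *msg, unsigned int pid,
                   const char *mbx_name)
{
    size_t len = strlen(mbx_name);
    if (len == 0 || len > MBX_NAME_MAX)
        return -1;
    memset(msg, 0, sizeof *msg);
    msg->pid = pid;
    msg->name_len = (unsigned char)len;
    memcpy(msg->name, mbx_name, len);
    return 0;
}

/* Copy the announced name into out as a NUL-terminated string.
   Returns 0 on success, -1 if the message is malformed or out is too small. */
int announce_name(const struct mbx_announce *msg, char *out, size_t out_size)
{
    if (msg->name_len == 0 || msg->name_len > MBX_NAME_MAX)
        return -1;
    if (out_size < (size_t)msg->name_len + 1)
        return -1;
    memcpy(out, msg->name, msg->name_len);
    out[msg->name_len] = '\0';
    return 0;
}
```

Carrying the device name inside the message means the master only learns a name after the mailbox exists, and it does not depend on a logical name being visible in the right table at the right moment.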
A crucible of informative mistakes
Not applicable

Re: Problem with Temporary Mail Box

John,

>> One common model for this type of situation is to have the master process (your P1) create a PERMANENT mailbox, with a system wide, well known logical name. The clients create a temporary mailbox then send a message to the master via the permanent mailbox, including the name of their temporary mailbox in the message. The master then opens a channel to the client mailbox and two way communication is established.

We are doing the same; maybe I was not able to explain it properly. The only difference is that we do not send the name of the temporary mailbox in the message. Instead, we define a logical "TEMP_MAILBOX_<pid>" which contains the name of the temporary mailbox created by the process with process ID <pid>.

So when calling sys$assign, P1 passes "TEMP_MAILBOX_<pid>" as one of its parameters, and in return we sometimes get the IVDEVNAM error.

Regards,
ajaydec