
Re: System Service sys$qiow

 
Not applicable

System Service sys$qiow

Hi,

I have a process which hangs (or is waiting indefinitely) at the system service call SYS$QIOW.

status = SYS$QIOW(0, mbx_chan, IO$_WRITEVBLK, iosb, 0, 0, param->bg_device, strlen(param->bg_device) + 1, 0, 0, 0, 0);

At the time of the hang:
the value of mbx_chan is 880
and the value of param->bg_device is BG2172

I don't know much about QIOW. What I know is that QIOW queues the I/O request and then waits until the virtual block has been written.

Can anyone help by telling me:

1) What could be the possible reasons for QIOW waiting indefinitely?
2) What is QIOW writing, and where is it writing it?
3) Can we have a timeout flag with the QIOW system call?

Regards,
Ajaydec
John Gillings
Honored Contributor

Re: System Service sys$qiow

Ajaydec,

Lots of stuff here. The answers depend on the device you're writing to. I'll assume it's a mailbox.

If this code has ever worked before, I'd guess that whatever is supposed to be reading the mailbox isn't working properly.

1) Reasons for QIO waiting

First the "expected" reasons for hanging. A mailbox write will wait for the message to be read. If you want it to return immediately after placing the message in the mailbox, add the modifier IO$M_NOW.

If the mailbox is full, the write will hang in RWMBX state until there's enough space for the message. If you want to avoid that, add the modifier IO$M_NORSWAIT (you'll have to check the IOSB for errors and take some sensible action if the mailbox is full).

There are a few unexpected reasons. One is a timing issue: if your EFN (0) is cleared between the time the write completes and control returning to your process, you'll hang until something else sets the EFN. It's unlikely for a WRITE, but just in case, use event flag EFN$C_ENF to avoid any possibility of different threads treading on each other's event flags.

2) Assuming the programmer isn't deliberately trying to obscure things, I'd guess it's writing to a mailbox. It's writing the string pointed to by bg_device.

3) Timeout: yes and no. Some drivers have a timeout built in. Just add modifier IO$M_TIMEOUT and set the correct parameter. The mailbox driver doesn't implement timeout, so you have to roll your own.

The simplest mechanism is "crossed ASTs". Start by issuing a $SETIMR for an AST that issues a $CANCEL against your mbx_chan, then calls $WAKE. Use your buffer address as the REQIDT. Now call $QIO (note: $QIO, not $QIOW) specifying an AST that cancels the timer with $CANTIM on your REQIDT, and calls $WAKE. You can now do other stuff, or $HIBER. When the I/O completes, or the timer triggers, you'll be woken. Check the IOSB for the $QIO to see what happened. It should be SS$_NORMAL or SS$_CANCEL.

Alternatively use a $QIOW followed by the $CANTIM. Similar effect, but less flexible.
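
The advice to check the IOSB can be illustrated with a small portable sketch. It is not OpenVMS code and calls no system services: struct iosb_model, vms_ok and check_write are hypothetical names invented for this example, and the struct is only a stand-in for the 8-byte I/O status block (on OpenVMS you would use the system-supplied definitions). It relies on two facts: the value returned by $QIO/$QIOW only says whether the request was accepted, while the first word of the IOSB holds the final status; and OpenVMS condition values have the low bit set on success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte I/O status block. */
struct iosb_model {
    uint16_t status;   /* final condition value of the request */
    uint16_t count;    /* number of bytes transferred */
    uint32_t dev_info; /* device-dependent information */
};

/* OpenVMS condition values are "successful" when the low bit is set. */
static bool vms_ok(uint32_t condition)
{
    return (condition & 1u) != 0;
}

enum write_outcome { WRITE_DONE, WRITE_NOT_QUEUED, WRITE_FAILED };

/* Check both levels: the service return value, then the IOSB. */
static enum write_outcome check_write(uint32_t qio_status,
                                      const struct iosb_model *iosb)
{
    assert(iosb != NULL);
    if (!vms_ok(qio_status)) {
        return WRITE_NOT_QUEUED;   /* request was never accepted */
    }
    if (!vms_ok(iosb->status)) {
        return WRITE_FAILED;       /* accepted, but ended in error or cancel */
    }
    return WRITE_DONE;
}
```

In the original call only the return value lands in status, so a request that was queued successfully but later cancelled or rejected would only show up in the IOSB.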
A crucible of informative mistakes
Not applicable

Re: System Service sys$qiow

Hi,

The code has been working fine for years. It only started giving this error after the load on the product was increased.

Regards,
Ajaydec
Richard Brodie_1
Honored Contributor

Re: System Service sys$qiow

BG devices are associated with sockets, so fairly similar to a mailbox device, with the added complication of client/server problems.

Something like "TCPIP SHOW DEVICE BG2172/FULL" will show you the socket endpoints.

John Gillings
Honored Contributor

Re: System Service sys$qiow

Ajaydec,

>The code is working fine for years

If I had a dollar for every time I've heard "It's been working for years"... ;-)

One programmer's motto is:

"Insunt interdum menda in eo quod est efficax"

which translates to:

"There are sometimes flaws in that which is efficacious."

In other words: "Just because it (seems to) work, doesn't mean it's right".

Increasing the load, upgrading a CPU, changing versions, or any number of other environmental changes can reveal a hitherto unknown bug, and in this case probably has. When you're dealing with interprocess communications, there can be subtle timing windows which, if you fall through them, can cause deadlocks.

You need to determine exactly what the device is. I've assumed a mailbox, but you need to look for the $ASSIGN or $CREMBX that writes mbx_chan, find out what's at the other end, and see what it's doing at the time your process is hanging.

$QIO and the mailbox driver are incredibly well exercised code paths in OpenVMS. Think in terms of BILLIONS of operations per day on thousands of systems around the planet. Although they're not necessarily completely bug free, the chances of you finding a bug in $QIO are extremely slim, so your first assumption must be a bug in your code.

re: Richard,
I'm assuming that the BG string is being written to a mailbox because the channel is called mbx_chan. Some other process or thread will read it and act on it.
A crucible of informative mistakes
Willem Grooters
Honored Contributor

Re: System Service sys$qiow

The program is trying to write to a socket, but it is not able to finish the action and will wait (qio_W_) until completion. My recollection is that the write completes only once delivery of the message has been acknowledged by the receiving system. If so, it means the acknowledgement hasn't been received.

* Check whether the socket is one way (write_only, in this case) or two-way (read + write). In the latter case, be sure the socket is not in a blocked state due to an unfinished read.
* Does the other side actually read the socket? It might be that the receiver's buffer is full and the message cannot be delivered.
* If the other side has died, your sending socket may still think it's connected. The previous messages will never be read and your sending program's buffer will be full.

To prevent QIO from hanging, what you could do is the following sequence:

* Set a timer for say 2 seconds, to set an event flag on expiration.
* Start QIO - not QIOW - specifying a completion AST and IOSB. The AST routine will set a flag on completion (other than the timer's flag).
* Wait for either of the two event flags to be set.
* Check the event flags. If the timer flag is set, the QIO timed out. If the QIO completion flag is set, sending the message succeeded.
In all cases, check the I/O status block.

Next, take appropriate action. (There will surely be a way to look at the socket status, for instance "BUFFER FULL".)
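
The decision step in the sequence above can be sketched in portable C. This is only a model, not OpenVMS code: it calls no system services, and classify_wait, its parameters and the result names are hypothetical. The caller is assumed to have already waited on both flags (for example with $WFLOR, which requires both flags to be in the same event flag cluster) and read the cluster state into flags.

```c
#include <assert.h>
#include <stdint.h>

enum send_result { SEND_OK, SEND_ERROR, SEND_TIMED_OUT, SEND_STILL_PENDING };

/*
 * flags       : current state of the event flag cluster (one bit per flag)
 * io_bit      : bit set by the I/O completion AST
 * timer_bit   : bit set when the timer expires
 * iosb_status : first word of the I/O status block
 */
static enum send_result classify_wait(uint32_t flags, uint32_t io_bit,
                                      uint32_t timer_bit, uint16_t iosb_status)
{
    assert(io_bit != 0 && timer_bit != 0 && io_bit != timer_bit);

    /* Test the I/O flag first: if both flags are set, the write did
       finish, so its IOSB status decides the outcome. */
    if (flags & io_bit) {
        /* Low bit set means success in an OpenVMS condition value. */
        return (iosb_status & 1u) ? SEND_OK : SEND_ERROR;
    }
    if (flags & timer_bit) {
        return SEND_TIMED_OUT;     /* caller should now cancel the I/O */
    }
    return SEND_STILL_PENDING;     /* neither flag set: keep waiting */
}
```

Testing the I/O flag before the timer flag covers the case where the write completes at almost the same moment the timer fires. On a timeout the outstanding request still has to be cancelled, and on success the timer should be cancelled so it does not fire later.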
Willem Grooters
OpenVMS Developer & System Manager
Not applicable

Re: System Service sys$qiow

Hi,

I have now found out that the process has written something to the mailbox and is waiting for another process to read that information from the mailbox. Can I find out which process is supposed to read from the mailbox?

Regards,
ajaydec
David Jones_21
Trusted Contributor

Re: System Service sys$qiow

You can't really tell for sure which process is going to read the mailbox, just which process has read your message when the I/O completes. If you use directional mailboxes (set with the flags argument to the $CREMBX and/or $ASSIGN calls), you can use the IO$M_READERCHECK modifier on your write to make the operation fail immediately if no process has a channel assigned for read. This is useful for programs that expect a 'broken pipe' status when the process they are sending data to dies unexpectedly.
I'm looking for marbles all day long.
Willem Grooters
Honored Contributor

Re: System Service sys$qiow

Functionally, as far as QIO is concerned, it doesn't matter, but everyone is assuming it's a mailbox, whereas BG devices are IP sockets.
Do a $ TCPIP SHO DEV and you'll see the BG devices, both the local and remote addresses and ports, and the related services, if any exist.
You might find a clue. The remote address and port should be filled in and valid. Run the same command on the system that _should_ handle the messages: what's remote on one system should be local on the other, and vice versa.

What you need to do is find out which other program is needed to process the data and whether it is active and working properly. If it has stopped for some reason, you might need to restart the sending program in order to re-establish the connection. It might be worthwhile to find out why that program stopped.
If there is no program to pick up the message and there is no method of signalling that it went wrong, your program will wait, and wait, and wait... With mailboxes, the sending program would enter RWMBX state.
Willem Grooters
OpenVMS Developer & System Manager
Wim Van den Wyngaert
Honored Contributor

Re: System Service sys$qiow

If you do a "show dev/fu" you will see the "reference count". If it's 2, both reader and writer are connected. In my environment, the reader and writer are almost always both active.

Wim