Operating System - OpenVMS

Delays in incoming IP traffic using QIO

 
SOLVED
Matthijs Boelstra
Occasional Advisor

Delays in incoming IP traffic using QIO

I'm running OpenVMS 7.3.2 on an AlphaServer DS25.
In my application I experience delays in receiving IP messages (UDP and TCP).
I use QIO as API for communicating with other peers.

I first discovered problems in receiving UDP multicast messages: my application detected the changes in data far too late.
The multicast messages are sent every 500 ms, but after a while my application was processing messages that had been sent 40 seconds earlier. After some tests I discovered that these messages were sent on time and also arrived on time at my server. However, VMS was queuing the messages.
I'm using (asynchronous) QIO calls to receive and send data, and I use the wflor function to detect events for the incoming messages. The server uses approx. 5% CPU time, so in my opinion there is no reason for delay. My feeling is that the events for incoming IP traffic are fired too late, because other events (BMQ and Timers) are always on time.

Now I see the same problem happening with incoming TCP messages. A client application sends a livecheck_message every 5 seconds, and my application needs to answer every message with a livecheck_response. When I compare the log files of my application with a log from Ethereal, I see that sometimes messages arrive up to 3 seconds too late in my application, while processing the message and sending the livecheck_response takes less than 2 ms.
So my guess here is that events are also fired too late.

Does anybody know about these problems?
Or do I need to update something?
Any help is appreciated!!

Regards,
Matthijs Boelstra
AtosOrigin (Netherlands)


18 REPLIES
Richard Whalen
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Does your application set the NODELAY option on the socket? If not, then TCP packets are normally delayed for as much as 1/2 second waiting for more data in the hopes of filling a packet.

Are you using any masks (P4) in the QIO that you do for the read, or applying any modifiers to the IO$_READVBLK operation? There are some modifiers that will block completion until the buffer is full, which could delay the VMS program in processing the data.
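
For illustration, disabling Nagle with the C socket API looks roughly like this (a sketch only; header paths follow the standard BSD layout and may differ between TCP/IP kits, and with the QIO interface the same option can be set via a set-mode operation instead):

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>     /* IPPROTO_TCP */
#include <netinet/tcp.h>    /* TCP_NODELAY */

int set_nodelay(int sock)
{
    int one = 1;
    /* Returns 0 on success, -1 on failure (check errno). */
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                      (char *)&one, sizeof(one));
}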
Robert Gezelter
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Matthijs,

I would also suspect a coding error, if the applications in both cases are locally developed.

As for a TCP bug, I would confirm that this delay behavior does not occur with PING and other utilities known to behave correctly (e.g., telnet, the SMTP server), possibly even non-HP utilities such as C-Kermit (http://www.columbia.edu/kermit).

It is quite possible that you are seeing a problem caused by incorrect management of event flags in a multi-streamed environment. I recommend that my clients use the AST mechanism when building such applications.

I have to run out the door in a minute, but there are several AST-related presentations available for download from my www site.

- Bob Gezelter, http://www.rlgsc.com
Wim Van den Wyngaert
Honored Contributor

Re: Delays in incoming IP traffic using QIO

If not a coding error, it could be a "crazy" setup. Could you post tcpip$etc:sysconfigtab.dat?

Wim
Matthijs Boelstra
Occasional Advisor

Re: Delays in incoming IP traffic using QIO


Thanks for the answers so far.
Below are some calls to the QIO API that I use.

/* Start an asynchronous read if the connection is present. */
if (tcpChannelInfo->channelState == CONNECTED)
{
    rcode = sys$qio(channel->readEventFlag,            /* Event flag */
                    tcpChannelInfo->commSocketChannel, /* Channel number */
                    IO$_READVBLK,                      /* I/O function */
                    channel->readIosb,                 /* I/O status block */
                    0, 0,                              /* No AST routine/parameter */
                    (char *)channel->readBuffer,       /* P1: buffer */
                    TCP_MAX_BUFFER,                    /* P2: buffer length */
                    0, 0, 0, 0);                       /* P3-P6: unused */
}

This is used to start an asynchronous read on a TCP channel.

status = sys$wflor(event[0], maskWait);

This is the call I use to detect events. The maskWait contains four bits (0, 1, 2, 3), where:
0 is for TIMER events,
1 is for BMQ events,
2 is for TCP events and
3 is for UDP events.
Note that the other events (BMQ and TIMER) are also used a lot, and they are never delayed.
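
Roughly, the mask is built like this (a sketch; apart from EFN_TCP, which appears in my code and is quoted below, the macro names are only illustrations):

#define EFN_TIMER 0
#define EFN_BMQ   1
#define EFN_TCP   2
#define EFN_UDP   3

/* One bit per event flag, all in the same cluster. */
unsigned int maskWait = (1u << EFN_TIMER) | (1u << EFN_BMQ) |
                        (1u << EFN_TCP)  | (1u << EFN_UDP);

/* Wait until at least one of the four flags is set. */
status = sys$wflor(EFN_TIMER, maskWait);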


I have attached the file sysconfigtab.dat, but I don't see any strange settings, although I don't have much VMS experience.

Hope this helps!

Robert Gezelter
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Matthijs,

Before we go further, please confirm that you are running a SINGLE TCP connection with the code that you posted.

Using a single event flag for multiple streams WILL cause all manner of unpredictable behaviors.

- Bob Gezelter, http://www.rlgsc.com
Matthijs Boelstra
Occasional Advisor

Re: Delays in incoming IP traffic using QIO


I will explain a little bit more about the application.

The complete application consists of 12 processes (executables) communicating with each other using BEA Message Queue (BMQ).
All the processes are linked against a static library, "framework", which implements functionality that is the same for each process. So each process uses BMQ to communicate with one or more other processes; a few processes also use TCP, and one process uses UDP (multicast).

The problems I described are seen with both UDP and TCP. For UDP (used by one process) the problem is solved, because now I'm using a timer (every 500 ms) and call a Berkeley socket function to see if there is data available. If there is, I process it, and so on.
This works fine, no delays anymore.

But now, after more testing, the same problem comes up in the TCP implementation (also one process; at least, I have seen problems in one process). This process sets up a listening socket (using QIO) and accepts up to 4 connections (clients, from a different server).
So when all systems are up and running, this process must handle four clients, all sending livecheck_messages every 5 seconds, and each message has to be acknowledged by a livecheck_response.

Now, to your question: I use one event flag for all four connections and the listening socket, like this:

#define EFN_TCP 2

This is later used to build the waitMask for the wflor call and in the async read call.

I get the impression that this is not the way to do it... please advise on expected behavior and better solutions.

Regards,
Matthijs

John Gillings
Honored Contributor
Solution

Re: Delays in incoming IP traffic using QIO

Matthijs,

I'd suggest you build yourself a test harness. Just a simple, single threaded sender and receiver. Instead of async, use a synchronous $QIOW in the receiver in a loop, with timing between messages received. If you can reproduce the delays in that environment you've got something that needs to be reported.

I'd be looking very closely at the logic in your program after the WFLOR. For example, what if a TCP or UDP event occurs while you're off processing a TIMER or BMQ event? Could there be anything else clearing your event flag behind your back? Remember that issuing an operation against an event flag (like a $QIO) will implicitly clear the flag. Similarly, if you're manually $CLREF anywhere, could there be timing windows where you're dismissing an event completion before processing it? (in general, you should NOT need to $CLREF, let the request do that for you).

Typically I'd ignore the event flags themselves for determining event completion. Whenever your WFLOR fires, I'd check the IOSBs (or equivalent) for ALL pending events to see if they've completed. Like this:

LOOP
    $WFLOR(all possible events)
    FOR each possible event DO
        IF event completed THEN
            process event
            reset event
        ENDIF
    ENDFOR
ENDLOOP
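
In C, using the first word of each IOSB as the completion indicator, that skeleton might look like this (a sketch; the iosb array, efnBase, waitMask, and the handler names are assumptions, not code from this thread):

#include <starlet.h>
#include <iosbdef.h>          /* IOSB type; or an equivalent struct of 4 shorts */

#define NUM_EVENTS 4

IOSB iosb[NUM_EVENTS];        /* one IOSB per outstanding request */

for (;;)
{
    sys$wflor(efnBase, waitMask);            /* any completion wakes us up */
    for (int i = 0; i < NUM_EVENTS; i++)
    {
        if (iosb[i].iosb$w_status != 0)      /* nonzero: request completed */
        {
            process_event(i);                /* hypothetical handler */
            iosb[i].iosb$w_status = 0;       /* reset before re-queuing */
            queue_next_request(i);           /* re-issue the $QIO */
        }
    }
}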

A crucible of informative mistakes
Robert Gezelter
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Matthijs,

Using a single Event Flag for all of the TCP connections will simply not work. When a QIO(W) is done on a channel, the specified event flag will be cleared. If you have multiple channels all referencing the same event flag, it is not surprising that strange things happen. In short, events are being lost.

IMO, the best way to do this is with Asynchronous System Traps. When each IO operation completes, a call to the AST routine is made. Your code sample already appears to have a data structure associated with each channel. The ASTPRM parameter to the QIO call is the address of this structure. This is used by the AST routine to find its structures. (The IOSB already appears to be in this structure, so you are partway there). Using this approach, there is no need for a WFLOR (at least for the TCP connections), and no need for polling using select.
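
A minimal sketch of that AST shape, assuming one structure per connection that carries the channel number, IOSB, and buffer (the earlier snippet splits these across two structures; read_ast, start_read, and process_data are illustrative names):

/* Hypothetical per-connection structure. */
typedef struct ChannelInfo
{
    unsigned short commSocketChannel;       /* VMS channel number */
    unsigned short readIosb[4];             /* I/O status block */
    char           readBuffer[TCP_MAX_BUFFER];
} ChannelInfo;

static void start_read(ChannelInfo *ch);

/* AST routine: runs at completion; ASTPRM arrives as the argument. */
static void read_ast(ChannelInfo *ch)
{
    if (ch->readIosb[0] & 1)                /* low bit set: success */
        process_data(ch);                   /* handle the received buffer */
    start_read(ch);                         /* queue the next read */
}

static void start_read(ChannelInfo *ch)
{
    sys$qio(0,                              /* event flag not needed */
            ch->commSocketChannel,          /* channel number */
            IO$_READVBLK,                   /* read function */
            ch->readIosb,                   /* I/O status block */
            (void (*)())read_ast,           /* AST routine */
            ch,                             /* ASTPRM: our structure */
            ch->readBuffer,                 /* P1: buffer */
            TCP_MAX_BUFFER,                 /* P2: buffer length */
            0, 0, 0, 0);
}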

Alternatively, you could use different event flags for each channel, and continue to use the WFLOR construct. The problem is that Event Flags are a relatively scarce commodity, and it is easy to run out. For getting event flag numbers, I highly recommend the use of the LIB$GET_EF system library call. It is easy to accidentally have overlapping use of an event flag, and it is very difficult to track down.
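
For example, allocating a flag per channel at setup time might look like this (a sketch; LIB$GET_EF typically hands out flags from cluster 1, numbers 32-63, so the allocated flags can still share a single WFLOR mask):

#include <lib$routines.h>

unsigned int efn;
unsigned int status;

status = lib$get_ef(&efn);            /* reserve a free event flag */
if (!(status & 1))
    /* no flags left: handle the error */ ;

channel->readEventFlag = efn;         /* remember it in the channel structure */

/* When the channel is closed: */
lib$free_ef(&efn);                    /* return the flag to the pool */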

As I noted earlier today, I have given quite a few presentations on these techniques at past technical seminars. My session "Introduction to OpenVMS AST Programming" can be found at http://www.rlgsc.com/cets/2000/435.html

- Bob Gezelter, http://www.rlgsc.com
Matthijs Boelstra
Occasional Advisor

Re: Delays in incoming IP traffic using QIO


Thanks again for all the info, it is very helpful to me.

Because I don't have much time to totally redesign the framework (the customer is already using the system and I need to fix other functional problems), I want to keep the WFLOR intact.
So if I understand everything correctly, I need to use a different event flag for each socket/channel.
A call to LIB$GET_EF will give me the first free event flag available, which I can then assign to my channel structure when I set up a new channel.

Then, when I fall through the WFLOR call (an event occurred), I can iterate through my channels (which are stored in a linked list) and find the channel with the matching event flag; something like the sketch below.
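
A rough outline of that dispatch loop (a sketch; following John's advice it checks each channel's IOSB rather than the flags themselves, and Channel, handle_read, and start_read are illustrative names):

for (;;)
{
    sys$wflor(firstEfn, waitMask);          /* any channel became ready */

    for (Channel *ch = channelList; ch != NULL; ch = ch->next)
    {
        if (ch->readIosb[0] != 0)           /* a completion was posted */
        {
            handle_read(ch);                /* process the buffer */
            ch->readIosb[0] = 0;            /* clear before re-queuing */
            start_read(ch);                 /* issue the next $QIO */
        }
    }
}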

Please give your feedback.

Regards,
Matthijs