Operating System - OpenVMS

Delays in incoming IP traffic using QIO

 
SOLVED
Matthijs Boelstra
Occasional Advisor

Delays in incoming IP traffic using QIO

I'm running OpenVMS 7.3-2 on an AlphaServer DS25.
In my application I experience delays in receiving IP messages (UDP and TCP).
I use QIO as API for communicating with other peers.

I first discovered problems in receiving UDP multicast messages. I saw that my application detected the changes in data far too late.
The multicast messages are sent every 500 ms, but after a while my application was processing messages that had been sent 40 seconds earlier. After some tests I discovered that these messages were sent on time and also arrived on time at my server. However, VMS was queuing the messages.
I'm using asynchronous QIO calls to receive and send data, and I use the wflor function to detect events for the incoming messages. The server uses approx. 5% CPU time, so in my opinion there is no reason for delay. My feeling is that the events for incoming IP traffic are fired too late, because the other events (BMQ and timers) are always on time.

Now I see the same problem happening with incoming TCP messages. A client application sends a livecheck_message every 5 seconds, and my application needs to answer every message with a livecheck_response. When I compare the log files of my application with a log from Ethereal, I see that sometimes messages arrive up to 3 seconds too late in my application, while processing the message and sending the livecheck_response takes less than 2 ms.
So my guess here is that the events are also fired too late.

Does anybody know about these problems?
Or do I need to update something?
Any help is appreciated!!

Regards,
Matthijs Boelstra
AtosOrigin (Netherlands)


18 REPLIES
Richard Whalen
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Does your application set the NODELAY option on the socket? If not, then TCP packets are normally delayed for as much as 1/2 second waiting for more data in the hopes of filling a packet.
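
For reference, with the TCP/IP Services $QIO interface a socket option is set via IO$_SETMODE with an item list passed in P5. The sketch below only illustrates that pattern; the item-list layout mirrors an item_list_2 and the TCPIP$C_* constant names come from TCPIP$INETDEF, so verify them against the Sockets API and System Services programming manual for your version.

/* Rough sketch only: enable TCP_NODELAY on an existing TCP/IP Services
   channel via IO$_SETMODE. Verify the TCPIP$C_* constants for your
   TCP/IP Services version. */
#include <starlet.h>
#include <iodef.h>
#include <tcpip$inetdef.h>

typedef struct { unsigned short status, count; unsigned int info; } IOSB;
struct itlst { unsigned short length, type; void *address; };

unsigned int set_nodelay(unsigned short tcpChannel)
{
    int one = 1;
    IOSB iosb;
    struct itlst nodelay = { sizeof one, TCPIP$C_TCP_NODELAY, &one };
    struct itlst tcpopts = { sizeof nodelay, TCPIP$C_TCPOPT, &nodelay };

    return sys$qiow(0, tcpChannel, IO$_SETMODE, (void *)&iosb, 0, 0,
                    0, 0, 0, 0, &tcpopts, 0);    /* P5 carries the options */
}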

Are you using any masks (P4) in the QIO that you do for the read, or applying any modifiers to the IO$_READVBLK operation? There are some modifiers that will block completion until the buffer is full, which could delay the VMS program in processing the data.
Robert Gezelter
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Matthijs,

I would also suspect a coding error, if the applications in both cases are locally developed.

As for a TCP bug, I would confirm that this delay behavior does not occur with PING and some other utilities known to behave correctly (e.g., telnet, the SMTP server), possibly even non-HP utilities such as C-Kermit (http://www.columbia.edu/kermit).

It is quite possible that you are seeing a problem caused by incorrect management of event flags in a multi-streamed environment. I recommend that my clients use the AST mechanism when building such applications.

I have to run out the door in a minute, but there are several AST related presentations available for download from my www site.

- Bob Gezelter, http://www.rlgsc.com
Wim Van den Wyngaert
Honored Contributor

Re: Delays in incoming IP traffic using QIO

If not a coding error, it could be a "crazy" setup. Could you post tcpip$etc:sysconfigtab.dat ?

Wim
Matthijs Boelstra
Occasional Advisor

Re: Delays in incoming IP traffic using QIO


Thanks for the answers so far.
Below are some calls to the QIO API that I use.

/* start asynchronous read action if connection is present */
if (tcpChannelInfo->channelState == CONNECTED)
{
    rcode = sys$qio(channel->readEventFlag,             /* Event flag        */
                    tcpChannelInfo->commSocketChannel,  /* Channel number    */
                    IO$_READVBLK,                       /* I/O function      */
                    channel->readIosb,                  /* I/O status block  */
                    0, 0,                               /* AST routine/param */
                    (char *)channel->readBuffer,        /* P1: buffer        */
                    TCP_MAX_BUFFER,                     /* P2: buffer length */
                    0, 0, 0, 0);                        /* P3-P6             */
}

This is used to start an asynchronous read action on a TCP channel.

status = sys$wflor (event[0], maskWait);

This is the call I use to detect events. The maskWait contains 4 bits (0, 1, 2, 3), where:
0 is for TIMER events,
1 is for BMQ events,
2 is for TCP events and
3 is for UDP events.
Note that the other events (BMQ and TIMER) are also used a lot and they never show delays.
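
For clarity, the mask is built from those flag numbers roughly like this (a simplified sketch, not the exact framework code):

/* Simplified sketch of how the wait mask is built; all four flags
   are in event flag cluster 0. */
#include <starlet.h>

#define EFN_TIMER 0
#define EFN_BMQ   1
#define EFN_TCP   2
#define EFN_UDP   3

void wait_for_any_event(void)
{
    unsigned int maskWait = (1u << EFN_TIMER) | (1u << EFN_BMQ)
                          | (1u << EFN_TCP)   | (1u << EFN_UDP);

    /* Wait until any of the four flags in cluster 0 is set. */
    sys$wflor(EFN_TIMER, maskWait);
}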


I have attached the file sysconfigtab.dat, but I don't see any strange settings, although I don't have much VMS experience.

Hope this helps!

Robert Gezelter
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Matthijs,

Before we go further, please confirm that you are running a SINGLE TCP connection with the code that you posted.

Using a single event flag for multiple streams WILL cause all manner of unpredictable behaviors.

- Bob Gezelter, http://www.rlgsc.com
Matthijs Boelstra
Occasional Advisor

Re: Delays in incoming IP traffic using QIO


I will explain a little bit more about the application.

The complete application consists of 12 processes (executables) communicating with each other using BEA Message Queue (BMQ).
All the processes are linked against a static library, "framework", which implements functionality that is the same for each process. So each process uses BMQ to communicate with one or more other processes, a few processes also use TCP, and one process uses UDP (multicast).

The problems I described are seen with both UDP and TCP. For UDP (used by one process) the problem is solved, because now I'm using a timer (every 500 ms) and call a Berkeley socket function to see if there is data available. If there is, I process it, etc.
This works fine, no delays anymore.
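
The polling workaround looks roughly like this (a sketch using standard Berkeley calls; "sock" is assumed to be a UDP socket descriptor, not a QIO channel):

/* Sketch of the 500 ms polling workaround. */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

void poll_udp(int sock)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };          /* zero timeout: just poll      */
    char buf[1500];

    FD_ZERO(&rfds);
    FD_SET(sock, &rfds);
    if (select(sock + 1, &rfds, NULL, NULL, &tv) > 0)
    {
        int n = recv(sock, buf, sizeof buf, 0);
        if (n > 0)
        {
            /* ... process the datagram ... */
        }
    }
}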

But now, after more testing, the same problem comes up in the TCP implementation (also one process; at least I have seen problems in one process). This process sets up a listening socket (using QIO) and accepts up to 4 connections (clients, from a different server).
So when all systems are up and running, this process must handle four clients, which are all sending livecheck_messages every 5 seconds, and these messages have to be acknowledged with a livecheck_response.

Now, to answer your question: I use one event flag for all four connections and the listening socket,
like this:
#define EFN_TCP 2
which is later used to build the waitMask for the wflor call and as the event flag in the asynchronous read call.

I get the impression that this is not the way to do it... please advise on expected behavior and better solutions.

Regards,
Matthijs

John Gillings
Honored Contributor
Solution

Re: Delays in incoming IP traffic using QIO

Matthijs,

I'd suggest you build yourself a test harness: just a simple, single-threaded sender and receiver. Instead of async, use a synchronous $QIOW in the receiver in a loop, with timing between messages received. If you can reproduce the delays in that environment, you've got something that needs to be reported.
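
Something along these lines would do (a rough sketch; channel setup is omitted and "chan" is assumed to be an already-connected channel):

/* Rough sketch of a harness receiver: a synchronous $QIOW loop that
   timestamps each message. */
#include <stdio.h>
#include <time.h>
#include <starlet.h>
#include <iodef.h>

typedef struct { unsigned short status, count; unsigned int info; } IOSB;

void receive_loop(unsigned short chan)
{
    char   buf[1024];
    IOSB   iosb;
    time_t last = time(NULL);

    for (;;)
    {
        unsigned int status = sys$qiow(0, chan, IO$_READVBLK,
                                       (void *)&iosb, 0, 0,
                                       buf, sizeof buf, 0, 0, 0, 0);
        time_t now = time(NULL);
        if ((status & 1) && (iosb.status & 1))
            printf("%u bytes, %ld s since previous message\n",
                   iosb.count, (long)(now - last));
        last = now;
    }
}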

I'd be looking very closely at the logic in your program after the WFLOR. For example, what if a TCP or UDP event occurs while you're off processing a TIMER or BMQ event? Could there be anything else clearing your event flag behind your back? Remember that issuing an operation against an event flag (like a $QIO) will implicitly clear the flag. Similarly, if you're calling $CLREF manually anywhere, could there be timing windows where you're dismissing an event completion before processing it? (In general, you should NOT need $CLREF; let the request clear the flag for you.)

Typically I'd ignore the event flags themselves for determining event completion. Whenever your WFLOR fires, I'd check the IOSBs (or equivalent) for ALL pending events to see if they've completed. Like this:

LOOP
$WFLOR(all possible events)
FOR each possible event DO
IF event completed THEN
process event
reset event
ENDIF
ENDFOR
ENDLOOP
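
In C, a minimal sketch of that pattern might look like this (eventIosb, process_event and restart_event are hypothetical names, not your application's):

/* Minimal sketch of the loop above; all flags are assumed to be in the
   same event flag cluster. */
#include <starlet.h>

#define NUM_EVENTS 4

typedef struct { unsigned short status, count; unsigned int info; } IOSB;

extern IOSB eventIosb[NUM_EVENTS];      /* one IOSB per outstanding $QIO    */
extern void process_event(int i);       /* handle a completed request       */
extern void restart_event(int i);       /* zero the IOSB and re-issue $QIO  */

void event_loop(unsigned int anyEfn, unsigned int eventMask)
{
    for (;;)
    {
        sys$wflor(anyEfn, eventMask);   /* wait until *something* fires     */

        /* Check ALL pending requests, not just the flag that woke us up. */
        for (int i = 0; i < NUM_EVENTS; i++)
        {
            if (eventIosb[i].status != 0)        /* nonzero => completed    */
            {
                process_event(i);
                restart_event(i);                /* reset + re-arm          */
            }
        }
    }
}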

A crucible of informative mistakes
Robert Gezelter
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Matthijs,

Using a single Event Flag for all of the TCP connections will simply not work. When a QIO(W) is done on a channel, the specified event flag will be cleared. If you have multiple channels all referencing the same event flag, it is not surprising that strange things happen. In short, events are being lost.

IMO, the best way to do this is with Asynchronous System Traps. When each IO operation completes, a call to the AST routine is made. Your code sample already appears to have a data structure associated with each channel. The ASTPRM parameter to the QIO call is the address of this structure. This is used by the AST routine to find its structures. (The IOSB already appears to be in this structure, so you are partway there). Using this approach, there is no need for a WFLOR (at least for the TCP connections), and no need for polling using select.
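
A minimal sketch of that AST approach (ChannelInfo, start_read and read_ast are hypothetical names, not your framework's):

/* Each channel carries its own IOSB and buffer; the ASTPRM delivers the
   structure address back to the AST routine. */
#include <starlet.h>
#include <iodef.h>

typedef struct { unsigned short status, count; unsigned int info; } IOSB;

typedef struct
{
    unsigned short chan;             /* channel from SYS$ASSIGN            */
    IOSB           iosb;             /* completion status for this channel */
    char           buf[1024];
} ChannelInfo;

void read_ast(void *astprm);         /* forward declaration                */

/* Start (or restart) an asynchronous read on one channel. */
void start_read(ChannelInfo *ci)
{
    sys$qio(0, ci->chan, IO$_READVBLK,
            (void *)&ci->iosb,
            (void (*)())read_ast,    /* AST routine                        */
            ci,                      /* ASTPRM: address of channel struct  */
            ci->buf, sizeof ci->buf, 0, 0, 0, 0);
}

/* Runs at AST level when the read completes; the ASTPRM arrives as the
   argument, so the routine finds its own channel structure directly. */
void read_ast(void *astprm)
{
    ChannelInfo *ci = (ChannelInfo *)astprm;

    if (ci->iosb.status & 1)         /* low bit set => success             */
    {
        /* ... process ci->iosb.count bytes from ci->buf ... */
    }
    start_read(ci);                  /* re-arm the read on this channel    */
}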

Alternatively, you could use different event flags for each channel and continue to use the WFLOR construct. The problem is that event flags are a relatively scarce commodity, and it is easy to run out. For obtaining event flag numbers, I highly recommend the LIB$GET_EF library call. It is easy to accidentally have overlapping use of an event flag, and such collisions are very difficult to track down.

As I noted earlier today, I have given quite a few presentations on these techniques at past technical seminars. My session "Introduction to OpenVMS AST Programming" can be found at http://www.rlgsc.com/cets/2000/435.html

- Bob Gezelter, http://www.rlgsc.com
Matthijs Boelstra
Occasional Advisor

Re: Delays in incoming IP traffic using QIO


Thanks again for all the info, it is very helpful to me.

Because I don't have much time to totally redesign the framework (the customer is already using the system and I need to fix other functional problems), I want to keep the WFLOR intact.
So if I understand everything correctly, I need to use a different event flag for each socket/channel.
A call to LIB$GET_EF will give me the first free event flag available, which I can then assign to my channel structure when I set up a new channel.

Then, when I fall through the WFLOR call (an event occurred), I can iterate through my channels (which are stored in a linked list) and find the channel with the matching event flag.
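
In code I expect that to look something like this (a sketch; ChannelInfo stands in for my real channel structure):

/* Sketch: allocate a per-channel event flag when a new channel is set up;
   readEventFlag is the field used later in the $QIO and WFLOR mask. */
#include <lib$routines.h>

typedef struct { unsigned int readEventFlag; /* ... other fields ... */ } ChannelInfo;

unsigned int assign_channel_ef(ChannelInfo *channel)
{
    unsigned int efn;
    unsigned int status = lib$get_ef(&efn);   /* free flag, normally from
                                                 cluster 1 (flags 32-63)   */
    if (status & 1)
        channel->readEventFlag = efn;         /* keep it with the channel  */
    return status;                            /* caller checks the low bit */
}

/* When a channel is closed, hand the flag back: lib$free_ef(&efn); */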

Please give your feedback.

Regards,
Matthijs
Ian Miller.
Honored Contributor

Re: Delays in incoming IP traffic using QIO

When you get the event flag number, check that it is in the same cluster as the others you are using, since WFLOR requires this.
____________________
Purely Personal Opinion
John Gillings
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Matthijs,

The important thing to remember about event flags is that there are very few of them, and they can potentially be used by many other things. In particular, regarding your choice of 0, 1, 2, 3: 0 is the "default", and the low numbers 1-23 are reserved (see the HP OpenVMS Programming Concepts Manual, Table 6-4).

That means you MUST have a secondary, positive means for identifying when one of your specific events has completed. Consider the setting of a particular event flag as telling you that *something* has happened. You then need to figure out what, and remember that it might be several things, or it might be something unrelated to your code which happens to use that event flag, so it's possible you won't find anything at all.

I disagree with Robert Gezelter that "Using a single Event Flag for all of the TCP connections will simply not work". (On the other hand, I absolutely DO agree with Robert that ASTs are a far superior mechanism for dealing with this type of program.)

Sharing flags for multiple events is fine, as long as you use IOSBs, or similar, to positively identify the events that have completed. You also need to show consideration for other threads, in that you should leave an event flag SET so you don't leave another thread waiting for an event that's already occurred.

Indeed, you could use the same event flag for ALL your events and just use $WAITFR, rather than $WFLOR. If you allocate that event flag with LIB$GET_EF, you reduce the chances that any other software will mess with your flag. As you overload the event flag, you increase the chances that you'll see spurious sets. Too much and your algorithm becomes polling. You also need to worry about timing windows if you find the need to clear the event flag in order to wait again. For example, if you rely on the $QIO operations to clear the flag, and you use the loop I proposed in my previous response, you'll find that if any other thread sets the flag, the loop becomes a poll. If that happens, change the structure to:

LOOP
$WFLOR(all possible events)
$CLREF(any that are getting set elsewhere)
FOR each possible event DO
IF event completed THEN
process event
reset event
ENDIF
ENDFOR
ENDLOOP

Since you clear the flag BEFORE checking what's completed, you don't need to worry about missing something being set while you're checking.

Ian's comment: "check it is in the same cluster". I'm fairly sure that GET_EF starts at the top of cluster 1 and works downwards, so you probably won't get allocated a flag in cluster 0 until you use up cluster 1. So, either pluck numbers out of the air for ALL your flags (not recommended), OR use GET_EF for all of them (and check that they're in the same cluster).
A crucible of informative mistakes
Robert Gezelter
Honored Contributor

Re: Delays in incoming IP traffic using QIO

John,

WADU, I must disagree. The results of using a single event flag for multiple channels simultaneously are unpredictable.

From the "HP OpenVMS System Services Reference Manual: GETUTC-Z", Order #BA554-9006, July 2006 (available in both PDF and HTML from the OpenVMS WWW site at http://www.hp.com/go/openvms):

"When $QIO begins execution, IT CLEARS THE SPECIFIED EVENT FLAG [emphasis mine], or event flag 0 if the efn parameter is not specified"

This means that if two IO operations complete in "just the right sequence", the second will not be recognized until the NEXT IO operation using that event flag completes. This behavior is quite consistent with the reported behavior.

It is safe to use the same event flag, IFF (IF AND ONLY IF) the multiple channels are not active at the same time.

Since event flags are not needed when using ASTs, this problem does not exist when ASTs are used correctly.

- Bob Gezelter, http://www.rlgsc.com
Robert Gezelter
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Colleagues,

An erratum in my previous posting.

WADU should have been WADR (With All Due Respect).

- Bob Gezelter, http://www.rlgsc.com
John Gillings
Honored Contributor

Re: Delays in incoming IP traffic using QIO

Re: Robert,

As you've said, ASTs are the way to go.

But it IS possible to use a single event flag for multiple events, as long as you're very careful about the potential timing windows you've pointed out.

Here's one mechanism:

LOOP
$WAITFR(EFN)
$CLREF(EFN)
FOR each possible event DO
IF event completed THEN
process event
reset event (implicitly clears EFN)
$SETEF(EFN)
ENDIF
ENDFOR
ENDLOOP
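
In C that mechanism would look roughly like this (eventIosb, process_event and restart_event are again hypothetical helpers, not the original application):

/* C sketch of the single-flag mechanism above. */
#include <starlet.h>

#define NUM_EVENTS 4
typedef struct { unsigned short status, count; unsigned int info; } IOSB;

extern IOSB eventIosb[NUM_EVENTS];      /* one IOSB per outstanding $QIO    */
extern void process_event(int i);
extern void restart_event(int i);       /* zeroes the IOSB, re-issues the
                                           $QIO (which implicitly clears
                                           the flag)                        */

void single_flag_loop(unsigned int efn)
{
    for (;;)
    {
        sys$waitfr(efn);
        sys$clref(efn);                 /* clear BEFORE scanning            */
        for (int i = 0; i < NUM_EVENTS; i++)
        {
            if (eventIosb[i].status != 0)
            {
                process_event(i);
                restart_event(i);
                sys$setef(efn);         /* re-set so other waiters and any
                                           still-pending completions are
                                           not lost                         */
            }
        }
    }
}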

Although it looks like there's a timing window between the $WAITFR and $CLREF, it's followed by a scan of all events, so anything completing in between will be seen.

This also protects against overloading (both for this thread and any others).

As I said, you end up doing more polling.
A crucible of informative mistakes
Robert Gezelter
Honored Contributor

Re: Delays in incoming IP traffic using QIO

John,

Agreed, it is possible to overload event flags and then iterate explicitly through the data structures. However, it quickly increases code complexity and easily creates what appears to be the pathology here: a "lost event".

More often, I have seen this type of problem with libraries that use explicit event flags (i.e., that do not use LIB$GET_EF). Accidental collisions are the underlying cause of strange problems. IMHO, using ASTs quickly becomes far safer.

- Bob Gezelter, http://www.rlgsc.com
Matthijs Boelstra
Occasional Advisor

Re: Delays in incoming IP traffic using QIO

I planned to write a message earlier, but due to holidays, other activities, and implementing and testing the new code, my reply is a bit late. The good news is that I'm very happy with the results.
Now I use a different event flag for every socket, and after some days of logging I see no more delays, so I have a strong feeling that the problem is solved.
For this I want to thank everyone who replied, especially Robert Gezelter and John Gillings. Thanks again!
Hoff
Honored Contributor

Re: Delays in incoming IP traffic using QIO

If we're having the discussion we are having, I'd start out with unique event flags. From very direct experience, event flags do not scale particularly well, and they can be really nasty once you get to multi-cluster event flags, when you find yourself needing to implement waiting across event flag clusters.

Event flags are a particularly "interesting" construct, lulling innocent programmers into a construction which is understandable and easy and simple, and luring those programmers onto a shoal of difficulty as the application scales upward.

For small-scale and simple constructs, do use them. For moderate scale or re-entrant stuff, stay away. Use ASTs or threads.

The only event flag that I typically use now is the EFN$C_ENF "don't care" event flag.

Do always specify the IOSB or LKSB or similar argument; do not assume that argument is optional. It is key to some low-level I/O synchronization processing, not the least of which is avoiding spurious event flag operations. (This is where I expect John G. is going with his comments around mixing event flags. But you can ONLY mix event flags when you have the IOSB and, implicitly or explicitly, $SYNCH with the event flag AND the IOSB.)
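
As a sketch of that style (a synchronous write with the "don't care" flag and an explicit IOSB; chan, buf and len are placeholders):

/* Sketch: a synchronous write using EFN$C_ENF and an explicit IOSB, so no
   shared event flag is consumed and completion is checked via the IOSB. */
#include <starlet.h>
#include <iodef.h>
#include <efndef.h>                       /* EFN$C_ENF */

typedef struct { unsigned short status, count; unsigned int info; } IOSB;

unsigned int write_block(unsigned short chan, void *buf, unsigned short len)
{
    IOSB iosb;
    unsigned int status = sys$qiow(EFN$C_ENF,     /* no real flag touched */
                                   chan, IO$_WRITEVBLK,
                                   (void *)&iosb, 0, 0,
                                   buf, len, 0, 0, 0, 0);
    if (status & 1)                               /* request was queued OK */
        status = iosb.status;                     /* then check the I/O    */
    return status;
}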

But again, going re-entrant with ASTs or with threads is easier when working with real-time or reactive or event-driven applications.

Stephen Hoffman
HoffmanLabs
Matthijs Boelstra
Occasional Advisor

Re: Delays in incoming IP traffic using QIO

I forgot to close the thread; I should have put the content of my last message here.
See the last message of the author!