Operating System - OpenVMS

dschwarz
Frequent Advisor

Socket communication problem

I am playing around with TCP socket communication using $QIO and ASTs.
My program looks like this:

- initialize local socket
- connect to remote system
- set up $QIO(READVBLK) triggering ReadAst on completion
- build message to send

- send the message using this code snippet
    rtc = sys$qio ( EFN$C_ENF,
                    WriteAstParam.channel,
                    IO$_WRITEVBLK,
                    WriteAstParam.iosb,
                    (void *)&TcpWrittenAst,
                    0,
                    WriteAstParam.buf,
                    WriteAstParam.buflen,
                    0, 0, 0, 0 );

on completion of this IO, TcpWrittenAst is triggered,
where I check the IOSB (it is global), both status and count.

This works fine as long as the physical connection (cable) exists,
but after removing the cable to the remote peer, and trying to
send the next message, TcpWrittenAst is triggered immediately
and iosb.status is 1 and iosb.count=number of bytes to be sent.

tcptrace shows that there is no ACK from the remote computer,
so the remote system never receives the message, although
my program "believes" that the message has been sent successfully.

The TCP/IP stack keeps retransmitting the message until the keepalive
mechanism fires ReadAst, where I get an appropriate iosb.

That's far too late!

Is there a chance to be informed earlier about the existence
of a connection problem?

This behaviour has been verified on

VAX/VMS v7.3 with TCP/IP v5.3 ECO 4
AXP/VMS v7.3-2 with TCP/IP v5.4 ECO 7
AXP/VMS v8.4 with TCP/IP v5.7 ECO 4
I64/VMS v8.4 with TCP/IP v5.7 ECO 5

 

Ruslan R. Laishev
Super Advisor

Re: Socket communication problem

Hi!

 

Did you set KEEPALIVE for the socket?

 

Hein van den Heuvel
Honored Contributor

Re: Socket communication problem

Are you also checking the SYS$QIO immediate return status in 'rtc'?

Testing the IOSB in the AST function is the critical part, and the AST will not fire if the SYS$QIO call itself failed, but maybe the call returns an informational status or something similar?
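To make the two checks concrete, here is a minimal sketch of testing both the immediate return status and the IOSB. It is written as portable C so it can be compiled anywhere; the type and function names (io_status_block, cond_success, cond_severity, write_fully_ok) are hypothetical, and on OpenVMS you would normally use the system-supplied status macros instead. It relies only on the usual condition-value convention: the low bit is set for success, and the low three bits carry the severity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the I/O status block filled in by the write. */
typedef struct {
    uint16_t status;    /* completion status */
    uint16_t count;     /* bytes transferred */
    uint32_t dev_info;  /* device-dependent information */
} io_status_block;

/* Severity codes carried in the low three bits of a condition value. */
enum { SEV_WARNING = 0, SEV_SUCCESS = 1, SEV_ERROR = 2, SEV_INFO = 3, SEV_SEVERE = 4 };

/* Nonzero when the low bit is set (success or informational). */
int cond_success(uint32_t cond)
{
    return (int)(cond & 1u);
}

/* Extract the severity field from a condition value. */
int cond_severity(uint32_t cond)
{
    return (int)(cond & 7u);
}

/* Returns 1 only if the call was queued successfully, the I/O completed
   successfully, and the whole buffer was accepted. */
int write_fully_ok(uint32_t rtc, const io_status_block *iosb, uint16_t expected)
{
    assert(iosb != NULL);
    if (!cond_success(rtc))
        return 0;               /* request was never queued; the AST will not fire */
    if (!cond_success(iosb->status))
        return 0;               /* I/O completed with an error */
    return iosb->count == expected;
}
```

Note that even when all three checks pass, this only says the local stack accepted the data, not that the peer received it.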

Is it possible to attach a larger code snippet with some variable and function definitions? Ideally, post enough for someone to change the target and run a test, if they have the time and opportunity.

 

Hein

Hoff
Honored Contributor
Solution

Re: Socket communication problem

Welcome to the usual sorts of fun with TCP. What you're experiencing is the expected and unfortunately common misbehavior of the half-open morass. Your message really was sent, as far as the sender can determine. Whether the message will ever arrive, if and when the disconnect is detected or resolved, is another matter.

For this case, set TCPIP$C_KEEPALIVE on the connection and, given you're using ASTs and $qio, also have a look at the semi-related TCPIP$C_MSG_NBIO while you're looking at the code. Also consider moving to TLS, given expectations of security and integrity.
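As a rough illustration of the keepalive part, here is a sketch that uses the BSD-style socket calls rather than the $QIO item-list interface. That is an assumption on my part, since the original program uses $QIO, and the helper names enable_keepalive and keepalive_enabled are made up for this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Turn on keepalive probing for a TCP socket.
   Returns 0 on success, -1 on failure with errno set by setsockopt(). */
int enable_keepalive(int fd)
{
    int on = 1;
    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Returns 1 if keepalive is on, 0 if it is off, -1 if the query failed. */
int keepalive_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof value;
    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

Enabling the option only switches the probes on. How long the stack waits before it declares the connection dead depends on stack-specific timer settings that are not shown here, and the defaults are often long.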

Expect to have to get a response from the other end to confirm reception, too, which tends to lead toward the use of Paxos, or a 2PC or 3PC scheme.

Some of the other common pitfalls with TCP and IP networking:

  • TCP doesn't do datagrams, it does streams.   UDP and DTLS do datagrams, though not with guaranteed delivery.
  • Do NOT assume one message sent means one message arriving.  Receivers can be handed anything from one byte to the whole "datagram", and it wouldn't surprise me to receive several "datagrams" together in one big wad, depending on the protocol design.
  • Do NOT assume that sending the message means it will be received.
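One common way to cope with the stream behavior described in the list above is to put a length prefix in front of every message and reassemble on the receiving side. The following is a minimal sketch, assuming a 2-byte big-endian length header and a fixed 4096-byte buffer; both choices, and all names (frame_reader and friends), are illustrative and not taken from the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BUF_MAX 4096  /* illustrative reassembly buffer size */
#define FRAME_HDR_LEN 2     /* 16-bit big-endian payload length */

typedef struct {
    uint8_t buf[FRAME_BUF_MAX];
    size_t  used;           /* bytes currently buffered */
} frame_reader;

void frame_reader_init(frame_reader *r)
{
    assert(r != NULL);
    r->used = 0;
}

/* Append bytes exactly as a read completion delivered them.
   Returns 0 on success, -1 if they do not fit in the buffer. */
int frame_reader_feed(frame_reader *r, const void *data, size_t len)
{
    assert(r != NULL);
    if (len == 0)
        return 0;
    assert(data != NULL);
    if (len > FRAME_BUF_MAX - r->used)
        return -1;
    memcpy(r->buf + r->used, data, len);
    r->used += len;
    return 0;
}

/* Copy the next complete message into out.
   Returns the payload length, -1 if no complete message is buffered yet,
   or -2 if out is too small for the next message. */
long frame_reader_next(frame_reader *r, void *out, size_t out_cap)
{
    size_t payload, total;

    assert(r != NULL && out != NULL);
    if (r->used < FRAME_HDR_LEN)
        return -1;
    payload = ((size_t)r->buf[0] << 8) | r->buf[1];
    total = FRAME_HDR_LEN + payload;
    if (r->used < total)
        return -1;          /* message is still incomplete */
    if (payload > out_cap)
        return -2;
    memcpy(out, r->buf + FRAME_HDR_LEN, payload);
    memmove(r->buf, r->buf + total, r->used - total);  /* keep the remainder */
    r->used -= total;
    return (long)payload;
}
```

The read AST would call frame_reader_feed with whatever arrived and then call frame_reader_next in a loop until it returns -1. A payload larger than the buffer can never complete in this sketch, so a real protocol would also enforce a maximum message size.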

So as mentioned earlier, if you want to know when the connection drops, then you're going to be using keepalive.

dschwarz
Frequent Advisor

Re: Socket communication problem

Keepalive is enabled.
The immediate return status of sys$qio is checked and is always ok - as expected.

This program has worked fine for more than 10 years, as long as the network connection is not broken.
That does not happen very often at our site; in fact it never did in the last 10 years.
A colleague of mine found this behaviour during some tests of another piece of software he is working on
and was concerned about it.

To be sure that the "messages" really reach the recipient, I have to introduce some confirmation layer of my own.
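Such a confirmation layer could be sketched roughly as follows: every outgoing message gets a sequence number, the peer echoes that number back, and anything left unconfirmed for too long is treated as a connection problem. All names here (ack_tracker and its functions), the table size, and the use of seconds from the caller's clock are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 32      /* illustrative limit on unconfirmed messages */

typedef struct {
    uint32_t seq;           /* sequence number embedded in the message */
    long     sent_at;       /* send time in seconds, from the caller's clock */
    int      in_use;
} pending_msg;

typedef struct {
    pending_msg slot[MAX_PENDING];
    uint32_t    next_seq;
} ack_tracker;

void ack_tracker_init(ack_tracker *t)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++)
        t->slot[i].in_use = 0;
    t->next_seq = 1;        /* 0 is reserved to mean "table full" */
}

/* Record a message handed to the stack.
   Returns the sequence number to embed in it, or 0 if the table is full. */
uint32_t ack_tracker_sent(ack_tracker *t, long now)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        if (!t->slot[i].in_use) {
            if (t->next_seq == 0)
                t->next_seq = 1;    /* skip the reserved value on wraparound */
            t->slot[i].seq = t->next_seq++;
            t->slot[i].sent_at = now;
            t->slot[i].in_use = 1;
            return t->slot[i].seq;
        }
    }
    return 0;
}

/* The peer confirmed seq. Returns 1 if it was pending, 0 otherwise. */
int ack_tracker_confirm(ack_tracker *t, uint32_t seq)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        if (t->slot[i].in_use && t->slot[i].seq == seq) {
            t->slot[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}

/* Count messages that have been unconfirmed for longer than timeout seconds. */
size_t ack_tracker_overdue(const ack_tracker *t, long now, long timeout)
{
    size_t i, n = 0;
    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++)
        if (t->slot[i].in_use && now - t->slot[i].sent_at > timeout)
            n++;
    return n;
}
```

The write AST would call ack_tracker_sent, the read AST would call ack_tracker_confirm when a confirmation arrives, and a periodic timer would call ack_tracker_overdue to decide whether to raise an alarm or reconnect.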

Thank you very much