Operating System - HP-UX

RST on Socket Listener caused

Laroussi_1
Occasional Advisor


We are facing a new problem with one of our most heavily used processes. This process acts as a server: it owns a listener on which all incoming connections are received. On each accept() call, a new channel (our internal communication structure) is created and a new socket is opened.

We have now started to see some new errors in our traces: ENOBUFS, ECONNRESET (after calling recv() with the MSG_PEEK flag) and POLLERR (after calling poll()).

We understand that accept() can fail with ENOBUFS if it is called for a connection that has already been closed, and that our process can carry on without any problem in that case. For the ECONNRESET error on the listener, however, we tried simply ignoring it, but on the next iteration poll() on the same channel set POLLERR in the revents bit mask, indicating "An error has occurred on the file descriptor".

Our first concern is to ensure the availability of our process and guarantee its consistency. What we would like to understand:

1 - How could a reset (RST) be sent on a listening FD? Who could have sent this packet to a listening socket? Is there a scenario that would explain this situation? And do you think that setting the TCP parameter tcp_fin_wait_2_timeout to 0 (infinite) could prevent this RST packet from being sent?
2 - Do you have any suggestions on how we should deal with the situation we described? We are considering re-initializing the listener as a final solution; do you see any possible drawbacks?
Laroussi_1
Occasional Advisor

Re: RST on Socket Listener caused

Attached is more information on the bundles installed on the server.

Thanks and best regards
Lassaad
Laurent Menase
Honored Contributor

Re: RST on Socket Listener caused

I think the best thing to do in your case is to contact HP support with the output of a "tusc -E -v" of the application.

Do you mean that you do a recv() from a listening socket?
Laroussi_1
Occasional Advisor

Re: RST on Socket Listener caused

Hi Laurent

Yes, we needed to call recv() with the MSG_PEEK flag on all open socket file descriptors, including the listener. Do you think that is the origin of the problem?

Regards
Lassaad
Laurent Menase
Honored Contributor

Re: RST on Socket Listener caused

Most probably there is no use in calling recv() on a listening socket. All it does is confuse the socket layer with an inconsistent operation, so that a message ends up being interpreted in an inappropriate context.
But this can only be confirmed by HP support.

rick jones
Honored Contributor

Re: RST on Socket Listener caused

Why do you need to call recv() with MSG_PEEK in the first place? And indeed, I'm pretty sure that it is undefined or at best poorly defined on a listen endpoint. Frankly, the HP-UX stack should not get bent out of shape if you do it, but at the same time, there is the question of why do it in the first place.

To your point 1, the ReSet isn't sent on a listening FD, it was sent by the remote client, presumably for some connection still queued to the listen endpoint. tcp_fin_wait_2_timeout would have nothing to do with that.

If you reinitialize the listener - which I take to mean you close the listen endpoint and make another set of socket/bind/listen calls, the downside is that connections queued to the listen endpoint when you close() it will be dropped, and there will be a window between the close() call and the subsequent listen() call where connection attempts by clients will fail. If the clients can deal with that, it may not be a very large drawback.
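
The close-then-socket/bind/listen sequence described above could look roughly like the following. This is only a minimal sketch in Python rather than the application's own code; the function name `reopen_listener`, the backlog value, and the use of SO_REUSEADDR are assumptions made for illustration.

```python
import socket


def reopen_listener(old_listener, address, backlog=128):
    """Close old_listener and return a fresh listening socket on address.

    Hypothetical helper: connections still queued on old_listener are
    dropped by close(), and clients connecting between close() and
    listen() will fail, as described in the post above.
    """
    old_listener.close()
    new_listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Assumption: SO_REUSEADDR is wanted so that bind() is not refused
        # because of earlier connections on the same port in TIME_WAIT.
        new_listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        new_listener.bind(address)
        new_listener.listen(backlog)
    except OSError:
        new_listener.close()
        raise
    return new_listener
```

Any poll() registration that referred to the old file descriptor would have to be replaced with the new one, since the descriptor number may change.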
there is no rest for the wicked yet the virtuous have no pillows
Laurent Menase
Honored Contributor

Re: RST on Socket Listener caused

What I am saying is that in any case this confuses the socket layer; whether it is a bug or a strange feature can only be determined through HP support.


This is what we call a grey zone. Nothing defines what should happen when you do a recv() on an accepting socket. It only messes up that socket, not all sockets.
Whether this is out of spec or should be fixed can only be determined and discussed with HP support.
11.31 shouldn't have this behavior.
Laroussi_1
Occasional Advisor

Re: RST on Socket Listener caused

Hi,

Thanks for your answers. The problem described also happened at one of our customers running 11.31.

What we are going to do is avoid the recv() on the listening socket and call accept() whatever is in the buffer. The question is: if we call accept() and there is a RST in the buffer, can we keep calling poll() on this listening socket without getting POLLERR?

Regards
Lassaad
Laurent Menase
Honored Contributor

Re: RST on Socket Listener caused

Yes, the T_DISCON_IND is handled correctly by accept() in that context, and it is not an error for poll().



rick jones
Honored Contributor

Re: RST on Socket Listener caused

I cannot recall if POLLERR is "edge triggered" or "level triggered" to know if there is more than one pending connection that was reset by the remote if a single recv() would clear it.
there is no rest for the wicked yet the virtuous have no pillows
Laurent Menase
Honored Contributor

Re: RST on Socket Listener caused

"I cannot recall if POLLERR is "edge triggered" or "level triggered" to know if there is more than one pending connection that was reset by the remote if a single recv() would clear it."
In fact it is both, and moreover it is final.

Normally you don't get a POLLERR on an accepting socket.
- On 11.31 it can be due to an unpatched system, since there is an xport patch which fixes some strange errors in accept().
Laroussi_1
Occasional Advisor

Re: RST on Socket Listener caused

Hello

Thank you very much for your comments, but I did not catch the meaning of "edge triggered" or "level triggered".

Our application has an infinite loop in which we call poll(), then call recv() with MSG_PEEK on every socket file descriptor returned by poll(), and, if there is no problem and the socket is the listener, call accept().
So when recv() returned ECONNRESET we simply ignored it without doing the accept(), and the next poll() therefore returned the listener's file descriptor (the one on which we got the RST) with POLLERR set in the revents bit mask.
Based on your recommendation we updated the application so that it no longer calls recv() with MSG_PEEK on the listener's file descriptor and calls accept() directly instead.
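
The updated loop, as we understand it, can be sketched as follows. This is an illustrative Python sketch and not our actual code; the names `accept_pending`, `serve_once` and `TRANSIENT_ACCEPT_ERRORS` are hypothetical, and the exact set of errno values worth treating as transient is an assumption that should be checked against the accept() manual page of the platform.

```python
import errno
import select
import socket

# Assumption: these errno values mean "this one pending connection went
# away", not "the listener itself is broken".
TRANSIENT_ACCEPT_ERRORS = {
    errno.ECONNABORTED,
    errno.ECONNRESET,
    errno.ENOBUFS,
    errno.EAGAIN,
    errno.EWOULDBLOCK,
    errno.EINTR,
}


def accept_pending(listener):
    """Call accept() directly, without any recv(MSG_PEEK) on the listener.

    Returns the connected socket, or None if the pending connection
    disappeared before accept() could hand it over.
    """
    try:
        conn, _peer = listener.accept()
        return conn
    except OSError as exc:
        if exc.errno in TRANSIENT_ACCEPT_ERRORS:
            return None
        raise


def serve_once(listener, poller, timeout_ms=1000):
    """Run one poll() iteration and return the newly accepted sockets."""
    accepted = []
    for fd, revents in poller.poll(timeout_ms):
        if fd == listener.fileno() and revents & select.POLLIN:
            conn = accept_pending(listener)
            if conn is not None:
                accepted.append(conn)
    return accepted
```

In this sketch the listener is expected to be non-blocking, so that an accept() on a connection that has already vanished returns immediately instead of blocking the loop.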

The other problem is that we are unable to reproduce the issue. We played with TCP parameters, added breakpoints before the accept() in the server, and so on, but never got the RST on the listening socket. The issue happens only on customer machines that are overloaded. Any clue on how to reproduce this RST and test the modification would be very much appreciated.

Best regards
Lassaad
rick jones
Honored Contributor

Re: RST on Socket Listener caused

You could potentially simulate the overload condition by sending a SIGSTOP to the application for a brief while as connection attempts to it continue. That should allow the queue to fill, and then, depending on the client behaviour, you will get the closes with either a FIN or an RST.
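
A rough sketch of such a test driver, in Python. Everything here is an assumption for illustration: the helper names `abort_connection` and `flood_while_stopped` are made up, and forcing the RST with SO_LINGER set to a zero timeout is just one client behaviour that produces a reset rather than a FIN on close.

```python
import os
import signal
import socket
import struct
import time


def abort_connection(address, timeout=2.0):
    """Connect, then close abortively so the client stack sends an RST."""
    client = socket.create_connection(address, timeout=timeout)
    # struct linger {l_onoff=1, l_linger=0}: close() resets the connection
    # instead of going through the normal FIN handshake.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()


def flood_while_stopped(pid, address, attempts=50, pause=1.0):
    """Stop the server process, queue aborted connections, then resume it.

    Returns how many connections were established and then reset.
    """
    os.kill(pid, signal.SIGSTOP)
    try:
        sent = 0
        for _ in range(attempts):
            try:
                abort_connection(address)
                sent += 1
            except OSError:
                pass  # listen queue full: connect timed out or was refused
        time.sleep(pause)
    finally:
        os.kill(pid, signal.SIGCONT)
    return sent
```

What the stopped server sees after SIGCONT (an error from accept(), or a socket whose first recv() fails) depends on the server's stack, so the result should be checked on the target system.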

"Edge triggered" vs "level triggered" might best be addressed via a web search :)
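
In short, a level-triggered event is reported on every poll() for as long as the condition holds, while an edge-triggered event is reported only once, when the condition changes. A small Python sketch of the level-triggered behaviour of POLLIN follows; the function name `poll_twice_then_drain` is made up for this example, and it says nothing about how POLLERR behaves on an HP-UX listen endpoint.

```python
import select
import socket


def poll_twice_then_drain(timeout_ms=100):
    """Report POLLIN readiness on two polls before reading and one after."""
    reader, writer = socket.socketpair()
    try:
        poller = select.poll()
        poller.register(reader.fileno(), select.POLLIN)
        writer.send(b"x")
        first = bool(poller.poll(timeout_ms))
        # Nothing was read, so the condition still holds and is reported again.
        second = bool(poller.poll(timeout_ms))
        reader.recv(1)  # consume the byte, clearing the condition
        third = bool(poller.poll(timeout_ms))
        return first, second, third
    finally:
        reader.close()
        writer.close()


print(poll_twice_then_drain())  # (True, True, False)
```

With an edge-triggered interface the second poll would have come back empty, because nothing changed between the two calls.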
there is no rest for the wicked yet the virtuous have no pillows
Laroussi_1
Occasional Advisor

Re: RST on Socket Listener caused

Hi Rick

We finally managed to reproduce the issue with the scenario you suggested (thank you!), but only in 5% of cases and not systematically.

We also installed Wireshark to check the types of packets exchanged. As you said, when the queue is full there are always RST packets, but when we unfreeze our application it does not handle this packet in all cases (only in 5% of cases). It seems as if the RST packet has a very short lifetime and we have to unfreeze our application before that time expires. I may be wrong in my observations, but can you please help me make these tests systematic?

Thanks and best regards
Lassaad
rick jones
Honored Contributor
Solution

Re: RST on Socket Listener caused

The RSTs were coming from the clients, yes? Indeed an RST is just a "one shot" sort of segment; it is never retransmitted. It signifies that the side sending the RST has detected something it thinks was so heinous that it could not in good conscience continue to exist as a TCP connection :)

As for how to make it more systematic, that really depends on more things than I can think of at this time. You might find some way to shorten the length of time before the client abandons all hope. How to do that will be client-stack-specific.
there is no rest for the wicked yet the virtuous have no pillows