Operating System - HP-UX

tcp_keepalive_interval and Serviceguard

SILO Storagetek
Frequent Advisor


Hi everybody,
I have a problem with an application on an 11.0 system: it leaves a lot of connections in the CLOSE_WAIT state.
Can I lower tcp_keepalive_interval (currently 7200000) to 30 minutes to kill the CLOSE_WAIT sessions?
If I can, could that cause problems for Serviceguard (11.09)?

thanks
prod.
Laurent Menase
Honored Contributor

Re: tcp_keepalive_interval and Serviceguard

Hi,

tcp_keepalive_interval will have an effect on those connections only if the keepalive socket option has been set (which is not the default) and the peer has reset its connection.
It will have no effect if the peer is in the FIN_WAIT_2 state.

If the peer is still in FIN_WAIT_2, you should ask why the application does not close those connections.

rick jones
Honored Contributor

Re: tcp_keepalive_interval and Serviceguard

CLOSE_WAIT means the application(s) you are running are buggy. The remote systems have sent your system FIN segments, and your system has sent ACKs of those FINs. The stack has notified the application by making the socket readable and returning a value of zero when/if the application calls recv/read/recvmsg on the socket.

That the connections are still in CLOSE_WAIT implies the application(s) have not called close() or shutdown() - either because they are not looking to see the socket become readable, or because they did, and did not call read/recv/recvmsg, or they did call read/recv/recvmsg and misinterpreted the return value of zero.
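
To make the point about the zero return value concrete, here is a minimal sketch in Python (not the language of the application in question, which is unknown). The helper name drain_until_eof is made up for illustration, and a local socketpair stands in for a real TCP connection; on a TCP socket the zero-length read is exactly the moment the connection is sitting in CLOSE_WAIT.

```python
import socket

def drain_until_eof(sock):
    """Read until the peer's close shows up as a zero-length read, then close."""
    received = b""
    while True:
        chunk = sock.recv(4096)
        if chunk == b"":
            # Zero bytes means the peer has finished sending (FIN on TCP).
            # Calling close() here is what moves a TCP socket out of CLOSE_WAIT.
            sock.close()
            return received
        received += chunk

# Demonstration: the "remote" end sends some data and closes first.
local, remote = socket.socketpair()
remote.sendall(b"last request")
remote.close()
data = drain_until_eof(local)
```

An application that ignores the empty read, or treats it as "no data yet, try again later", never reaches the close() call and leaves the socket in CLOSE_WAIT indefinitely.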

Now, since this side has not sent a FIN, the other end of the connection is (likely) in FIN_WAIT_2. All keepalive probes will do in this instance is verify that the remote is indeed still there, waiting patiently for the buggy app on this system to finally call close(). A keepalive probe will only cull a CLOSE_WAIT connection if the remote has finally gotten disgusted and aborted the connection (sending an RST) and that RST was not seen by the local system. The keepalive probes will then elicit more RSTs, which will nuke it.

And yes, as pointed out, tcp_keepalive_interval only comes into play for those applications that have called setsockopt() to enable SO_KEEPALIVE. There is no way with ndd variables to enable keepalives by default.
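
For reference, a rough sketch of what opting in looks like from the application side, again in Python for brevity. The helper name enable_keepalive is illustrative only. The arithmetic at the end assumes tcp_keepalive_interval is expressed in milliseconds, which matches the 7200000 (two hour) value quoted in the question.

```python
import socket

def enable_keepalive(sock):
    """Turn on SO_KEEPALIVE for one socket and report whether it took effect."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# A freshly created socket has keepalive off unless the application asks for it.
default_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
now_on = enable_keepalive(sock)
sock.close()

# Assuming the tunable is in milliseconds:
CURRENT_INTERVAL_HOURS = 7200000 / (60 * 60 * 1000)   # the value from the question
THIRTY_MINUTES_MS = 30 * 60 * 1000                     # the value being proposed
```

Sockets that never make this call are not probed at all, whatever the interval is set to.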

Bottom line: get the application(s) fixed. Anything else is just a massive kludge.
there is no rest for the wicked yet the virtuous have no pillows