Operating System - HP-UX


 
Lucy_6
New Member

close() socket is stuck in FIN_WAIT_2 state after all settings

Hi to all,

We have the following problem:

A server that still has open connections tries to close() them before shutting down, but if the clients are still alive, the server waits forever.

We installed all the patches needed for setting the /dev/tcp parameters, and we set:
tcp_fin_wait_2_timeout to 15000,
tcp_keepalive_detached_interval to 10000,
tcp_ip_abort_interval to 30000.

The server is still stuck in the close() function, trying to close the first connection.

What else do I have to do?

Does tcp_keepalive_interval also influence the close() operation on sockets?

Thanks in advance :-).


James R. Ferguson
Acclaimed Contributor

Re: close() socket is stuck in FIN_WAIT_2 state after all settings

Hi Lucy:

I believe you should set 'tcp_keepalive_interval' too.

From Knowledge Base documents #S1100002433A & S1100002433B:

"The 'tcp_keepalive_interval' parameter determines the amount of time that TCP waits for an idle connection with no unacknowledged
data before sending keepalive packets. The default is 2 hours (7200000)."

Regards!

...JRF...
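
A side note on the keepalive suggestion: on most TCP stacks the keepalive interval only matters for sockets that have SO_KEEPALIVE switched on, and I am assuming HP-UX behaves the same way for 'tcp_keepalive_interval'. A minimal sketch of enabling it per socket follows; the function name enable_keepalive is made up for illustration and is not from the server discussed here.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on keepalive probing for one socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;

    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

This would typically be called once on each socket returned by accept(). It does not by itself explain a close() that never returns.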

Lucy_6
New Member

Re: close() socket is stuck in FIN_WAIT_2 state after all settings

Thank you very much, James.

I think I now have a different problem: my server hangs in the close() call (in _close_sys() from /lib/libc.2, which is what I see in the debugger) while the socket is still in the ESTABLISHED state (I saw that by running netstat -an).

It is very strange, isn't it?

Does somebody know what is going on there?

And again, thanks for the help.
Chris De Angelis
Frequent Advisor

Re: close() socket is stuck in FIN_WAIT_2 state after all settings

Lucy, I am having the same problem as you are. I inherited a program that makes a call to close() to close a socket and then that thread is hung in _close_sys() and never returns.

I thought this might be a problem resolvable with newer patches than I have now, so I found patch PHNE_25423, which has some fixes for issues with close(), but that plus its dependencies did not make my problem go away. I can't say I'm surprised, however, as the conditions described in that patch release do not apply in my situation, as far as I can tell.

Some more details of the situation I'm having trouble with: the main thread of the program blocks on accept(). When that call returns, a new thread is spun off to deal with the incoming request and the loop of the main thread goes back to calling accept. Now if I want to kill the process by issuing the "kill" command (which sends the process the SIGTERM signal), the signal handler registered for SIGTERM is run and gets as far as calling close(), and that's as far as it goes. When I attached with the debugger, I saw that this was being done in a different thread, which was just sitting in _close_sys().

We are trying to reproduce this scenario in a smaller test program to see whether we get the same thing. In the meantime, I am open to other suggestions, especially since I am not a socket programming guru!
Lucy_6
New Member

Re: close() socket is stuck in FIN_WAIT_2 state after all settings

Hi :-),

Actually, I solved the hang in close() by simply killing the thread that is responsible for send/recv on that socket before calling close() on the socket.

That means the socket and its thread ID need to be available to whatever "shutdown" thread does the cleanup.

I can't explain why close() succeeds after the thread is killed, but I think it may be because the thread releases some resources of the socket(?) or sends some acknowledgement signal (to its own socket? or to the client?).

I'd like to know the exact reason, so any guess or explanation is welcome.

I hope this helps you, too.

Lucy.
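
A gentler variant of the workaround above, offered only as a guess: instead of killing the worker thread, call shutdown() on the socket first. On the Unix systems I know, that makes a recv() blocked in another thread return, so the thread can exit on its own; whether HP-UX does the same in this situation is an assumption that would need testing. The name request_socket_stop is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: stop both directions on fd without releasing the
 * descriptor. A thread blocked in recv() on fd is then expected to see
 * recv() return (0 or -1) and can leave its loop.
 * Returns 0 on success, -1 on error (errno is set by shutdown). */
int request_socket_stop(int fd)
{
    assert(fd >= 0);
    return shutdown(fd, SHUT_RDWR);
}
```

The "shutdown" thread would call request_socket_stop(fd), then pthread_join() the worker, and only then close(fd), so that no thread is still inside a system call on the descriptor when it is closed.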

Chris De Angelis
Frequent Advisor

Re: close() socket is stuck in FIN_WAIT_2 state after all settings

We have been able to reproduce the hanging close() behavior in a multi-threaded test program where the main thread loops on accept() to take incoming connections on the socket. The odd thing is that if accept() has _never_ returned, a close() from the other thread works; but once the first client has connected via accept() and the main thread calls accept() again, that accept() can never be woken up by a close().

Killing the blocking thread explicitly gets you (and us as well) out of the immediate problem, but is a very nasty solution.

We don't see what we're doing wrong here, so as soon as we get a chance we're going to open a call to HP about this, as this behavior does not occur on other Unix systems. I'll post here again after we've talked to HP.

Regards,
Chris
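
One way to sidestep the accept()/close() interaction described above is to never sit in accept() itself, but to wait in poll() on both the listening socket and a "wake" descriptor (for example the read end of a pipe). This is only a sketch under that assumption, not what the program in question does; accept_or_stop and the -2 return code are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: block until a client is pending on listen_fd or
 * wake_fd becomes readable (the shutdown path wrote a byte to it).
 * Returns the accepted socket (>= 0), -2 when asked to stop, or -1 on
 * error (including EINTR, which callers would normally retry). */
int accept_or_stop(int listen_fd, int wake_fd)
{
    struct pollfd fds[2];

    assert(listen_fd >= 0 && wake_fd >= 0);

    fds[0].fd = listen_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = wake_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    if (poll(fds, 2, -1) < 0)
        return -1;
    if (fds[1].revents != 0)        /* a stop request wins over new clients */
        return -2;
    if (fds[0].revents & POLLIN)    /* a connection is waiting */
        return accept(listen_fd, NULL, NULL);
    return -1;                      /* POLLERR/POLLHUP on the listener */
}
```

With this shape, a SIGTERM handler only has to write() one byte to the other end of the pipe (write() is async-signal-safe); the main thread then sees -2, leaves its loop, and closes the listening socket itself, so close() is never racing a blocked accept().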
Stefan Farrelly
Honored Contributor

Re: close() socket is stuck in FIN_WAIT_2 state after all settings


You've got 2 options to clear your stuck socket connections:

1. Use the ndd command to forcibly kill them.
2. Find the IP of the remote end of the connection and get that machine rebooted (usually someone's PC, so no problem).

Here is how to use ndd to kill them:

To use ndd -set /dev/tcp tcp_discon, you need the pointer to the TCP instance data. You can retrieve this via the ndd command tcp_status.

So, the scenario to find the TCP instance data and then use tcp_discon to remove the instance is as follows:

# ndd -get /dev/tcp tcp_status
TCP dest snxt suna swnd cwnd rnxt rack rwnd rto mss [lport,fport] state

0183b8b4 015.043.233.086 533cb8ce 533cb8ce 00008000 00003000 533bc583 533bc583 00000000 02812 04096 [c00a,cea9] TCP_CLOSE_WAIT

So, if you wanted to remove this connection:
# ndd -set /dev/tcp tcp_discon 0x0183b8b4

If you want to use tcp_discon_by_addr, you pass a 24-character string that contains the hex representation of the quadruple.

For example, if the connection that I want to delete is:

Local IP: 192.1.2.3 (0xc0010203)
Local Port: 1024 (0x0400)
Remote IP : 192.4.5.6 (0xc0040506)
Remote Port: 2049 (0x0801)

The "hex" string you pass to tcp_discon_by_addr is:

# ndd -set /dev/tcp tcp_discon_by_addr "c00102030400c00405060801"

NOTE: the leading 0x that typically indicates a hex number is NOT part of the string passed.
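
Building that 24-character string by hand is error-prone, so here is a small sketch that formats it from dotted addresses and decimal ports, following the layout in the example above (local IP, local port, remote IP, remote port). The function name format_discon_arg is made up; it only produces the string and does not call ndd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: write the hex quadruple for tcp_discon_by_addr
 * into out (needs room for 24 digits plus the terminating NUL).
 * Returns 0 on success, -1 on a bad address, bad port, or short buffer. */
int format_discon_arg(const char *local_ip, unsigned local_port,
                      const char *remote_ip, unsigned remote_port,
                      char *out, size_t out_len)
{
    struct in_addr l, r;

    assert(out != NULL);
    if (out_len < 25 || local_port > 0xffff || remote_port > 0xffff)
        return -1;
    if (inet_pton(AF_INET, local_ip, &l) != 1 ||
        inet_pton(AF_INET, remote_ip, &r) != 1)
        return -1;

    /* ntohl() gives the address in host order, so %08lx prints it
     * most significant byte first, e.g. 192.1.2.3 as c0010203. */
    snprintf(out, out_len, "%08lx%04x%08lx%04x",
             (unsigned long)ntohl(l.s_addr), local_port,
             (unsigned long)ntohl(r.s_addr), remote_port);
    return strlen(out) == 24 ? 0 : -1;
}
```

Called with "192.1.2.3", 1024, "192.4.5.6", 2049, it yields c00102030400c00405060801, the same string as in the ndd command above.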
I'm from Palmerston North, New Zealand, but somehow ended up in London...