FIN_WAIT_2

 
SOLVED
Matt Mumford
Occasional Advisor

FIN_WAIT_2

Hey all,

I see that this area has been talked about, but I would like to get a newer response and one that is geared to my company's environment. Here it goes. We are running an N4000 with HPUX 11 and a legacy system called Universe (aka PICK). We are using the Universe ODBC (UNIObjects) clients on a couple of Windows servers running .Net applications. When I randomly run 'netstat -na | grep 31438' I see TCP connections from one specific Windows server where the status remains at FIN_WAIT_2. I have read about using a script to clean up these hung FIN_WAIT_2 connections and also about changing the timeout from 0 to 60 minutes. What is the best approach? Is it something in the .Net application that is not closing correctly? Should I set the timeout to 60 minutes? Or use the script to hard kill the hung connections? One last thing: we have another server that runs .Net applications making a similar connection, and it never leaves these behind.

Matt
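
To put numbers on what 'netstat -na | grep 31438' shows, here is a minimal sketch that counts FIN_WAIT_2 entries per remote host. It assumes the six-column netstat layout with addresses written as ip.port; the sample lines and the local address 172.28.8.10 are invented for illustration and are not output from the system in this thread.

```python
from collections import Counter


def count_fin_wait_2(netstat_text, port):
    """Count FIN_WAIT_2 connections on local `port`, grouped by remote IP."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, remote, state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "FIN_WAIT_2":
            continue
        # Address and port are joined by the last dot, e.g. 10.0.0.1.31438
        local_port = local.rsplit(".", 1)[-1]
        remote_ip = remote.rsplit(".", 1)[0]
        if local_port == str(port):
            counts[remote_ip] += 1
    return counts


# Invented sample data in the assumed layout.
SAMPLE = """\
tcp        0      0  172.28.8.10.31438      172.28.8.232.3021      FIN_WAIT_2
tcp        0      0  172.28.8.10.31438      172.28.8.232.3044      FIN_WAIT_2
tcp        0      0  172.28.8.10.31438      172.28.8.240.1500      ESTABLISHED
tcp        0      0  *.31438                *.*                    LISTEN
"""

if __name__ == "__main__":
    for ip, n in count_fin_wait_2(SAMPLE, 31438).most_common():
        print(ip, n)
```

Running this against saved netstat output at intervals would show whether the count for one client keeps growing or stays flat.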
10 REPLIES
harry d brown jr
Honored Contributor
Solution

Re: FIN_WAIT_2


see this thread: http://forums1.itrc.hp.com/service/forums/parseCurl.do?CURL=%2Fcm%2FQuestionAnswer%2F1%2C%2C0xe0d97680e012d71190050090279cd0f9%2C00.html&admit=716493758+1107893897004+28353475

btw, I programmed in pick (Universe) for 10 years.

and get "lsof", as netstat SUCKS: http://hpux.cs.utah.edu/hppd/hpux/Sysadmin/lsof-4.74/

live free or die
harry d brown jr
Live Free or Die
Gary L. Paveza, Jr.
Trusted Contributor

Re: FIN_WAIT_2

We have had a similar problem occasionally. I wrote a script which allows us to change the timeout to anything between 0 and 20 seconds (we do a temporary change). The idea is that it's executed with, say, a 20 second timeout, then we watch netstat for the FIN_WAIT_2s to go away, then reset it back to 0. We were advised that we really shouldn't change it permanently.
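
The script itself was not posted, so the following is only a rough sketch of the "set it, wait, set it back" idea. It assumes the tunable is tcp_fin_wait_2_timeout, set through ndd on /dev/tcp with a value in milliseconds; the post does not name the parameter or its units, so check both on your system first. The helper names are made up for this example.

```python
import subprocess
from contextlib import contextmanager

NDD_DEVICE = "/dev/tcp"
# Assumption: this is the tunable meant in the post; it is not named there.
PARAM = "tcp_fin_wait_2_timeout"


def ndd_set_command(value_ms):
    """Build the ndd command line that sets the timeout (assumed milliseconds)."""
    return ["ndd", "-set", NDD_DEVICE, PARAM, str(value_ms)]


@contextmanager
def temporary_fin_wait_2_timeout(seconds, run=subprocess.check_call, restore_ms=0):
    """Set the timeout for the duration of the block, then restore it.

    `run` is whatever executes the command list; it can be replaced for
    a dry run so nothing is changed on the system.
    """
    if not 0 <= seconds <= 20:
        raise ValueError("timeout must be between 0 and 20 seconds")
    run(ndd_set_command(seconds * 1000))
    try:
        yield
    finally:
        # Always put the original setting back, even if the block fails.
        run(ndd_set_command(restore_ms))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    with temporary_fin_wait_2_timeout(20, run=print):
        pass  # this is where one would watch netstat until the entries clear
```

Using a context manager means the setting goes back to 0 even if the watching step is interrupted, which matches the advice not to leave the change in place.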
David Child_1
Honored Contributor

Re: FIN_WAIT_2

Matt,

The suggestions already given should work well to keep those FIN_WAIT_2s cleaned up. I would recommend trying to identify why the one server is leaving those hanging and the other works fine.

Are both servers running the same application(s)? Are the same users on both servers or are they a different set of users on each server? Perhaps the users on one server are not using it correctly. What about the application revision? Is it the same on both servers?

Just some ideas,
David
Matt Mumford
Occasional Advisor

Re: FIN_WAIT_2

Hey all,

I installed LSOF and when I ran:

lsof -i tcp:31438

I got:

root@tzg # lsof -i tcp:31438
Memory fault(coredump)

Any thoughts? Help.

Matt
rick jones
Honored Contributor

Re: FIN_WAIT_2

I do not think that anything has changed with respect to this issue over the years. Likely as not, those Windows clients are doing abortive closes and the RSTs are lost, or they are ignoring the FIN when the server closes.

I would suggest you first make sure that the FIN_WAIT_2s are hanging around for longer than tcp_keepalive_detached_interval + tcp_ip_abort_interval before you start altering other timer settings. If the server application is calling close(), the connection becomes "detached" (i.e. it has no associated socket) and after tcp_keepalive_detached_interval, keepalive probes will be sent. Those will likely generate RSTs from the Windows system if the Windows system did an abortive close. That will clear the FIN_WAIT_2.

If there is no response, it will keep sending probes for tcp_ip_abort_interval time units.

If the Windows system has not called close, the probes will elicit normal ACKs, which indicates you have buggy Windows clients that need to be fixed.
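
The check described above can be written down as a small sketch. The two interval values in the usage part are placeholders, not known defaults for any system; read the real ones from the box before relying on the arithmetic. Function names are made up for this example.

```python
def max_detached_lifetime_ms(keepalive_detached_interval_ms, ip_abort_interval_ms):
    """Longest a detached FIN_WAIT_2 should last if the peer never answers."""
    return keepalive_detached_interval_ms + ip_abort_interval_ms


def interpret_probe_response(response):
    """Map the peer's reaction to a keepalive probe onto the cases above."""
    outcomes = {
        "RST": "peer did an abortive close; the FIN_WAIT_2 is cleared",
        "NONE": "no answer; probing continues for tcp_ip_abort_interval",
        "ACK": "peer has not called close; the client needs fixing",
    }
    return outcomes[response]


def diagnose(age_ms, keepalive_detached_interval_ms, ip_abort_interval_ms):
    """Decide whether a FIN_WAIT_2 of the given age is worth worrying about."""
    limit = max_detached_lifetime_ms(
        keepalive_detached_interval_ms, ip_abort_interval_ms
    )
    if age_ms <= limit:
        return "wait: the timers have not run out yet"
    return "investigate: the peer is probably still answering probes"


if __name__ == "__main__":
    # Placeholder values in milliseconds, for illustration only.
    detached_ms, abort_ms = 120_000, 600_000
    print(max_detached_lifetime_ms(detached_ms, abort_ms))
    print(diagnose(5 * 60 * 1000, detached_ms, abort_ms))
    print(diagnose(24 * 60 * 60 * 1000, detached_ms, abort_ms))
    print(interpret_probe_response("ACK"))
```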
there is no rest for the wicked yet the virtuous have no pillows
Matt Mumford
Occasional Advisor

Re: FIN_WAIT_2

Hey all,

I just had to do a fresh compile of LSOF and it appears to be working. I have attached two views of the problem port. The first is a 'netstat -na | grep 31438', the second is 'lsof -i tcp:31438'.

The ones I am interested in are coming from IP 172.28.8.232 in the 'netstat' view, however I am not seeing that information in the 'lsof' output. What am I missing? Help


rick jones
Honored Contributor

Re: FIN_WAIT_2

Matt - my guess is that where netstat shows all TCP endpoints, lsof may only be showing those with an associated socket. If that is the case, it implies that those in netstat but not in lsof output are "detached" - ie the application has called close() on the socket.

As such, the tcp_keepalive_detached_interval stuff should kick-in.

Unless we are talking about hundreds and thousands of FIN_WAIT_2 connections, it really should not be a big deal for the transport. It has good hashes for connection lookup.
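
If that guess is right, the detached connections are simply the ones netstat lists and lsof does not. A minimal sketch of that comparison follows. It assumes netstat writes addresses as ip.port in six columns and that lsof was run with numeric output (-n -P) so names look like ip:port->ip:port; all sample lines are invented, not taken from the attachments.

```python
import re

# Matches local:port->remote:port in an lsof NAME column (numeric output assumed).
LSOF_PAIR = re.compile(r"(\S+):(\d+)->(\S+):(\d+)")


def netstat_endpoints(text, state="FIN_WAIT_2"):
    """Remote (ip, port) pairs that netstat reports in the given state."""
    found = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0].startswith("tcp") and fields[5] == state:
            ip, port = fields[4].rsplit(".", 1)
            found.add((ip, int(port)))
    return found


def lsof_endpoints(text):
    """Remote (ip, port) pairs that still have a socket according to lsof."""
    found = set()
    for line in text.splitlines():
        match = LSOF_PAIR.search(line)
        if match:
            found.add((match.group(3), int(match.group(4))))
    return found


def detached(netstat_text, lsof_text):
    """FIN_WAIT_2 endpoints seen by netstat but with no socket in lsof."""
    return netstat_endpoints(netstat_text) - lsof_endpoints(lsof_text)


# Invented sample data in the assumed formats.
NETSTAT_SAMPLE = """\
tcp        0      0  172.28.8.10.31438      172.28.8.232.3021      FIN_WAIT_2
tcp        0      0  172.28.8.10.31438      172.28.8.232.3044      FIN_WAIT_2
tcp        0      0  172.28.8.10.31438      172.28.8.240.1500      ESTABLISHED
"""
LSOF_SAMPLE = """\
server  1234 root  5u  IPv4 0x1  0t0  TCP 172.28.8.10:31438->172.28.8.240:1500 (ESTABLISHED)
"""

if __name__ == "__main__":
    for ip, port in sorted(detached(NETSTAT_SAMPLE, LSOF_SAMPLE)):
        print(ip, port)
```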
there is no rest for the wicked yet the virtuous have no pillows
Matt Mumford
Occasional Advisor

Re: FIN_WAIT_2

Hey all,


The problem is that those FIN_WAIT_2 have been out there since my last reboot last weekend.

Matt

rick jones
Honored Contributor

Re: FIN_WAIT_2

If they have been there since last weekend, it suggests that the remote endpoints are still alive and responding to the keepalive pings.

(Might want to check that tcp_keepalives_kill is still set to 1)

Again, unless there are thousands of them, it really isn't a big deal - particularly if your server application code is written correctly, setting SO_REUSEADDR before trying to bind() on a restart of the application.
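
For reference, this is the ordering meant by "setting SO_REUSEADDR before trying to bind()". It is a generic Python sketch of the socket calls, not code from Universe or from the application in this thread; the function name and the loopback address are just for the example.

```python
import socket


def make_listener(host, port, backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind(); setting it afterwards does
    # not help a restart that finds old connections on the port.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    return srv


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port for this demonstration.
    listener = make_listener("127.0.0.1", 0)
    print("listening on", listener.getsockname())
    listener.close()
```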
there is no rest for the wicked yet the virtuous have no pillows
Ron Kinner
Honored Contributor

Re: FIN_WAIT_2

Go look on the Windows box and see if it has a bunch of connections stuck in LAST_ACK. NT has a bug in tcpip.sys that used to do this all the time. It creates a bunch of FIN_WAIT_2s on the HPUX side at the same time. Compare the version of tcpip.sys on the bad box with that of the good box.

http://support.microsoft.com/default.aspx?scid=kb;en-us;254930

I have the fix for NT if you need it.

Ron