
FIN_WAIT_2 states.

 
Marc Dijkstra
Trusted Contributor

FIN_WAIT_2 states.

Good day.
This pertains to Networking/General too...
We are running a banking package called BankMaster, which opens a listening port; clients connect through it and are handed a new port number. If a client workstation reboots or loses its TCP/IP connection, the port goes into the FIN_WAIT_2 state.
Under HP-UX 11i the FIN_WAIT_2 state has no timeout, so the port stays in that state indefinitely.
Sometimes this causes the listening port to "die", and then no more clients can connect to BankMaster. The only options are to shut down the cluster and reboot the system, or to reconfigure BankMaster to use a different listening port.

There is a fix for the FIN_WAIT_2 timeout problem: we can use the ndd command to set a timeout of, say, 10 minutes for FIN_WAIT_2. My only concern is that when I list the TCP/IP states with netstat -an, the connections for the cluster network, i.e. the 2 heartbeat links, are also in the FIN_WAIT_2 state. If I introduce this timeout, will it affect the MC/ServiceGuard clustering?
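Before changing any timer it can help to see which local ports actually hold FIN_WAIT_2 sockets, so the heartbeat ports can be told apart from the BankMaster ones. A minimal sketch, assuming the HP-UX `netstat -an` layout (local address in column 4, state in column 6, port appended after the last dot); `fin_wait_2_by_port` is a hypothetical helper name, not something from HP:

```shell
#!/bin/sh
# Summarise FIN_WAIT_2 sockets per local port from `netstat -an` output on stdin.
# Assumed columns: proto recv-q send-q local-address foreign-address state.
fin_wait_2_by_port() {
    awk '$6 == "FIN_WAIT_2" {
        n = split($4, part, ".")   # last dot-separated field is the local port
        count[part[n]]++
    }
    END { for (p in count) print p, count[p] }' | sort -n
}

# Usage on a live system (prints "port count" pairs):
#   netstat -an -f inet | fin_wait_2_by_port
```

If the heartbeat ports never show up in that list, the timeout question for the cluster links may not arise at all.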

MND
"A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila"
7 REPLIES
Rainer von Bongartz
Honored Contributor

Re: FIN_WAIT_2 states.

Are you sure that your heartbeat links are in the correct state?

On my boxes the state for the hacl-hb service is ESTABLISHED (and as far as I know that is what it should be).

The state for hacl-cfg might be different, but this should not matter.


Regards
Rainer
He's a real UNIX Man, sitting in his UNIX LAN making all his UNIX plans for nobody ...
Eugen Cocalea
Respected Contributor

Re: FIN_WAIT_2 states.

Hi again,

By the way, search the forums for FIN_WAIT_2, there are several interesting threads about the same problem.

E.
To Live Is To Learn
Stefan Farrelly
Honored Contributor
Solution

Re: FIN_WAIT_2 states.

We too have a dodgy application which sometimes leaves a TCP port open on the HP.

You have some options:

1. Tell the application providers to modify their application so it copes with binding to a port that's still in use (left over). A simple code change lets it reuse the TCP port instead of dying with "Port already in use"! (Not much chance of you getting them to do this, I suspect.)

2. A script from HP is below which kills hung TCP sockets. We've been using it a bit; works fine. Run it once and it produces a list of hung ports (silently); run it a second time and it kills them (and displays those killed).

3. Modify your TCP settings to allow hung sockets to disconnect faster. We've set tcp_fin_wait_2_timeout to 15 seconds (!) and we've had no problems. We also run it on an SG cluster and again, no problems. Here are our changes in /etc/rc.config.d/nddconf:

TRANSPORT_NAME[0]=tcp
NDD_NAME[0]=tcp_ip_abort_interval
NDD_VALUE[0]=30000

TRANSPORT_NAME[1]=tcp
NDD_NAME[1]=tcp_ip_abort_cinterval
NDD_VALUE[1]=15000

TRANSPORT_NAME[2]=tcp
NDD_NAME[2]=tcp_fin_wait_2_timeout
NDD_VALUE[2]=15000

4. Install the latest ARPA patch (this helps with FIN_WAIT_2 timeouts). We're using PHNE_22397 (for 11.0; you will want the equivalent for 11i).
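For option 3, the nddconf entries follow a fixed three-line pattern per tunable, with values in milliseconds (15000 for 15 seconds). A small sketch that prints one such stanza, here for the 10-minute timeout the original question proposed; `emit_ndd_stanza` and `minutes_to_ms` are hypothetical helper names used only for illustration:

```shell
#!/bin/sh
# Print one nddconf stanza in the TRANSPORT_NAME/NDD_NAME/NDD_VALUE layout.
# $1 = array index, $2 = tunable name, $3 = value in milliseconds
emit_ndd_stanza() {
    printf 'TRANSPORT_NAME[%s]=tcp\nNDD_NAME[%s]=%s\nNDD_VALUE[%s]=%s\n' \
        "$1" "$1" "$2" "$1" "$3"
}

# Convert whole minutes to milliseconds.
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

# A 10-minute FIN_WAIT_2 timeout as entry number 2:
emit_ndd_stanza 2 tcp_fin_wait_2_timeout "$(minutes_to_ms 10)"

# The same value could be tried at runtime before making it permanent:
#   ndd -set /dev/tcp tcp_fin_wait_2_timeout 600000
#   ndd -get /dev/tcp tcp_fin_wait_2_timeout
```

The array indices in nddconf should stay consecutive, so a new entry takes the next free number.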

Here's the stuck TCP socket kill script from HP's WTEC:

#!/bin/ksh
# Hewlett-Packard Corporation
# This script is UNSUPPORTED. Use at own risk.
# @(#)$Revision: 1.3 $ $Author: scotty $ $Date: 98/08/25 17:55:01 $
#
# This script will query the system for any TCP connections that
# are in the FIN_WAIT_2 state and forcibly disconnect them. It
# uses netstat(1) to find the FIN_WAIT_2 connections and calls
# ndd with the correct hexadecimal representation of the connection
# to close the connection.
#

#
# Temporary files used to compare netstat output
#
MYSCRIPTNAME=${0##*/}
TMPFILE1=/var/tmp/$MYSCRIPTNAME.1
TMPFILE2=/var/tmp/$MYSCRIPTNAME.2

#
# Create a log file to keep track of connection that were removed
#
LOGFILE=/var/adm/$MYSCRIPTNAME.log


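function getFinWait2 {
    # Encode each FIN_WAIT_2 connection (local addr/port, foreign addr/port)
    # as 24 hex digits, one per line. With no matches, printf emits all zeros.
    # The output is sorted because comm(1) expects sorted input.
    /usr/bin/printf "%.2x%.2x%.2x%.2x%.4x%.2x%.2x%.2x%.2x%.4x\n" $(/usr/bin/netstat -an -f inet | /usr/bin/grep FIN_WAIT_2 | /usr/bin/awk '{print $4,$5}' | /usr/bin/sed 's/\./ /g') | /usr/bin/sort > $TMPFILE1
}

function compareFinWait2 {
    FIRST_TIME=1

    # Keep the previous snapshot, then take a fresh one.
    cp $TMPFILE1 $TMPFILE2
    getFinWait2

    # Only connections present in both snapshots are treated as hung.
    comm -12 $TMPFILE1 $TMPFILE2 | while read CONN
    do
        if [[ $CONN != "000000000000000000000000" ]]
        then
            if [ $FIRST_TIME -eq 1 ]
            then
                print >> $LOGFILE
                date >> $LOGFILE
                FIRST_TIME=0
            fi

            print "/usr/bin/ndd -set /dev/tcp tcp_discon_by_addr \"$CONN\"" >> $LOGFILE
            /usr/bin/ndd -set /dev/tcp tcp_discon_by_addr $CONN
        fi
    done

    getFinWait2
}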

#
# Main
#

touch $TMPFILE1
touch $TMPFILE2

compareFinWait2
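
The hex string the script hands to tcp_discon_by_addr is easier to follow on a single connection. A minimal sketch of the same encoding, using made-up addresses; `conn_to_hex` is a hypothetical helper name and the function only prints the key, it does not call ndd:

```shell
#!/bin/sh
# Build the 24-hex-digit connection key from netstat's dotted address.port form.
# $1 = local addr.port, $2 = foreign addr.port
conn_to_hex() {
    # Splitting on dots yields 4 octets plus a port per side: 10 numbers in all.
    # The unquoted substitution is deliberate so each number becomes one argument.
    printf '%.2x%.2x%.2x%.2x%.4x%.2x%.2x%.2x%.2x%.4x\n' \
        $(echo "$1 $2" | sed 's/\./ /g')
}

conn_to_hex 10.0.0.1.1234 10.0.0.2.5678
# prints 0a00000104d20a000002162e
```

Each octet takes two hex digits and each port four, which is why the script compares against a 24-character string of zeros to detect "no connections found".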
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Marc Dijkstra
Trusted Contributor

Re: FIN_WAIT_2 states.

Thanks Stefan -- I shall give this a try (oh, and thanks for all the input from the gurus).

I have installed the ARPA patch (in fact the BankMaster release that we installed came with an addendum saying you should install the ARPA patch -- for 9.04!!!) but shall try the rest...

Cheers
MND
"A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila"
rick jones
Honored Contributor

Re: FIN_WAIT_2 states.

as a personal statement, i would strongly suggest that people NOT use the script to kill connections. frankly, it is something that should never have escaped the lab.

there is indeed a timer that is enabled automagically when an app calls close() - this is the timer controlled via tcp_keepalive_detached_interval. when the local app calls close() (thus "detaching" the TCP endpoint), TCP will start sending keepalive probes after tcp_keepalive_detached_interval, and will continue until it either gets a RST in response, or it has tried sending the probes for tcp_ip_abort_interval time units.
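
As a rough worked example of that description: the longest a detached FIN_WAIT_2 endpoint should linger is the wait before the first probe plus the probing period. The sketch below just adds the two; it assumes both tunables are in milliseconds (as the nddconf values earlier in the thread suggest), the numbers are illustrative rather than defaults, and `detached_cleanup_bound_ms` is a hypothetical name:

```shell
#!/bin/sh
# Upper bound on how long a detached endpoint lingers, per the description above.
# $1 = tcp_keepalive_detached_interval, $2 = tcp_ip_abort_interval (same unit)
detached_cleanup_bound_ms() {
    echo $(( $1 + $2 ))
}

# Illustrative values only: 120000 before probing starts, 30000 of probing.
detached_cleanup_bound_ms 120000 30000
# prints 150000

# On a live system the current values could be read with:
#   ndd -get /dev/tcp tcp_keepalive_detached_interval
#   ndd -get /dev/tcp tcp_ip_abort_interval
```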

there is also the arbitrary timer provided in some patches, but I do not like it very much. FIN_WAIT_2 is also a perfectly valid receive-only state for a TCP connection. you would get there by having an app call shutdown(SHUT_WR). this is not a "detached" TCP state (close has not been called) so tcp_keepalive_detached_interval does not apply. i have no idea though if MC/SG connections are such connections. I would check the other system(s) - the only way the connection could still be a valid receive-only connection is if the other endpoint still exists and is in the CLOSE_WAIT state. even that is not a full confirmation - it could be that the remote app ignored the close indication it was sent. that though is an app bug, and things like the disconnect script should not be used to "fix" app bugs.
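
The check on the other system can be sketched as follows: take the local and foreign address of a FIN_WAIT_2 socket and look for the mirror-image entry in CLOSE_WAIT in the remote host's netstat output. This assumes the remote netstat uses the same HP-UX column layout and dotted address.port form, which will not hold for every peer; `peer_in_close_wait` is a hypothetical helper name:

```shell
#!/bin/sh
# Reads the REMOTE host's `netstat -an` output on stdin.
# $1 = local addr.port and $2 = foreign addr.port as seen on THIS host.
# Succeeds (exit 0) if the remote shows the mirrored connection in CLOSE_WAIT.
peer_in_close_wait() {
    awk -v l="$1" -v f="$2" '
        # On the remote side the roles swap: its local address is our foreign one.
        $4 == f && $5 == l && $6 == "CLOSE_WAIT" { found = 1 }
        END { exit found ? 0 : 1 }'
}

# Usage, run on the remote host (addresses are made up):
#   netstat -an -f inet | peer_in_close_wait 10.0.0.1.7001 10.0.0.9.1025 \
#       && echo "peer still holds the connection in CLOSE_WAIT"
```

As the post notes, a match is not full confirmation: the remote application may simply have ignored the close indication.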
there is no rest for the wicked yet the virtuous have no pillows
Ron Kinner
Honored Contributor

Re: FIN_WAIT_2 states.

Microsoft admitted yesterday that they have a bug in WinNT 4.0 which forgets to resend a lost FIN packet. This leaves their end in LAST_ACK (and our HP end in FIN_WAIT_2). See their article Q254930. There is a patch, but you have to beg them for it as it has not yet been released.

Ron Kinner