<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: FIN_WAIT_2 / CLOSE_WAIT in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/fin-wait-2-close-wait/m-p/3909896#M284364</link>
    <description>I agree that it's probably an application bug, but all too often we end up fixing application bugs with band-aids on the system...&lt;BR /&gt;&lt;BR /&gt;Have you experimented (carefully) with the tcp_fin_wait_2_timeout parameter? It's specific to the FIN_WAIT_2 state, so it probably won't help you with CLOSE_WAIT. But I think both of those could be caused by the same kind of application error on opposite ends of the connection.&lt;BR /&gt;</description>
    <pubDate>Fri, 08 Dec 2006 16:17:38 GMT</pubDate>
    <dc:creator>Heironimus</dc:creator>
    <dc:date>2006-12-08T16:17:38Z</dc:date>
    <item>
      <title>FIN_WAIT_2 / CLOSE_WAIT</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/fin-wait-2-close-wait/m-p/3909895#M284363</link>
      <description>EMC (Legato) Networker backup software is running on an rx4640 under HP-UX 11.23. Every five or six days we see the message "Too many open files" in the Networker logfile and no more backups are possible; we have to restart Networker. &lt;BR /&gt;After a restart we see fewer sockets in CLOSE_WAIT. After five or six days of running we see more than 2000 sockets in CLOSE_WAIT / FIN_WAIT_2. What we found out is that these are more than 1000 socket pairs: one end of each connection is in FIN_WAIT_2, the other in CLOSE_WAIT. All these sockets are held open by a single user process (nsrjobd).&lt;BR /&gt;tcp 0 0 localhost.50002 localhost.50001 FIN_WAIT_2&lt;BR /&gt;tcp 0 0 localhost.50001 localhost.50002 CLOSE_WAIT&lt;BR /&gt;..............&lt;BR /&gt;tcp 0 0 localhost.50621 localhost.50620 FIN_WAIT_2&lt;BR /&gt;tcp 0 0 localhost.50620 localhost.50621 CLOSE_WAIT&lt;BR /&gt;................&lt;BR /&gt;We changed the following parameters:&lt;BR /&gt;tcp_time_wait_interval 60000&lt;BR /&gt;tcp_conn_request_max 4096&lt;BR /&gt;tcp_ip_abort_interval 60000&lt;BR /&gt;tcp_keepalive_interval 900000&lt;BR /&gt;but this did not help.&lt;BR /&gt;We believe that there is an application bug, but Legato support is at a loss.&lt;BR /&gt;Are there any ideas on what we can do to close these sockets?
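&lt;BR /&gt;&lt;BR /&gt;To watch the leak grow between restarts, here is a minimal shell sketch (assuming the lsof port is installed - it is not part of base HP-UX - and that the nsrjobd PID lookup, which just picks the first match, fits your setup):&lt;BR /&gt;&lt;BR /&gt;# count sockets stuck in each state&lt;BR /&gt;netstat -an | grep -c CLOSE_WAIT&lt;BR /&gt;netstat -an | grep -c FIN_WAIT_2&lt;BR /&gt;# count file descriptors held by nsrjobd (lsof assumed installed)&lt;BR /&gt;PID=$(ps -ef | grep nsrjobd | grep -v grep | awk '{ print $2 }' | head -1)&lt;BR /&gt;lsof -p $PID | wc -l&lt;BR /&gt;&lt;BR /&gt;Run from cron, the counts should show whether the pairs accumulate steadily or in bursts.</description>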
      <pubDate>Fri, 08 Dec 2006 06:54:54 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/fin-wait-2-close-wait/m-p/3909895#M284363</guid>
      <dc:creator>Guenter Lehmann</dc:creator>
      <dc:date>2006-12-08T06:54:54Z</dc:date>
    </item>
    <item>
      <title>Re: FIN_WAIT_2 / CLOSE_WAIT</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/fin-wait-2-close-wait/m-p/3909896#M284364</link>
      <description>I agree that it's probably an application bug, but all too often we end up fixing application bugs with band-aids on the system...&lt;BR /&gt;&lt;BR /&gt;Have you experimented (carefully) with the tcp_fin_wait_2_timeout parameter? It's specific to the FIN_WAIT_2 state, so it probably won't help you with CLOSE_WAIT. But I think both of those could be caused by the same kind of application error on opposite ends of the connection.&lt;BR /&gt;
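&lt;BR /&gt;For reference, a hedged sketch of checking and changing it with ndd (the value is in milliseconds; 600000 is only an illustration, and if memory serves the default of 0 means FIN_WAIT_2 never times out - verify with ndd -h):&lt;BR /&gt;&lt;BR /&gt;# show the current setting (milliseconds; 0 assumed to mean no timeout)&lt;BR /&gt;ndd -get /dev/tcp tcp_fin_wait_2_timeout&lt;BR /&gt;# try a 10-minute timeout; easy to revert with another -set&lt;BR /&gt;ndd -set /dev/tcp tcp_fin_wait_2_timeout 600000&lt;BR /&gt;&lt;BR /&gt;To make a value like this survive a reboot it would go into /etc/rc.config.d/nddconf.</description>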
      <pubDate>Fri, 08 Dec 2006 16:17:38 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/fin-wait-2-close-wait/m-p/3909896#M284364</guid>
      <dc:creator>Heironimus</dc:creator>
      <dc:date>2006-12-08T16:17:38Z</dc:date>
    </item>
    <item>
      <title>Re: FIN_WAIT_2 / CLOSE_WAIT</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/fin-wait-2-close-wait/m-p/3909897#M284365</link>
      <description>FIN_WAIT_2 means that this end of the connection has sent a FINished segment, and it has been ACKnowledged by the "remote" TCP. This end of the connection is now waiting for a FIN from the remote, hence FIN_WAIT_2 (FIN_WAIT_1 is when we are waiting for an ACK of our FIN).&lt;BR /&gt;&lt;BR /&gt;When the FINished segment arrived, the socket associated with that end of the connection would have become "readable", and a read/recv against the socket would have returned zero to indicate to the application that the remote had said (at least) it would be sending no more data.&lt;BR /&gt;&lt;BR /&gt;Unless the connection is supposed to remain up as a "simplex" connection (unidirectional toward the end which sent the FIN), the next logical step is for the application to call close(). Hence this side goes into the CLOSE_WAIT state - we are waiting for this side to call close().&lt;BR /&gt;&lt;BR /&gt;So, 99 times out of ten what happens is either the application has "ignored" or "forgotten" the read return of zero, or it has forked and forgotten to clean up a dangling file descriptor reference.&lt;BR /&gt;&lt;BR /&gt;The FIN_WAIT_2 timer is a massive kludge. 99 times out of ten I wish it weren't there, because it is used to cover the backside of fundamentally broken applications with bugs that never should have left the lab.&lt;BR /&gt;&lt;BR /&gt;If you want to close the sockets, kill the processes.&lt;BR /&gt;&lt;BR /&gt;FWIW, none of the original ndd settings in the base post would have any effect on this - tcp_time_wait_interval is just for TIME_WAIT, tcp_conn_request_max controls the maximum depth of a listen queue, tcp_ip_abort_interval is how long we wait for an ACK of data, and tcp_keepalive_interval is just for sockets that set SO_KEEPALIVE. There is tcp_keepalive_detached_interval, but that is for catching situations where we are in FIN_WAIT_2 and the remote connection is just _gone_, not simply sitting in CLOSE_WAIT.&lt;BR /&gt;&lt;BR /&gt;So, hold Legato's feet to the fire and make them find and fix what is 99% likely to be their bug. If you want to try to "catch" it, you could consider starting to take tusc traces - although doing so from startup could result in some rather long trace files... To be complete, there is a &amp;lt; 1% chance it is a bug in the stack failing to notify, but the chances of that are epsilon.
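&lt;BR /&gt;&lt;BR /&gt;If tusc is the route taken, something along these lines should catch the missing close() on the descriptor whose read returned zero (the -f and -o flags are quoted from memory, so check tusc(1); attaching to the running daemon avoids tracing from startup):&lt;BR /&gt;&lt;BR /&gt;# attach to the running nsrjobd, follow any children, log syscalls to a file&lt;BR /&gt;# (flags assumed: -f follows forks, -o names the trace file - verify with tusc(1))&lt;BR /&gt;PID=$(ps -ef | grep nsrjobd | grep -v grep | awk '{ print $2 }' | head -1)&lt;BR /&gt;tusc -f -o /tmp/nsrjobd.tusc $PID&lt;BR /&gt;&lt;BR /&gt;Grepping the trace for a read(...) = 0 and checking whether a close() on the same descriptor ever follows should show the leak directly.</description>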
      <pubDate>Mon, 11 Dec 2006 12:37:32 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/fin-wait-2-close-wait/m-p/3909897#M284365</guid>
      <dc:creator>rick jones</dc:creator>
      <dc:date>2006-12-11T12:37:32Z</dc:date>
    </item>
  </channel>
</rss>