Operating System - HP-UX
12-07-2006 10:54 PM
FIN_WAIT_2 / CLOSE_WAIT
Backup software EMC (Legato) Networker is running on an rx4640 under HP-UX 11.23. Every fifth or sixth day we see the message "Too many open files" in the Networker logfile, and no more backups are possible until we restart Networker.
After a restart we see fewer sockets in CLOSE_WAIT, but after 5 or 6 days of running there are more than 2000 sockets in CLOSE_WAIT / FIN_WAIT_2. What we found is more than 1000 socket pairs, one end in CLOSE_WAIT and the other in FIN_WAIT_2. All of these sockets are held open by a single user process (nsrjobd).
tcp 0 0 localhost.50002 localhost.50001 FIN_WAIT_2
tcp 0 0 localhost.50001 localhost.50002 CLOSE_WAIT
..............
tcp 0 0 localhost.50621 localhost.50620 FIN_WAIT_2
tcp 0 0 localhost.50620 localhost.50621 CLOSE_WAIT
................
We changed the following ndd parameters:
tcp_time_wait_interval 60000
tcp_conn_request_max 4096
tcp_ip_abort_interval 60000
tcp_keepalive_interval 900000
but this did not help.
We believe there is an application bug, but Legato support is at a loss.
Does anyone have ideas on what we can do to close these sockets?
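For reference, the counts above can be reproduced with something along these lines; the lsof step is optional and assumes lsof is installed (it is not part of the base HP-UX install):
# Count connections stuck in each state
netstat -an | grep -c CLOSE_WAIT
netstat -an | grep -c FIN_WAIT_2
# If lsof is available, confirm that the owner really is nsrjobd
lsof -a -c nsrjobd -i TCP | grep -E 'CLOSE_WAIT|FIN_WAIT_2' | wc -l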
guenter
2 REPLIES
12-08-2006 08:17 AM
Re: FIN_WAIT_2 / CLOSE_WAIT
I agree that it's probably an application bug, but all too often we end up fixing application bugs with band-aids on the system....
Have you experimented (carefully) with the tcp_fin_wait_2_timeout parameter? It's specific to the FIN_WAIT_2 state, so it probably won't help you with CLOSE_WAIT. But I think both of those could be caused by the same kind of application error on opposite ends of the connection.
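If you do experiment with it, this is the usual ndd sequence; the 600000 here (ten minutes, in milliseconds) is only an illustration, not a recommendation, and the change is lost at reboot unless you also put it in /etc/rc.config.d/nddconf:
# Read the current value (in milliseconds; 0 normally means the timer is off)
ndd -get /dev/tcp tcp_fin_wait_2_timeout
# Example only: reap idle FIN_WAIT_2 endpoints after ten minutes
ndd -set /dev/tcp tcp_fin_wait_2_timeout 600000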
12-11-2006 04:37 AM
Re: FIN_WAIT_2 / CLOSE_WAIT
FIN_WAIT_2 means that this end of the connection has sent a FINished segment, and it has been ACKnowledged by the "remote" TCP. This end of the connection is now waiting for a FIN from the remote, hence FIN_WAIT_2 (FIN_WAIT_1 is when we are still waiting for an ACK of our own FIN).
When the FINished segment arrived, the socket associated with that end of the connection would have become "readable" and a read/recv against the socket would have returned zero to indicate to the application that the remote had said (at least) it would be sending no more data.
Unless the connection is supposed to remain up as a "simplex" (unidirectional to the end which sent the FIN), the next logical step is for the application to call close. Hence this side goes into CLOSE_WAIT state - we are waiting for this side to call close().
So, 99 times out of ten what happens is either the application has "ignored" or "forgotten" the read return of zero, or it has forked and forgotten to clean up a dangling file descriptor reference.
The FIN_WAIT_2 timer is a massive kludge. 99 times out of ten I wish it wasn't there because it is used to cover the backside of fundamentally broken applications which have bugs which never should have left the lab.
If you want to close the sockets, kill the processes.
FWIW, none of the original ndd settings in the base post would have any effect on this - tcp_time_wait_interval is just for TIME_WAIT, tcp_conn_request_max controls the max depth of a listen queue, tcp_ip_abort_interval is how long we wait for an ACK of data, and tcp_keepalive_interval is just for sockets that set SO_KEEPALIVE. There is tcp_keepalive_detached_interval, but that is for catching situations where we are in FIN_WAIT_2 and the remote connection is just _gone_, not simply sitting in CLOSE_WAIT.
So, hold Legato's feet to the fire and make them find and fix what is 99% likely to be their bug. If you want to try to "catch" it, you could consider starting to take tusc traces - although doing so from startup could result in some rather long trace files... To be complete, there is a < 1% chance it is a bug in the stack failing to notify, but the chances of that are epsilon.
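If you do go the tusc route, a rough starting point would be something like the following; the flags are from memory, so check the tusc man page first, and expect the trace file to grow quickly under load:
# Attach to the running nsrjobd; -f follows children, -o writes the trace to a file
PID=`ps -ef | awk '/nsrjobd/ && !/awk/ {print $2; exit}'`
tusc -f -o /var/tmp/nsrjobd.tusc $PID
# Afterwards, look for descriptors where read()/recv() returned 0 but no close() follows
grep -e 'read(' -e 'recv(' -e 'close(' /var/tmp/nsrjobd.tusc | more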
there is no rest for the wicked yet the virtuous have no pillows