Re-using a port # when the previous connection is in TIME_WAIT

 
Rita Sekhon
Occasional Advisor

We have observed a problem when a process issues a connect re-using a port # for which there is an existing connection in TIME_WAIT.

Here is what the process does:
- gets a port # (e.g. 646)
- binds to the port (no addr in use error)
- calls connect. This request succeeds and the connection is established. The connection is to another process on the local system via inetd.
- closes the connection
- connection goes into TIME_WAIT as shown by the netstat -a cmd

tcp 0 0 blaze.bpxx blaze.646 TIME_WAIT

- ....
- the above sequence is repeated. This time if the same port is used i.e. 646 and the previous connection is in TIME_WAIT, the connect hangs and times out after a minute or so. The errno is 238 (ETIMEDOUT).
During this time the netstat -a shows
tcp 0 0 blaze.bpxx blaze.646 TIME_WAIT
tcp 0 1 blaze.646 blaze.bpxx SYN_SENT
- if we retry connect w/ a diff port #, the connect succeeds.
- if the connect is issued for port # 646 (e.g.) and there is no previous connection for port# 646 in TIME_WAIT, the connect succeeds.
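
For readers who want to try the sequence above, here is a minimal sketch in C of a client that binds to a fixed local port and then connects. It is not the poster's actual code: the helper names make_addr and connect_from_port are hypothetical, and it assumes a POSIX sockets API with inet_pton (older systems may need inet_addr instead). Binding to a port below 1024, such as 646, normally needs root privileges.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fill an IPv4 address from a dotted-quad string
 * and a port. Returns 0 on success, -1 if the string is not valid. */
int make_addr(struct sockaddr_in *sa, const char *ip, unsigned short port)
{
    assert(sa != NULL && ip != NULL);
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: bind to a fixed local port, then connect to the
 * server port on the same address (the thread's case is local-to-local).
 * Returns the connected descriptor, or -1 with errno set. */
int connect_from_port(const char *ip, unsigned short local_port,
                      unsigned short server_port)
{
    struct sockaddr_in local, remote;
    int fd, saved;

    if (make_addr(&local, ip, local_port) < 0 ||
        make_addr(&remote, ip, server_port) < 0) {
        errno = EINVAL;
        return -1;
    }
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* No SO_REUSEADDR is set here, matching the description in the thread. */
    if (bind(fd, (struct sockaddr *)&local, sizeof local) < 0 ||
        connect(fd, (struct sockaddr *)&remote, sizeof remote) < 0) {
        saved = errno;          /* keep the real error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Calling connect_from_port twice in a row with the same local port, closing the descriptor in between, is the pattern described above. On the affected system the second call would be expected to block and then fail with ETIMEDOUT while the first connection is still in TIME_WAIT.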

We had started inetd w/ the -l option. The log msgs in syslog did not show any connect request, implying that the connect that timed out never made it to inetd.

I think there is a problem in the transport layer in dealing w/ a new connection w/ the same source and destination address as a previous connection in TIME_WAIT.

This is a solid failure. We are running HP_UX B.11.00 on 9000/800 machine.
Ron Kinner
Honored Contributor

Re: Re-using a port # when the previous connection is in TIME_WAIT

Standard TIME_WAIT timeout on 11.0 is 60 seconds. Supposedly this is there so that if duplicate packets arrive from your first connection they will not be mistaken for info for the new connection. You can play with it using NDD but Rock Jones will come on and shake his finger at me.

If this is your code then you may want to look at:

http://www.softlab.ntua.gr/facilities/documentation/unix/unix-socket-faq/unix-socket-faq-4.html#ss4.1 and also the preceding pages.

Especially the section on SO_REUSEADDR.

Ron
Steven Gillard_2
Honored Contributor

Re: Re-using a port # when the previous connection is in TIME_WAIT

The behaviour you are seeing is perfectly normal and is there for a very good reason. The whole purpose of the TIME_WAIT state is to mop up any delayed segments that arrive after the connection is closed - hence the SYN gets dropped and your connection times out. If the OS allowed an application to reuse this address straight away, and another segment arrived for the previous connection, then you would get data corruption.

For the client code calling connect(), it's normal not to fix the port number with a call to bind(), but to allow the OS to select a random unused port for the client side of the connection. Obviously the server's port is fixed and specified in the connect call.
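
As a rough illustration of that approach, the sketch below connects without calling bind() and then asks the kernel which local port it picked. The function name connect_ephemeral is hypothetical, and the code assumes a POSIX sockets API with inet_pton; treat it as a sketch rather than a drop-in replacement for the poster's client.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to ip:server_port without bind(), so the
 * kernel assigns the local port. On success returns the descriptor and
 * stores the chosen local port in *local_port; on failure returns -1. */
int connect_ephemeral(const char *ip, unsigned short server_port,
                      unsigned short *local_port)
{
    struct sockaddr_in remote, local;
    socklen_t len = sizeof local;
    int fd, saved;

    assert(ip != NULL && local_port != NULL);
    memset(&remote, 0, sizeof remote);
    remote.sin_family = AF_INET;
    remote.sin_port = htons(server_port);
    if (inet_pton(AF_INET, ip, &remote.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* connect() on an unbound socket makes the kernel pick a free port;
     * getsockname() then reports which one it chose. */
    if (connect(fd, (struct sockaddr *)&remote, sizeof remote) < 0 ||
        getsockname(fd, (struct sockaddr *)&local, &len) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    *local_port = ntohs(local.sin_port);
    return fd;
}
```

Because the kernel chooses the port, the client no longer has to pick one that may still be tied up by an earlier connection in TIME_WAIT.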

SO_REUSEADDR won't work in your case - in fact it looks like you're already using it otherwise bind() would have failed with address already in use. Usually this is only used in server code when setting up a listen socket.
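
For reference, the usual server-side use of SO_REUSEADDR looks roughly like the sketch below: the option is set on the listening socket before bind(), so a restarted server can rebind its port while old connections are still in TIME_WAIT. The names open_listener and reuseaddr_enabled are hypothetical helpers, and the backlog of 5 is an arbitrary choice.

```c
#include <assert.h>   /* only needed by callers that assert on the results */
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP listening socket on the given port
 * (0 lets the kernel choose) with SO_REUSEADDR set before bind().
 * Returns the listening descriptor, or -1 with errno set. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int fd, saved, on = 1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    /* The option must be set before bind() to affect the bind check. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: 1 if SO_REUSEADDR is set on fd, 0 if it is not,
 * -1 if getsockopt() fails. */
int reuseaddr_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A client that wants to know whether the option is already set on its own socket could call reuseaddr_enabled on the descriptor before bind().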

You're going to have to re-think your strategy here, because this is the way TCP was designed. Hope this makes sense.

Regards,
Steve
Rita Sekhon
Occasional Advisor

Re: Re-using a port # when the previous connection is in TIME_WAIT

We are not using the SO_REUSEADDR option on the bind to the source addr.

My understanding was that since the SO_REUSEADDR is not specified, the bind call must prevent the process from binding to a port that it thinks is still in TIME_WAIT.
But if the bind succeeds the connect should also work.
Note that on HP the TIME WAIT INTERVAL is 60 secs, whereas on SUN it is 4 min, and we do not have the same problem on SUN in spite of the larger TIME WAIT INTERVAL.

The correct behaviour would be to either not allow the bind or be able to handle the connect.

The fact that the new connection is in SYN_SENT means that an attempt was made to initiate the connection by the sender, but the receiving end discarded it silently. It has been a while since I worked on the transport layer so I don't have any specifics, but I think something is not working correctly. If we had seen this problem on all platforms I would believe it. But the same code runs fine on all other platforms (SUN, LINUX, AIX). We see this problem only on HP.
Steven Gillard_2
Honored Contributor

Re: Re-using a port # when the previous connection is in TIME_WAIT

Hmmm, I agree that bind() should return EADDRINUSE in this case if you're not using SO_REUSEADDR. In fact according to the connect() man page it should fail as well:

"If s is a SOCK_STREAM socket that is bound to the same local address as another SOCK_STREAM socket, connect() returns [EADDRINUSE] if addr is the same as the peer address of that other socket. This situation can only happen if the SO_REUSEADDR option has been set on s, which is an AF_INET socket (see getsockopt(2) )."

Before I confuse myself any further, I suggest you get the latest ARPA transport patch, lan products patch and your appropriate lan driver patch installed, then log a call with HP preferably with a simple test case to demonstrate the problem.

Out of interest what is the behaviour on other OS's? Does the bind() fail?

Regards,
Steve
Rita Sekhon
Occasional Advisor

Re: Re-using a port # when the previous connection is in TIME_WAIT

On other platforms the bind does not fail and the connect succeeds. The connection in TIME_WAIT disappears and the new connection is shown as ESTABLISHED.

I would think that the new connect request (SYN) should initiate a new connection since the sequence number is outside the range of sequence numbers for the previous connection that is in TIME_WAIT. A packet that has the same source and destination address but a sequence # less than or equal to the sequence number for the connection in TIME_WAIT can be regarded as a duplicate.

I'm not sure we have all the patches installed. But can look into it. Could you point us to the patches of interest?
Steven Gillard_2
Honored Contributor

Re: Re-using a port # when the previous connection is in TIME_WAIT

Start with the latest ARPA transport patch, which is PHNE_25423. BUT there are some rather nasty warnings that go with this patch and some of its predecessors so to be safe you should probably install an earlier version, PHNE_22397.

In particular, this patch contains the following fix:

"Symptom:
Applications that quickly reconnect to the same remote
port (e.g. remsh) can experience 2-second delays in
connection establishment.
Defect Description:
connect() takes 2+ seconds due to SYN retransmits
to a connection waiting to close in time_wait
Before a simple check was done to see if the new
starting sequence number were greater than the last
received sequence number of a connection in time wait.
This failed quite often when randomized sequence
numbers are used because often a valid new sequence
number would still test less than the previous
sequence number.
Resolution:
The fix is to save the starting sequence number of
a connection and test that the new sequence number
is not the same when connecting to a server in time
wait. All other inflight data can be rejected
because the client packet will be out of the exact
range of the servers sequence space, that is, its
ack will not match the server's sequence range."

This sounds like it may help.
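
To make the difference between the two checks in the patch text concrete, here is a small sketch in C. It is only a model of the description, not HP's code: the function names are hypothetical, and the wraparound-aware comparison in seq_gt is an assumption, since the patch text does not say how the original comparison was implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed comparison: "a is after b" in 32-bit TCP sequence space,
 * treating the numbers as a circle (relies on two's complement). */
int seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Old rule as described: accept a new SYN for a TIME_WAIT connection
 * only if its initial sequence number is beyond the last one received. */
int old_accepts_syn(uint32_t new_isn, uint32_t last_rcvd_seq)
{
    return seq_gt(new_isn, last_rcvd_seq);
}

/* New rule as described: accept unless the new initial sequence number
 * equals the saved starting sequence number of the old connection. */
int new_accepts_syn(uint32_t new_isn, uint32_t old_isn)
{
    return new_isn != old_isn;
}

/* A few illustrative cases with made-up sequence numbers. */
void seq_examples(void)
{
    /* Monotonically increasing ISN: both rules accept. */
    assert(old_accepts_syn(5000u, 4000u) == 1);
    assert(new_accepts_syn(5000u, 3000u) == 1);
    /* Randomized ISN that lands below the old sequence space:
     * the old rule drops the SYN, the new rule accepts it. */
    assert(old_accepts_syn(1000u, 4000u) == 0);
    assert(new_accepts_syn(1000u, 3000u) == 1);
    /* Same starting sequence number as before: still rejected. */
    assert(new_accepts_syn(3000u, 3000u) == 0);
}
```

With randomized initial sequence numbers the middle case comes up often, which fits the symptom in this thread: the SYN is dropped and the client sits in SYN_SENT until it times out.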

There are a couple of dependencies listed with this patch, which in turn have other dependencies, so make sure you install all the necessary patches at once.

If you're not sure use the custom patch manager on the ITRC site.

Regards,
Steve
Rita Sekhon
Occasional Advisor

Re: Re-using a port # when the previous connection is in TIME_WAIT

I think you have found the exact fix we need. This would be the correct solution to the problem. We will install the patch (PHNE_22397) and verify the fix and let you know how it goes. Thanks!!
Rita Sekhon
Occasional Advisor

Re: Re-using a port # when the previous connection is in TIME_WAIT

Thanks Steven!!
The patch # PHNE_22397 fixes the problem.