
Re: Networking Problem..

 
Smirjit Singh
Advisor

Networking Problem..

Dear Gurus,

We are facing a network connection problem. When the client program tries to connect to the server, it gets a Connection Timed Out error or an Error in open port error.

We checked the server end and saw lots of connections in TIME_WAIT. I then set the ndd value to 10 seconds. Old ports are now being closed, but the load is still very high and the number of TIME_WAIT entries increases every second.

I asked the developers, and they told me they are closing the connections. They have not found a solution either.

I thought of killing those ports with ndd -set /dev/tcp tcp_discon, so I collected the connection list with ndd -get /dev/tcp tcp_status. But when I try to kill a connection, it shows an error. Below is the output.

000000004c4c0400 199.041.248.131 06d52707 06d52707 00000000 00000000 46fbe03e 46fbe03e 00000000 -N/A- -N/A- [7407,b01] TIME_WAIT*(1410)

Please advise what I should check. Do I need to check any kernel values?
Knowledge is only most valuable things which can't buy.
5 REPLIES
Bill McNAMARA_1
Honored Contributor

Re: Networking Problem..

netstat -an | more

Check for FIN_WAIT2 and apply patches

The ndd command you are using is possibly not supported.

I think that's mentioned in the man page.

Later,
Bill
It works for me (tm)
harry d brown jr
Honored Contributor

Re: Networking Problem..

Sandip,

What OS are you running and is it up to date with patches?

Also, read this thread, where Rick Jones replies:

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x78a9e7613948d5118fef0090279cd0f9,00.html

live free or die
harry

Live Free or Die
Ron Kinner
Honored Contributor

Re: Networking Problem..

Start with the basics.

lanadmin
lan
display
(change to the correct card with ppa x if you have more than one card)

Look at your errors and at whether you are set up at half or full duplex. If it says 100 half, you may want to set it to 100 full on the HP AND on the switch. Check the same thing on the client.

Run a ping between the client and server and let it run for a while. Do you see any errors? Try increasing the ping size to over 1500 bytes. Do you see errors now? We had problems with our NICs being sensitive to electromagnetic interference, which caused this symptom.

Look at netstat -s and see whether any errors are building up between the layers.

Finally run netstat -a on the client machine too. What do they show?
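To compare the two ends, it can help to tally the connection states rather than read the listing by eye. The following is a minimal Python sketch, not part of the original advice. It assumes the output of netstat -an has been saved as text and that the state name is the last column of each tcp line; the helper name count_tcp_states and the sample addresses are made up for illustration.

```python
from collections import Counter

# State names as they commonly appear in the last column of netstat output.
TCP_STATES = {
    "ESTABLISHED", "TIME_WAIT", "FIN_WAIT_1", "FIN_WAIT_2", "CLOSE_WAIT",
    "LAST_ACK", "SYN_SENT", "SYN_RCVD", "CLOSING", "LISTEN",
}

def count_tcp_states(netstat_text):
    """Tally the state found in the last column of each tcp line."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP entries.
        if len(fields) < 2 or not fields[0].startswith("tcp"):
            continue
        if fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts

# Illustrative sample only; the addresses are invented.
sample = """\
tcp        0      0  192.0.2.10.7407      192.0.2.20.2817     TIME_WAIT
tcp        0      0  192.0.2.10.7407      192.0.2.20.2818     TIME_WAIT
tcp        0      0  192.0.2.10.7407      192.0.2.21.3001     ESTABLISHED
tcp        0      0  *.7407               *.*                 LISTEN
"""

for state, number in count_tcp_states(sample).most_common():
    print(state, number)
```

Running it on captures taken a few seconds apart on both machines shows which state is actually growing and on which side.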

The TIME_WAIT state is supposed to last for twice the maximum segment lifetime (the maximum time a packet can live in the network) and is reached after the connection has been closed. It is just a precautionary interval during which the endpoint waits to see whether any last-minute duplicate packets arrive. It's actually a normal thing to see, and the entries should drop on their own.

The problem you usually see is a bunch of ports stuck in FIN_WAIT_2. This is a problem with the other end. Win NT has a bug in its TCP/IP code which sometimes leaves the Win NT side in LAST_ACK and the HP side in FIN_WAIT_2. You have to ask Microsoft for the patch.

If all else fails you will need to do some sniffing to see what is actually going wrong. tcpdump is a good program to use for this if you don't have a sniffer. www.tcpdump.org

You can also get nettl to show you some errors.


Ron


Steven Sim Kok Leong
Honored Contributor

Re: Networking Problem..

Hi,

Since there are so many connections in TIME_WAIT state, you should try to reduce the TIME_WAIT interval by setting tcp_time_wait_interval via ndd.

Regarding tcp_discon: both tcp_discon and tcp_discon_by_addr are unsupported, but they should work in most cases.

Try using tcp_discon_by_addr instead of tcp_discon. The following thread may help.

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xbe06a22d6d27d5118fef0090279cd0f9,00.html

Hope this helps. Regards.

Steven Sim Kok Leong
rick jones
Honored Contributor

Re: Networking Problem..

i would STRONGLY suggest that one NOT shrink tcp_time_wait_interval. TIME_WAIT is there as part of TCP's correctness algorithms. the TCP stack in HP-UX can handle hundreds of thousands of TIME_WAIT endpoints without difficulty.

similarly, I would also suggest NOT using any of the "should never have left the lab" disconnect kludges.

if my intuition is correct, the clients are probably trying to establish connections at very high rates. this can result in trying to establish a connection using the same four-tuple of local/remote IP and local/remote port that is in TIME_WAIT. that can be problematic.

for example, if your clients have the not-untypical anonymous port range of 1024 to 5000, and TIME_WAIT lasts 60 seconds, then a client that tries to establish connections to the same server address and port at a rate greater than (5000-1024)/len(TIME_WAIT), or 3976/60, or ~66 connections per second will start to have problems.
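The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up, the 60-second TIME_WAIT is the figure used in the post, and both ends of the port range are counted as usable, so the port count differs by a port or two from the post without changing the rough answer.

```python
def max_connection_rate(low_port, high_port, time_wait_seconds):
    """Connections per second one client can sustain to a single
    server address and port before it must reuse a local port
    that is still in TIME_WAIT."""
    usable_ports = high_port - low_port + 1  # inclusive range
    return usable_ports / time_wait_seconds

# Anonymous port range of 1024 to 5000, 60-second TIME_WAIT (assumed).
print(round(max_connection_rate(1024, 5000, 60), 1))

# Explicitly chosen range of 5000 to 65535, same TIME_WAIT.
print(round(max_connection_rate(5000, 65535, 60), 1))
```

The limit applies per client address and per server address/port pair, since it is the full four-tuple that must not collide with an entry in TIME_WAIT.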

The best thing to do is to fix the client code so it does not have to establish connections so often.

Second best thing to do is to fix the client code to explicitly select port numbers from the range of 5000 to 65535 itself. This will allow the client to do as many as 1000 connections per second without worrying about TIME_WAIT reuse. This is what the SPECweb96 and SPECweb99 load generating codes do.
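A client that chooses its own local port binds the socket before connecting. The following is a minimal Python sketch of that idea under stated assumptions: the class name RotatingPortConnector is hypothetical, the client is assumed to be allowed to bind ports in the chosen range, and this is not the actual SPECweb code.

```python
import socket

class RotatingPortConnector:
    """Hypothetical helper: hands out local ports from [low, high]
    in rotation, so a just-used port is not revisited until the
    whole range has been cycled through."""

    def __init__(self, low=5000, high=65535):
        self.low = low
        self.high = high
        self.next_port = low

    def connect(self, server_addr, timeout=5.0):
        span = self.high - self.low + 1
        for _ in range(span):
            port = self.next_port
            # Advance the cursor, wrapping back to low after high.
            self.next_port = self.low + (port - self.low + 1) % span
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                # Explicit bind selects the local port for this connection.
                sock.bind(("", port))
            except OSError:
                # Port is busy locally; move on to the next one.
                sock.close()
                continue
            try:
                sock.settimeout(timeout)
                sock.connect(server_addr)
            except OSError:
                # Connection failures are reported, not retried.
                sock.close()
                raise
            return sock
        raise OSError("no free local port in the configured range")

# Usage (server address is a placeholder):
#   connector = RotatingPortConnector()
#   sock = connector.connect(("server.example", 7407))
```

Rotating through the range, rather than restarting from the bottom each time, gives every port the longest possible rest before it is reused.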

Third best thing to do is to alter the client stack's anonymous port number range to something larger.

don't shrink TIME_WAIT, don't do the ndd disconnect thing, and don't do abortive closes in the app - all that last one will do is start to leave your server with FIN_WAIT_2 connections instead of TIME_WAITs
there is no rest for the wicked yet the virtuous have no pillows