Operating System - HP-UX

LucianoCarvalho
Respected Contributor

TCP packets being retransmitted

Hi guys!

I'm having problems with too many TCP packets being retransmitted. I started a nettl trace with the following commands:
#nettl -tn pduin pduout -e ns_ls_ip -tm 99999 -f /tmp/ip_trace
#nettl -tn pduin pduout -e ns_ls_tcp -tm 99999 -f /tmp/tcp_trace
Now I'm using Ethereal and/or netfmt to format the file, but I'm having trouble identifying which packets are being retransmitted. Does anyone have any information that would help me find those packets?
Thanks
harry d brown jr
Honored Contributor

Re: TCP packets being retransmitted

Luciano,

If you are having too many packets retransmitted you need to look into your cabling, network cards, routers/switches/bridges, duplex and speed settings (auto-negotiation) ... basically all kinds of "stuff".

What did you do to determine you were having a lot of retransmitted packets?
Live Free or Die
LucianoCarvalho
Respected Contributor

Re: TCP packets being retransmitted

I used netstat -s and the output was:
tcp:
324476259 packets sent
276213316 data packets (1504096580 bytes)
841851 data packets (775487720 bytes) retransmitted
Ron Kinner
Honored Contributor

Re: TCP packets being retransmitted

If you are looking at raw packet data then you can look for identical sequence numbers (bits 32-63) or for sudden drops in the window size (bits 112-127) but it seems to me you are using the wrong tool.
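
For anyone who does want to pick these fields out of raw data, here is a minimal Python sketch of the bit offsets mentioned above. It assumes you already have each TCP header as a bytes object (how you extract that from a nettl or Ethereal capture is not shown), and the function names are made up for illustration. It flags any repeated (source port, destination port, sequence number) triple, so segments that carry no data, such as bare ACKs, will also show up and need to be weeded out by hand.

```python
import struct


def seq_and_window(tcp_header: bytes):
    """Return (sequence_number, window_size) from a raw TCP header."""
    # Bits 32-63 are bytes 4-7 (sequence number).
    # Bits 112-127 are bytes 14-15 (window size).
    # Both fields are in network (big-endian) byte order.
    (seq,) = struct.unpack_from("!I", tcp_header, 4)
    (window,) = struct.unpack_from("!H", tcp_header, 14)
    return seq, window


def find_repeated_sequences(headers):
    """Return the indices of headers whose ports and sequence number were seen before."""
    seen = set()
    repeats = []
    for index, header in enumerate(headers):
        # Bytes 0-1 are the source port, bytes 2-3 the destination port.
        src_port, dst_port = struct.unpack_from("!HH", header, 0)
        seq, _window = seq_and_window(header)
        key = (src_port, dst_port, seq)
        if key in seen:
            repeats.append(index)
        seen.add(key)
    return repeats
```

Each index returned is a candidate retransmission to inspect in the formatted trace.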

Retransmissions are caused by packets getting lost along the way. All the retransmitted packet is going to tell you is when it happened, who you were talking to at the time and how big the packet was, and there are other ways of getting this info. Packets get lost because they get corrupted (errors) or because a queue gets too long (congestion). This can happen anywhere along the way.

netstat -s |grep data
will show you the total number of packets and the number of retransmissions in the outgoing direction.

netstat -f inet
will show you who you are talking to and the number of packets in the recv and send queues. Connections with non-zero Send-Q are the ones you want to look at more closely. Non-zero Recv-Q's indicate a local problem. Perhaps the CPU is too busy or the application is having problems or is running short on memory.
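
If the connection list is long, a small filter can pick out the rows with queued data. This is only a sketch: it assumes the usual column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, state) and that you have saved the netstat -f inet output as text; the function name is invented for the example.

```python
def nonzero_queues(netstat_text):
    """Return (foreign_address, recv_q, send_q) for TCP rows with queued data."""
    rows = []
    for line in netstat_text.splitlines():
        parts = line.split()
        # Skip headers, blank lines and anything that is not a TCP row.
        if len(parts) < 5 or not parts[0].startswith("tcp"):
            continue
        try:
            recv_q, send_q = int(parts[1]), int(parts[2])
        except ValueError:
            continue  # Assumed column layout did not match this line.
        if recv_q or send_q:
            rows.append((parts[4], recv_q, send_q))
    return rows
```

Rows with a non-zero third value (Send-Q) are the remote hosts worth pinging first.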

Ping each of them (or at least one host on each different subnet) with a long series of 1480 byte pings (or whatever your pmtu is on the network in netstat -rn minus 20) and see which have the worst % lost pings.

ping "hostnameORipaddress" 1480 -n 100

This will take a minute or two to complete so you might want to leave off the -n 100 to make sure it works at all before using the -n 100. Ctrl + C twice to stop.

If you get source quenches then turn them off with ndd on the target machines.

Then run traceroute to each of the bad destinations and see how the packets get there.

Then ping each stop along the way (starting with the closest) and see where the loss starts to appear. Once you isolate where the problem is then you can call in the trouble or fix it yourself if it belongs to you.

If it appears right away then you should first check that your local Ethernet connection is good with
lanadmin
lan
display
(check each PPA if you have more than one NIC)
and see what sort of errors and collisions percentages (errors / total number of packets sent or received) you have on the local loop. Don't forget to look at the second page of the display.


If this is bad it may be a bad cable, a duplex mismatch, a bad NIC, a bad hub/switch port or, on a hub, too much garbage caused by a babbling NIC.
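
The percentage mentioned above is just errors (or collisions) divided by the total packet count. A minimal Python sketch, with the counter values typed in by hand from the lanadmin display (the function name and the sample figures are illustrative, not taken from a real system):

```python
def error_percent(errors, total_packets):
    """Return errors as a percentage of total packets sent or received."""
    if total_packets == 0:
        return 0.0  # No traffic yet, so no meaningful rate.
    return 100.0 * errors / total_packets


# Hypothetical counters read off the display.
print(f"{error_percent(5, 1000):.2f}% inbound errors")
```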

If you think the problem might be time related then you can set up a script to run a large test ping every five or ten minutes and record the time.

date >>junk
ping "targetA" 1480 -n 10 >>junk
ping "targetB" 1480 -n 10 >> junk

Once you get the time when the problem is worst you can use the above techniques to isolate it to a single link.
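
To find the worst time in the log without reading it by eye, something like the following Python sketch could summarise the junk file. It assumes each run starts with a line in the default date format, and that ping prints a "----target PING Statistics----" line followed by a summary containing "N% packet loss"; check a sample of your own output first, since both formats are assumptions here and the function name is made up.

```python
import re

# Assumed formats; adjust the patterns if your date or ping output differs.
DATE_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")
STATS_RE = re.compile(r"^-+\s*(\S+) PING Statistics\s*-+")
LOSS_RE = re.compile(r"(\d+)% packet loss")


def loss_by_time(log_text):
    """Return a list of (timestamp, target, percent_loss) tuples from the log."""
    results = []
    stamp = None
    target = None
    for line in log_text.splitlines():
        if DATE_RE.match(line):
            stamp = line.strip()  # Timestamp applies to the pings that follow.
            continue
        match = STATS_RE.match(line)
        if match:
            target = match.group(1)
            continue
        match = LOSS_RE.search(line)
        if match:
            results.append((stamp, target, int(match.group(1))))
    return results
```

Sorting the result by the third field, for example with max(results, key=lambda r: r[2]), gives the time and target with the heaviest loss.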

Time dependence is usually a congestion problem (though I have heard of problems where heavy elevator use at lunch time was to blame - electrical interference from a bad motor in the elevator). Custom queuing (or filtering of unnecessary traffic) on the router can reduce the impact (or you can just add bandwidth). You can also monitor the traffic with a program like MRTG, which gives you real pretty graphs of traffic versus time and can be configured to get errors versus time or CPU usage versus time.

Ron

Joseph T. Wyckoff
Honored Contributor

Re: TCP packets being retransmitted

Looking at the numbers it appears that

841,851 of 324,476,259 packets sent, or about 0.26 percent, are retransmitted - much less than one percent.

That seems a pretty small percentage to me, and an unlikely cause of trouble, especially if you are not experiencing outright connection failures.
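
The percentage can be recomputed straight from the netstat -s lines posted earlier in the thread. A minimal Python sketch, assuming the three tcp lines keep the wording shown there (the function name is invented for the example):

```python
import re


def retransmit_percentages(netstat_s_text):
    """Return (percent of all packets sent, percent of data packets) retransmitted."""
    sent = re.search(r"(\d+) packets sent", netstat_s_text)
    # The plain data line must not be followed by the word "retransmitted".
    data = re.search(r"(\d+) data packets \(\d+ bytes\)(?! retransmitted)", netstat_s_text)
    retx = re.search(r"(\d+) data packets \(\d+ bytes\) retransmitted", netstat_s_text)
    if not (sent and data and retx):
        raise ValueError("expected tcp counters not found")
    retransmitted = int(retx.group(1))
    return (
        100.0 * retransmitted / int(sent.group(1)),
        100.0 * retransmitted / int(data.group(1)),
    )


posted = """tcp:
324476259 packets sent
276213316 data packets (1504096580 bytes)
841851 data packets (775487720 bytes) retransmitted
"""
of_sent, of_data = retransmit_percentages(posted)
print(f"{of_sent:.2f}% of packets sent, {of_data:.2f}% of data packets")
```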

What problem are you trying to solve?
Omniback and NT problems? double check name resolution, DNS/HOSTS...
Joseph T. Wyckoff
Honored Contributor

Re: TCP packets being retransmitted

Looking a bit further at your netstat output

324,000,000 packets sent

276,000,000 data packets (1,504 mbytes)

841,851 data packets ( 775 mbytes) retransmitted

Many packets are sent, or appear to be sent, successfully (as I noted above); about 8 in 2760 data packets (roughly 0.3 percent) are retransmitted.

I looked at the average size of the data packets - the retransmitted packets average very large (921 bytes/packet)
= 775mega bytes / 841kilo packets

Comparing that with the average of all packets
5 bytes / packet
= 1.5giga bytes / 276mega packets

Something doesn't quite add up here... I don't think a 5 byte / packet average is realistic, but maybe this is counting data bytes rather than data+overhead bytes.
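
For anyone who wants to check the two averages, here is the arithmetic as a short Python sketch using the byte and packet counts from the posted netstat -s output (the helper name is just for illustration):

```python
def average_bytes(total_bytes, packets):
    """Return the mean number of bytes per packet."""
    return total_bytes / packets


# Counters copied from the netstat -s output earlier in the thread.
retransmitted_avg = average_bytes(775487720, 841851)
all_data_avg = average_bytes(1504096580, 276213316)

print(f"retransmitted: {retransmitted_avg:.0f} bytes/packet")
print(f"all data:      {all_data_avg:.1f} bytes/packet")
```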

Anyway - it appears the successful packets are small, and the lossy packets are large.

Take my observations with some scepticism, of course...
Omniback and NT problems? double check name resolution, DNS/HOSTS...