Re: package loss with increased packet size

Mukesh Jayawant · ‎04-25-2005

Hi guys,
When I ping with a large packet size there is serious packet loss, and it is affecting the system badly. Does anyone know what could cause it?
# ping 140.171.229.21 1400
PING 140.171.229.21: 1400 byte packets
1400 bytes from 140.171.229.21: icmp_seq=0. time=1. ms
1400 bytes from 140.171.229.21: icmp_seq=1. time=0. ms
1400 bytes from 140.171.229.21: icmp_seq=2. time=0. ms
1400 bytes from 140.171.229.21: icmp_seq=3. time=0. ms
1400 bytes from 140.171.229.21: icmp_seq=4. time=0. ms
1400 bytes from 140.171.229.21: icmp_seq=5. time=0. ms
1400 bytes from 140.171.229.21: icmp_seq=6. time=0. ms
1400 bytes from 140.171.229.21: icmp_seq=7. time=0. ms
^C
----140.171.229.21 PING Statistics----
8 packets transmitted, 8 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/1
# ping 140.171.229.21 32768
PING 140.171.229.21: 32768 byte packets
32768 bytes from 140.171.229.21: icmp_seq=15. time=4. ms
^C
----140.171.229.21 PING Statistics----
17 packets transmitted, 1 packets received, 94% packet loss
round-trip (ms) min/avg/max = 4/4/4
#

32768 is the size used by nfs...

thanks and regards,
mukesh

Mohanasundaram_1 · ‎04-25-2005

Hi Mukesh,

I have seen a similar problem where the switch port turned out to be the cause.

But it can be the switch port, the cable or the LAN adapter.

Check the /var/adm/nettl.LOG* to confirm if there were any recent logs. If so, format it using netfmt and view it. You may get a clue.

It's also worth checking the duplex setting on the switch as well as on the adapter.

Provide more info on the type of card, speed and duplex setting.
With regards,
Mohan.

Attitude, Not aptitude, determines your altitude

Alex Lavrov. · ‎04-25-2005

Well, shooting in the dark:

Make sure that the NIC configuration matches the configuration on the switch.
Check speed (10/100/1000), duplex (full/half) and autonegotiation. When they don't match, there are serious problems in connectivity.

I don't give a damn for a man that can only spell a word one way. (M. Twain)

harry d brown jr · ‎04-25-2005

What os release are you running?
How up-to-date are your patches?

Are you using a switch or a router between your NFS server and the NFS clients?

What is the lan card speed between the NFS server and the NFS clients?

Use netperf to do analysis: http://hpux.connect.org.uk/hppd/hpux/Networking/Admin/netperf-1.7.1/

live free or die
harry d brown jr

Live Free or Die

Mukesh Jayawant · ‎04-25-2005

Hi guys,

I think I can rule out the switch port, because I tried switch ports that were known to be working fine, and I still get problems as soon as I connect this system.
The nettl log under /var/adm didn't help much; it only showed that I had disconnected the port for some time, which was while I was moving the cable from one switch port to another.
# netfmt -Nnf /var/adm/net*

----------------------Gigabit Ethernet LAN/9000 Networking------------------@#%
Timestamp : Thu Apr 14 METDST 2005 14:04:46.865982
Process ID : [ICS] Subsystem : IGELAN
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 0 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2004> 1000Base-T in path 0/1/1/0/4/0
Detected a faulty or disconnected cable.

----------------------Gigabit Ethernet LAN/9000 Networking------------------@#%
Timestamp : Mon Apr 25 METDST 2005 12:13:09.080111
Process ID : [ICS] Subsystem : IGELAN
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 0 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2004> 1000Base-T in path 0/1/1/0/4/0
Detected a faulty or disconnected cable.

----------------------Gigabit Ethernet LAN/9000 Networking------------------@#%
Timestamp : Mon Apr 25 METDST 2005 14:58:32.790112
Process ID : [ICS] Subsystem : IGELAN
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 0 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2004> 1000Base-T in path 0/1/1/0/4/0
Detected a faulty or disconnected cable.
----
The output for the NIC:
alfcf26:lanadmin -x 0
Speed = 1000 Full-Duplex.
Autonegotiation = On.
And that is what it shows on the switch side as well.
----
I am running the September 2004 release of the OS:
# uname -vr
B.11.23 U
- I am using a switch between the NFS server and client, and both are set to 1000 Mbit/s.

I will check out netperf in the meantime. If you have any other clues, please tell me.
Thanks so far,

regards,
mukesh

harry d brown jr · ‎04-25-2005

Is the network between the NFS server and the NFS client ONLY used for NFS traffic? Which would mean that you would have a separate lan for other traffic?

What is the MTU size ?? Post the output of 'netstat -rvn'

live free or die
harry d brown jr

Live Free or Die

Mukesh Jayawant · ‎04-25-2005

Is the network between the NFS server and the NFS client ONLY used for NFS traffic? Which would mean that you would have a separate lan for other traffic?

What is the MTU size ?? Post the output of 'netstat -rvn'

No, I don't have a dedicated network for the NFS server and client; my mistake if my post gave that impression. Everything runs over the main LAN, lan0. The output of netstat -rvn:

# netstat -rvn
Routing tables
Dest/Netmask                    Gateway         Flags Refs Interface Pmtu
127.0.0.1/255.255.255.255       127.0.0.1       UH    0    lo0       4136
192.1.1.90/255.255.255.255      192.1.1.90      UH    0    lan1      4136
140.171.134.90/255.255.255.255  140.171.134.90  UH    0    lan0      4136
192.1.1.0/255.255.255.0         192.1.1.90      U     2    lan1      1500
140.171.134.0/255.255.255.0     140.171.134.90  U     2    lan0      1500
127.0.0.0/255.0.0.0             127.0.0.1       U     0    lo0       0
default/0.0.0.0                 140.171.134.91  UG    0    lan0      0

However, the strangest thing is that within the same subnet I do not have any packet loss. Packet loss happens only when I ping outside the server's subnet.
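The "inside versus outside the subnet" observation can be checked against the routing table above. This is a minimal sketch, assuming only the two directly connected networks and the default gateway shown in the netstat output. The function name `next_hop` is hypothetical, and the lookup ignores host routes and longest-prefix matching, so it is an illustration rather than a model of the real HP-UX routing code.

```python
import ipaddress

# Directly connected networks, copied from the netstat -rvn output above.
LOCAL_NETS = {
    "lan0": ipaddress.ip_network("140.171.134.0/255.255.255.0"),
    "lan1": ipaddress.ip_network("192.1.1.0/255.255.255.0"),
}
# Default route from the same output.
DEFAULT_GATEWAY = ipaddress.ip_address("140.171.134.91")


def next_hop(target: str) -> str:
    """Describe how `target` would be reached (hypothetical helper)."""
    addr = ipaddress.ip_address(target)
    for ifname, net in LOCAL_NETS.items():
        if addr in net:
            return f"direct via {ifname}"
    return f"routed via gateway {DEFAULT_GATEWAY}"


if __name__ == "__main__":
    # The ping target from the first post is not on a local subnet.
    print(next_hop("140.171.229.21"))
    # A host on the server's own subnet is reached without the gateway.
    print(next_hop("140.171.134.91"))
```

The ping target 140.171.229.21 falls outside both local networks, so those packets cross the gateway and whatever lies beyond it. That matches the observation that loss appears only off-subnet.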

hope this provides a clue..

thanks and regards,
mukesh

rick jones · ‎04-26-2005

32768 may be the NFS message size, but unless you are using UDP mounts, the actual TCP segments on the network will carry 1460 bytes of data each (assuming a typical 1500-byte MTU network).
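The 1460-byte figure is simple header arithmetic. A minimal sketch, assuming IPv4 and TCP headers of 20 bytes each with no options, and ignoring the RPC/NFS headers that make the real message slightly larger than 32768 bytes; `tcp_mss` and `segments_needed` are illustrative names:

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes, TCP header without options (assumption)


def tcp_mss(mtu: int = 1500) -> int:
    """Largest TCP payload that fits in one unfragmented IP packet."""
    return mtu - IP_HEADER - TCP_HEADER


def segments_needed(message_bytes: int, mtu: int = 1500) -> int:
    """Number of full-size TCP segments needed to carry a message."""
    return math.ceil(message_bytes / tcp_mss(mtu))


if __name__ == "__main__":
    print(tcp_mss())               # payload bytes per segment on a 1500-byte MTU
    print(segments_needed(32768))  # segments for one 32768-byte NFS message
```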

When you increase the ping size to 32768, you are forcing IP to generate 32768 / 1500, or roughly 22, IP datagram fragments. Since there is no retransmission of IP datagrams by IP, and since _all_ fragments of an IP datagram must arrive to reassemble the datagram, the loss of just one of those 22 datagram fragments will cause the request to be toast. Similarly, on the way back, the loss of just one of those 22 will cause the response to be toast. So the loss of just one packet out of 44 will cause the ping to fail.

If your NFS mounts are UDP, similar issues exist. UDP just hands what it has to IP and lets IP fragment it.

TCP, on the other hand, will do its own segmentation, and so the loss of a single TCP segment will not make the other 21 segments useless - TCP will just retransmit the lost segment and all will be well - although perhaps a trifle slower.

So, I suspect that somewhere between your system and the system you are pinging, there is packet loss at some rate (duh :). It is probably fairly low, which explains why as you increase the ping message size you see increasing rates of loss.
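To see how a low per-packet loss rate turns into heavy loss for large pings, here is a minimal sketch. It assumes each fragment is lost independently with the same probability, a 20-byte IPv4 header per fragment, and the same number of fragments in each direction. Those are simplifying assumptions, not measurements from this network, and the function names are illustrative. With the header accounted for, the fragment count comes out at 23 rather than the rounded 22 used above.

```python
import math

IP_HEADER = 20  # bytes, IPv4 header without options (assumption)


def fragment_count(payload_bytes: int, mtu: int = 1500) -> int:
    """Number of IP fragments needed to carry `payload_bytes`."""
    # Fragment data must be a multiple of 8 bytes (except in the last fragment).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(payload_bytes / per_fragment)


def ping_success_probability(size: int, loss: float, mtu: int = 1500) -> float:
    """Chance that every fragment of both the request and the reply arrives."""
    n = fragment_count(size, mtu)
    return (1.0 - loss) ** (2 * n)


if __name__ == "__main__":
    for size in (1400, 8192, 32768):
        p = ping_success_probability(size, loss=0.01)
        print(f"{size:>6} bytes: {fragment_count(size):>2} fragments each way, "
              f"ping success {p:.1%} at 1% packet loss")
```

Under these assumptions a 1400-byte ping needs only two packets to survive, while a 32768-byte ping needs 46, which is why the same underlying loss rate looks so much worse at the larger size.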

The suggestions to check for duplex mismatches are good. When there is other traffic on the network at the same time as the pings, a duplex mismatch can cause the pings to have losses. However, if the network were completely idle, you would not see losses from a duplex mismatch, because a ping is synchronous - there is no chance for both ends to try to transmit at the same time.

WRT duplex:

How Autoneg is supposed to work:

When both sides of the link are set to autoneg, they will "negotiate"
the duplex setting and select full duplex if both sides can do
full-duplex.

If one side is hardcoded and not using autoneg, the autoneg process
will "fail" and the side trying to autoneg is required by spec to use
half-duplex mode.

If one side is using half-duplex, and the other is using full-duplex,
sorrow and woe is the usual result.

So, the following table shows what will happen given various settings
on each side:

        Auto        Half        Full

Auto    Happiness   Lucky       Sorrow

Half    Lucky       Happiness   Sorrow

Full    Sorrow      Sorrow      Happiness

Happiness means that there is a good shot of everything going well.
Lucky means that things will likely go well, but not because you did
anything correctly :) Sorrow means that there _will_ be a duplex
mis-match.
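The table follows mechanically from the autoneg rules described above. A minimal sketch, assuming both ports are capable of full duplex and that a port set to "auto" falls back to half duplex whenever the peer is hardcoded; the function names are illustrative:

```python
SETTINGS = ("auto", "half", "full")


def resulting_duplex(local: str, peer: str) -> str:
    """Duplex mode the local port ends up using."""
    if local not in SETTINGS or peer not in SETTINGS:
        raise ValueError("setting must be one of 'auto', 'half', 'full'")
    if local != "auto":
        return local  # hardcoded ports use what they were told to use
    # Autoneg succeeds only if the peer also negotiates; otherwise half duplex.
    return "full" if peer == "auto" else "half"


def link_outcome(a: str, b: str) -> str:
    """Classify a link as Happiness, Lucky or Sorrow, as in the table."""
    if resulting_duplex(a, b) != resulting_duplex(b, a):
        return "Sorrow"     # the two ends disagree: duplex mismatch
    if a == b:
        return "Happiness"  # both ends configured the same way
    return "Lucky"          # modes match only because of the half-duplex fallback


if __name__ == "__main__":
    for a in SETTINGS:
        print(a.ljust(6), " ".join(link_outcome(a, b).ljust(10) for b in SETTINGS))
```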

When there is a duplex mismatch, on the side running half-duplex you
will see various errors and probably a number of late collisions. On
the side running full-duplex you will see things like FCS errors.
Note that those errors are not necessarily conclusive, they are simply
indicators.

If yours is a GbE link, do not even begin to think about hardcoding the duplex setting - at least not if you are going to operate it at gigabit speeds. Leave it at autoneg.

there is no rest for the wicked yet the virtuous have no pillows
