Operating System - Linux

Re: bnx2 ip checksum error

 
Jure Pecar_1
Advisor

bnx2 ip checksum error

Hello,

I'm noticing a random number of outgoing packets with IP checksum errors, generated by eth1 in a DL385 G2. The bnx2 driver version is either 1.4.43-rh or 1.4.52d, on RHEL4 release 6. This happens regardless of the TCP checksum offload setting (ethtool -K), and I also see bad packets on the other side, which means the problem is real and not just an artefact of hw checksum offloading.

Has anyone seen anything like that?

I have a batch of fresh DL385s here on the table on which I can simply repeat the problem. Hook two together via eth1, enable chargen in xinetd on one and nc to port 19 from the other and redirect to /dev/null. Observe traffic with something like iptraf -d eth1. Ip checksum error counter goes up like crazy.
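
For anyone who wants to repeat the receiving half of this test without nc, the following is a minimal sketch of a client that connects to the chargen port and throws the data away. The function name `drain` and the `limit` parameter are made up for illustration and are not part of the original test.

```python
import socket

def drain(host: str, port: int = 19, limit: int = 1_000_000) -> int:
    """Connect to host:port, read and discard up to `limit` bytes, return the count."""
    received = 0
    with socket.create_connection((host, port), timeout=5) as sock:
        while received < limit:
            chunk = sock.recv(min(65536, limit - received))
            if not chunk:  # the peer closed the connection
                break
            received += len(chunk)
    return received
```

Calling `drain("sender")` from the receiving box plays roughly the role of `nc sender 19 > /dev/null`, except that it stops after `limit` bytes instead of running until killed.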
16 REPLIES
Jure Pecar_1
Advisor

Re: bnx2 ip checksum error

This turned out to be a tcp segmentation offload problem. Disabling it on the cards via ethtool -K eth1 tso off makes the problem go away.
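
To confirm what the card is actually set to after such a change, the output of `ethtool -k eth1` (lowercase k, which shows the offload settings) can be checked by eye or parsed. Below is a minimal sketch of a parser; the function name and the sample text are illustrative assumptions, and the exact feature labels differ between ethtool versions, so the code only relies on the generic `name: on|off` shape.

```python
def parse_offload_settings(ethtool_output: str) -> dict:
    """Turn `ethtool -k <iface>` output into {feature name: True/False}."""
    settings = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skips the heading line, which ends in a bare colon
        state = value.split()[0]  # tolerate trailing notes such as "[fixed]"
        if state in ("on", "off"):
            settings[name.strip()] = (state == "on")
    return settings

# Illustrative sample only, not captured from the machines in this thread.
SAMPLE = """Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
"""

settings = parse_offload_settings(SAMPLE)
print(settings["tcp segmentation offload"])
```

On a live system the input would come from running `ethtool -k eth1` and passing its standard output to the function.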

Looks like Broadcom BCM5708 has issues with tcp segmentation, so it's better to do it in software.
rick jones
Honored Contributor

Re: bnx2 ip checksum error

It was my understanding that segmentation offload depended on checksum offload being enabled. If my understanding is correct, disabling checksum offload should have been enough to also disable segmentation offload. If my understanding is incorrect, hopefully this will trigger someone gently applying a clue-bat to my skull :)
there is no rest for the wicked yet the virtuous have no pillows
Jure Pecar_1
Advisor

Re: bnx2 ip checksum error

No, disabling checksum offload doesn't make a difference. I'm not familiar with tcp internals, but that's what experimental results show.
rick jones
Honored Contributor

Re: bnx2 ip checksum error

strange indeed. i'll ask around internally. just to be pedantically clear - is this the checksum in the TCP header, or the checksum in the IP header?
there is no rest for the wicked yet the virtuous have no pillows
Jure Pecar_1
Advisor

Re: bnx2 ip checksum error

Good question. If UDP flows smoothly, i'd say it's TCP header.

Whatever that ethtool -K affects :)
michael chan_4
Advisor

Re: bnx2 ip checksum error

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=fde82055c1d0e64ff660d83c705db0e1abc9d12e

You may be seeing the problem fixed by the above patch. This patch is in bnx2 version 1.5.11 or later. The latest driver from Broadcom's website should also have the patch.
Jure Pecar_1
Advisor

Re: bnx2 ip checksum error

I just tested with the 1.6.7b driver and the situation is actually worse. Now iptraf shows IP checksum errors regardless of tso being on or off.

I think I'll contact mchan and davem directly to get this analyzed and hopefully fixed.
michael chan_4
Advisor

Re: bnx2 ip checksum error

Exactly what are you sending using nc? IP checksum is not offloaded if you turn off TSO. Can you run ethereal instead on the receiver to capture the packet and see how exactly the checksum is corrupted?
Jure Pecar_1
Advisor

Re: bnx2 ip checksum error

Anything really. The problem was originally detected when the drbd link between two machines started to misbehave under io load. Later I found out I can easily reproduce the problem with simple chargen. See my first post in this thread.

I'm attaching 1mb of dump on sending and receiving side. You can see many "tcp checksum incorrect" on sending and "tcp previous segment lost" on receiving side.

The only pattern that I find interesting is that whenever tcp checksum is wrong on sending side, it is like 0x2nnn. Hopefully you can get some more information from dumps.
Jure Pecar_1
Advisor

Re: bnx2 ip checksum error

Why I said above that situation with 1.6.7b is worse ...

I did some more experiments and discovered that if I have an established connection to chargen and change the tso setting with ethtool, it does not affect that particular connection.

(S=sender, R=receiver)

S: ethtool -K eth1 tso off
R: nc sender 19 > /dev/null
S: iptraf -d eth1 shows no bad packets
S: ethtool -K eth1 tso on
S: iptraf -d eth1 shows no bad packets
R: kill nc
R: nc sender 19 > /dev/null
S: iptraf -d eth1 shows avalanche of bad packets

It's the same if I start with tso on and disable it later, while chargen connection is still up. That's why I prematurely concluded above that 1.6.7b is worse, when actually the situation is still the same.
michael chan_4
Advisor

Re: bnx2 ip checksum error

You will always see TCP checksum incorrect on the sender when you do TCP checksum offload or TSO. That's because the TCP checksum hasn't been calculated yet when ethereal captures these sending packets. If you turn on TSO, you'll even see packets bigger than 1514 in addition to wrong TCP checksum because segmentation hasn't been done yet. These are all normal. The TCP checksum and segmentation will be done by the NIC hardware as packets are sent on the wire.
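
The 1514 figure is just the standard Ethernet arithmetic: 14 bytes of Ethernet header plus a 1500-byte MTU. As a rough sketch of what the NIC does with one oversized packet handed down with TSO on, the function below splits a payload into MSS-sized pieces and reports the resulting frame sizes. It assumes IPv4 and TCP headers without options (20 bytes each) and no VLAN tag; with TCP timestamps enabled the per-frame payload would be smaller. The function name is made up for illustration.

```python
ETH_HEADER = 14   # Ethernet II header, no VLAN tag (assumption)
IP_HEADER = 20    # IPv4 header without options (assumption)
TCP_HEADER = 20   # TCP header without options (assumption)

def wire_frame_sizes(payload_len: int, mtu: int = 1500) -> list:
    """Frame sizes in bytes (without FCS) after cutting one large send into MSS-sized segments."""
    mss = mtu - IP_HEADER - TCP_HEADER
    headers = ETH_HEADER + IP_HEADER + TCP_HEADER
    sizes = []
    remaining = payload_len
    while remaining > 0:
        chunk = min(mss, remaining)
        sizes.append(headers + chunk)
        remaining -= chunk
    return sizes

print(wire_frame_sizes(4000))  # [1514, 1514, 1134]
```

A capture on the sender sees the single 4054-byte packet (4000 bytes of payload plus one set of headers), while a capture on the wire or on the receiver would see the three frames.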

If you see bad TCP or IP checksum on the receiver, then it's different and it's real. Do you see any on the receiver? I did a search for bad TCP or bad IP checksum and did not find any on receiving.tcp.
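
For checking an IP header from a receiver-side capture by hand, the IPv4 header checksum is the 16-bit one's-complement sum described in RFC 1071: summing every 16-bit word of the header, checksum field included, must give 0xFFFF. The sketch below is a minimal illustration of that check; the function names are made up, and the sample header is a generic textbook example, not taken from the dumps in this thread.

```python
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, padding odd lengths with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def ipv4_header_checksum_ok(packet: bytes) -> bool:
    """True if packet starts with an IPv4 header whose checksum verifies."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return False
    ihl = (packet[0] & 0x0F) * 4  # header length in bytes
    if ihl < 20 or len(packet) < ihl:
        return False
    return ones_complement_sum(packet[:ihl]) == 0xFFFF

# Generic sample header (checksum field is 0xb861), used only for illustration.
header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(ipv4_header_checksum_ok(header))  # True
```

This only covers the IP header. The TCP checksum also covers a pseudo-header and the payload, and on a sender-side capture with offload enabled it is not final yet, as explained above.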

The "TCP previous segment lost" on receiving.tcp is a different thing and I see that a lot. My guess is that ethereal for some reason cannot keep up and sometimes drops packets during capture. When analyzing the trace, it will falsely detect some missing packets. To see if you really have any TCP loss, you should use netstat -s and look at the TCP counters.
michael chan_4
Advisor

Re: bnx2 ip checksum error

When doing nc from one to the other, do the bad IP checksums show up in netstat -s?

I could not duplicate the problem doing the exact same steps you outlined.
Jure Pecar_1
Advisor

Re: bnx2 ip checksum error

Sorry for the delay ... had some busy days :(

You might be right that I got fooled by the hardware checksumming on the testing systems here, but the dumps I'm attaching this time are from our production system drbd link, which I managed to upgrade to 1.6.7b. These captures were done with tso off and bad packets are seen on the receiving side too.

I hope you can get some useful info from them, because I would really like to figure this out.
michael chan_4
Advisor

Re: bnx2 ip checksum error

Your receiver has TCP checksum offload enabled and ethereal will always see TCP bad checksum on every TCP packet that has its checksum calculated by the transmitting NIC. When you run ethereal on a box, you are not capturing the packets transmitted by the box on the wire so you need to look at these traces carefully. Once again, I did not see any bad IP checksums when I did a search on the receiver's trace. So this does not match up with the IP checksum errors that iptraf was reporting.
Jure Pecar_1
Advisor

Re: bnx2 ip checksum error

Ok then I'm missing something here. I'm sorry for wasting your time.

Can you point me to some docs where I can study this to understand better the details of what I'm seeing?
michael chan_4
Advisor

Re: bnx2 ip checksum error

You mentioned that you saw IP checksum error with iptraf so we need to confirm that or conclude that iptraf was giving you wrong data.

The ethereal traces that you collected did not show any IP checksum error packets. You can also look up /proc/net/snmp and look for the InHdrErrors counter.
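
/proc/net/snmp is laid out as pairs of lines per protocol: one line of counter names and one line of values, both prefixed with the protocol name. A minimal sketch for pulling out a single counter such as InHdrErrors follows; the function name is made up, and the sample text is a shortened, invented excerpt rather than output from the systems discussed here.

```python
def parse_proc_net_snmp(text: str) -> dict:
    """Return {protocol: {counter name: value}} from /proc/net/snmp content."""
    tables = {}
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Rows come in pairs: a header row of names, then a row of values.
    for names, values in zip(rows[0::2], rows[1::2]):
        proto = names[0].rstrip(":")
        tables[proto] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return tables

# Shortened, invented sample; the real file has many more columns.
SAMPLE = """Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
Ip: 2 64 1000 3 0
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn RetransSegs InErrs
Tcp: 1 200 120000 -1 5 0
"""

counters = parse_proc_net_snmp(SAMPLE)
print(counters["Ip"]["InHdrErrors"])  # 3
```

Reading the file before and after a test run and comparing `InHdrErrors` on the receiver shows whether the kernel itself rejected any IP headers during the run, independent of what iptraf or a capture tool reports.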