
Frame error with driver be2net & Emulex OneConnect 10Gb

humeau
New Member

Hi,

We have a big issue with this NIC. It has a lot of frame errors (0.9%), and this issue is disrupting a group of 12 servers that can't work reliably together.

The running kernel is 2.6.32-5-amd64, which uses the be2net driver for this NIC. The blades are BL460c G7 (serial number of one of them: CZJ0370C9Y).

When looking at various parameters of the card, we found that ethtool -S eth0 reports a wide range of errors:

rx_crc_errors: 1344063638
rx_alignment_symbol_errors: 305
rx_pause_frames: 1548632155
rx_control_frames: 1548556790
rx_in_range_errors: 136
rx_out_range_errors: 75229
rx_ip_checksum_errs: 101494225
rx_tcp_checksum_errs: 681739904
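
Because these counters are cumulative and mixed in with many non-error statistics, it can help to pull out just the error-related ones. Below is a minimal Python sketch, assuming the usual "name: value" format of ethtool -S. The keyword list that decides what counts as an error counter is an arbitrary choice for this example, not something the driver defines.

```python
import subprocess

# Substrings that flag a counter as error-related (an arbitrary choice for this
# sketch, not a list defined by be2net).
ERROR_KEYWORDS = ("err", "crc")


def parse_ethtool_stats(text):
    """Parse 'ethtool -S' output ("name: value" lines) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def error_counters(stats):
    """Keep only the non-zero counters whose name contains an error keyword."""
    return {name: count for name, count in stats.items()
            if count and any(word in name for word in ERROR_KEYWORDS)}


def read_stats(iface):
    """Run 'ethtool -S <iface>' (requires ethtool to be installed)."""
    result = subprocess.run(["ethtool", "-S", iface],
                            capture_output=True, text=True, check=True)
    return parse_ethtool_stats(result.stdout)


if __name__ == "__main__":
    sample = """NIC statistics:
     rx_crc_errors: 1344063638
     rx_pause_frames: 1548632155
     rx_tcp_checksum_errs: 681739904
"""
    for name, count in error_counters(parse_ethtool_stats(sample)).items():
        print(f"{name}: {count}")
```

On the sample, rx_pause_frames is filtered out because its name contains neither keyword. On a live system, error_counters(read_stats("eth0")) would give the same view for the real interface.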

But here is the strange part: when querying the RX/TX ring sizes, we get changing values. Each time we poll, the value is different, which could be part of the problem, and the TX ring value is 0.

# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 1024
RX Mini: 0
RX Jumbo: 0
TX: 2048
Current hardware settings:
RX: 981
RX Mini: 0
RX Jumbo: 0
TX: 0

The value is 981 on this one, but a few seconds later it will be 1022, 956, or something else. The TX value is always 0.
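
To see the fluctuation directly, the pre-set maximums and the current settings can be parsed separately and the current RX value polled a few times. A minimal sketch, assuming the section headings shown in the ethtool -g output above; the sampling interval and count are arbitrary.

```python
import subprocess
import time


def parse_ring_params(text):
    """Split 'ethtool -g' output into pre-set maximums and current settings.

    Returns e.g. {"max": {"RX": 1024, "TX": 2048, ...},
                  "current": {"RX": 981, "TX": 0, ...}}.
    """
    sections = {"max": {}, "current": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():
                sections[section][key.strip()] = int(value)
    return sections


def sample_current_rx(iface, samples=5, interval=1.0):
    """Poll 'ethtool -g' several times and collect the current RX value."""
    values = []
    for _ in range(samples):
        result = subprocess.run(["ethtool", "-g", iface],
                                capture_output=True, text=True, check=True)
        values.append(parse_ring_params(result.stdout)["current"].get("RX"))
        time.sleep(interval)
    return values
```

For example, sample_current_rx("eth0") returns five readings taken one second apart; on the card described here they would be expected to differ from one another.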

When we try to fix this value with ethtool:

ethtool -G eth0 rx 1024 tx 1024

we get "Cannot set device ring parameters: Operation not supported".

Any hints, tricks, or solutions for this one?
Since HP delivers servers to our company on a rolling basis, the chipsets aren't the same from one box to another, but the 10 servers in question are from the same series and all have the same problem.

Here is what ifconfig gives:
eth0 Link encap:Ethernet HWaddr d4:85:64:56:a1:88
inet addr:10.1.11.8 Bcast:10.1.11.255 Mask:255.255.255.0
inet6 addr: fe80::d685:64ff:fe56:a188/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3975969570 errors:2127514868 dropped:0 overruns:0 frame:1344280739
TX packets:1634477316 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1178618858994 (1.0 TiB) TX bytes:0 (0.0 B)
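
The ifconfig counters are totals since the interface came up, so they say little about the current error rate. One way to get a rate for a known interval is to take two snapshots a few minutes apart and compare them. A minimal sketch, assuming the old net-tools ifconfig format shown here (newer ifconfig versions and ip -s link print these fields differently).

```python
import re

# Matches the RX line of the classic net-tools ifconfig output shown above.
RX_LINE = re.compile(
    r"RX packets:(\d+)\s+errors:(\d+)\s+dropped:(\d+)\s+"
    r"overruns:(\d+)\s+frame:(\d+)")


def rx_counters(ifconfig_text):
    """Extract the cumulative RX counters from ifconfig output."""
    match = RX_LINE.search(ifconfig_text)
    if match is None:
        raise ValueError("no 'RX packets:' line found")
    names = ("packets", "errors", "dropped", "overruns", "frame")
    return dict(zip(names, map(int, match.groups())))


def error_rates(before, after):
    """Fraction of received packets counted as errors / frame errors
    between two snapshots returned by rx_counters()."""
    packets = after["packets"] - before["packets"]
    if packets <= 0:
        return {"errors": 0.0, "frame": 0.0}
    return {key: (after[key] - before[key]) / packets
            for key in ("errors", "frame")}
```

Taking one snapshot, waiting under normal load, and taking another gives error_rates(first, second) for just that window, which is easier to compare across servers than lifetime totals.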
imeitoiu
Occasional Visitor

Re: Frame error with driver be2net & Emulex OneConnect 10Gb

Hey,

I have the same problem on an HP BL460c G7 with the same be2net module and Emulex OneConnect 10Gb CNAs.

I've tried several Linux distributions (CentOS 5.6, CentOS 6, Ubuntu 11.10 Server) to no avail. Does anyone have an idea?

JPansanel
Occasional Visitor

Re: Frame error with driver be2net & Emulex OneConnect 10Gb

In the Emulex documentation, there is the following information:

TCP Segmentation Offload (TSO) is enabled by default. In networks with very little loss, TSO improves performance considerably and must remain enabled. The proc variable /proc/sys/net/ipv4/tcp_tso_win_divisor controls how aggressive the network stack can be in making TSO requests. TSO divisor values in the range 2 to 16 are recommended for a low-loss network. The default value of 3 in the RHEL 6 and SLES 11 SP1 distributions seems to be the optimal one for a no-loss network. Smaller divisor values result in larger TSO chunks and better throughput as well as CPU utilization.

Note: Systems with a faster Processor Front Side Bus (FSB) clock speed perform better than those with slower FSB clock speeds.

However, if the receiver or the network is dropping frames (too many retransmits on the transmit side, as indicated by netstat -st), it may help to make TSO less aggressive by increasing the divisor value, or even to turn off TSO. To set the divisor to 8, run:

echo 8 > /proc/sys/net/ipv4/tcp_tso_win_divisor

To turn TSO on or off, run the ethtool commands:

ethtool -K <ethX> tso off
ethtool -K <ethX> tso on

where ethX is the name of the Ethernet device you are working on.

I am currently trying to use this information to work around the errors and the dropped RX packets.
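
The two tuning steps from the Emulex text can be scripted, for example when trying them on several blades. A minimal sketch; the function names are our own, and the path argument exists only so that the /proc write can be pointed at a scratch file for testing.

```python
import subprocess
from pathlib import Path

# Kernel tunable quoted in the Emulex documentation above.
TSO_DIVISOR = Path("/proc/sys/net/ipv4/tcp_tso_win_divisor")


def read_tso_divisor(path=TSO_DIVISOR):
    """Return the current tcp_tso_win_divisor value."""
    return int(Path(path).read_text().strip())


def set_tso_divisor(value, path=TSO_DIVISOR):
    """Same effect as 'echo <value> > .../tcp_tso_win_divisor' (needs root).

    Like the echo command, this change does not survive a reboot.
    """
    if value < 1:
        raise ValueError("divisor must be a positive integer")
    Path(path).write_text(f"{value}\n")


def tso_command(iface, enable):
    """Build the 'ethtool -K <iface> tso on|off' command line."""
    return ["ethtool", "-K", iface, "tso", "on" if enable else "off"]


def set_tso(iface, enable):
    """Turn TSO on or off for iface (needs root and ethtool)."""
    subprocess.run(tso_command(iface, enable), check=True)
```

For example, set_tso_divisor(8) and set_tso("eth0", False) correspond to the echo and ethtool commands quoted above; the sysctl equivalent of the first is sysctl -w net.ipv4.tcp_tso_win_divisor=8.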

pepperbob
New Member

Re: Frame error with driver be2net & Emulex OneConnect 10Gb

Has anybody found a resolution to this issue?

We're seeing exactly the same thing, and none of the approaches helps to solve or reduce the RX errors.

Even updating the be2net module to version 2.102.435 (IIRC, kernel 2.6.32-5-amd64 ships with 2.102.105) has no effect.


daimoniac
New Member

Re: Frame error with driver be2net & Emulex OneConnect 10Gb

We have this problem too. Also, we cannot use jumbo frames; could the two be related?

Richard Stockdale
Frequent Advisor

Re: Frame error with driver be2net & Emulex OneConnect 10Gb

>>When querying the RX/TX ring size we get ... moving values ... Yep, not fixed.

>>Each time we poll, the value is not the same, which could be part of the problem

>>and the TX ring buffer is 0...
>>
>>Current hardware settings:
>>RX: 981

The current settings would be the number of receive and transmit buffers (ring descriptors or fragments, not necessarily packets) currently owned by the device.

The number of receive buffers owned by the device changes as buffers are used and then replenished (they are replenished when a watermark value is reached, not on every receive), so this is normal.

Also, the driver doesn't allow the maximum values (ring sizes) to be changed, which is why you can't change them with ethtool.

TX shows the number of outstanding transmits, which will usually be zero unless you look at it during heavy activity.
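
The refill behaviour described above can be illustrated with a toy model. This is not be2net's actual code: the ring size of 1024 matches the pre-set maximum in the first post, but the watermark, burst sizes and batch-refill rule are made-up parameters chosen only to show why successive readings wander below the maximum.

```python
import random


def simulate_rx_ring(ring_size=1024, watermark=128, polls=10,
                     max_burst=40, seed=1):
    """Toy model of how many RX buffers a NIC owns at each ethtool poll.

    Between polls a random burst of frames consumes posted buffers. Once
    the number of consumed buffers reaches the (hypothetical) watermark,
    the driver reposts all of them in one batch.
    """
    rng = random.Random(seed)
    posted = ring_size
    observed = []
    for _ in range(polls):
        posted -= min(rng.randint(0, max_burst), posted)
        if ring_size - posted >= watermark:
            posted = ring_size  # batch refill, not one buffer per receive
        observed.append(posted)
    return observed
```

print(simulate_rx_ring()) prints ten readings that drift below 1024 and snap back to it whenever the hypothetical watermark is crossed, which is the same kind of movement as the 981/1022/956 values reported in the first post.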

Richard Stockdale
Frequent Advisor

Re: Frame error with driver be2net & Emulex OneConnect 10Gb

Could you post the output of "lspci -v"? It would help to find out which device you have. Based on the firmware version (2.102...), it would seem to be a BE2 chip.

Also, I'm not sure, but it sounds like very old firmware. Where do you look for up-to-date firmware?

fssdc
Visitor

Re: Frame error with driver be2net & Emulex OneConnect 10Gb

I am also encountering this problem on two identical ProLiant BL460c G7 servers:

uname -r
2.6.32-5-amd64

lspci | grep Ethernet
02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
02:00.1 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)

I've tried turning off TSO and setting tcp_tso_win_divisor to 8, but it did not resolve the problem.
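
Richard Stockdale asked for lspci output to find out which device is installed, and the lspci lines above name the OneConnect chip in parentheses. A minimal sketch that extracts that tag; it assumes the device-name strings look exactly like the ones in this post, which depends on the pci.ids database installed on the system.

```python
import re

# Matches lines such as:
# 02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
ONECONNECT = re.compile(
    r"^(\S+) Ethernet controller: Emulex Corporation OneConnect.*\((be\d)\)")


def oneconnect_chips(lspci_text):
    """Map PCI address -> chip tag ('be2', 'be3', ...) for OneConnect NICs."""
    chips = {}
    for line in lspci_text.splitlines():
        match = ONECONNECT.match(line.strip())
        if match:
            chips[match.group(1)] = match.group(2)
    return chips
```

Fed the two lines above, it maps both 02:00.0 and 02:00.1 to "be3"; non-Emulex controllers are simply skipped.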

Richard Stockdale
Frequent Advisor

Re: Frame error with driver be2net & Emulex OneConnect 10Gb

There isn't much information to go on.

The first post described huge error counts and some confusion around the ethtool data, which has been explained. The huge error counts didn't make sense; it seemed like there was a driver/firmware issue that perhaps accounted for the errors. There was also a suggestion to turn off TSO just to see whether that made a difference, and it didn't.

For starters, I'd suggest making sure you have the latest firmware and the latest driver for the device.

If the result is the same, I'd start suspecting hardware. One post said it was happening on 10 systems, which would probably rule out broken hardware, but you still have to verify that the switch is not broken as well.

In the latest post, it happened on two BL460c G7 servers. Can you give more information? Were the error counts high? Were other counters odd? Were the high error counts accompanied by connection problems? And can you check the firmware version on the card (ethtool -i devname)?
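
When collecting this information from several blades, the ethtool -i output can be turned into a dictionary so that driver and firmware versions are easy to compare. A minimal sketch, assuming the "key: value" layout shown in the replies in this thread.

```python
import subprocess


def parse_driver_info(text):
    """Parse 'ethtool -i' output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Split on the first colon only, so bus-info keeps "0000:02:00.0".
            info[key.strip()] = value.strip()
    return info


def driver_info(iface):
    """Run 'ethtool -i <iface>' and return the parsed fields."""
    result = subprocess.run(["ethtool", "-i", iface],
                            capture_output=True, text=True, check=True)
    return parse_driver_info(result.stdout)
```

For example, driver_info("eth0")["firmware-version"] gives the firmware version asked about here.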

fssdc
Visitor

Re: Frame error with driver be2net & Emulex OneConnect 10Gb

I suspect the be2net driver is the problem. I have six BL460c G7 servers:

 - 2 servers with Debian 6, kernel 2.6.32-5-amd64, be2net 2.101.205 -> huge RX error counts

driver: be2net
version: 2.101.205
firmware-version: 4.0.493.0
bus-info: 0000:02:00.0

driver: be2net
version: 2.101.205
firmware-version: 4.0.493.0
bus-info: 0000:02:00.0

 - 2 servers with Debian 6, kernel 3.2.32-1~bpo60+1, be2net 4.2.220u -> OK

driver: be2net
version: 4.2.220u
firmware-version: 4.0.493.0
bus-info: 0000:02:00.1

driver: be2net
version: 4.2.220u
firmware-version: 3.102.517.701
bus-info: 0000:02:00.1

 - 2 servers with Debian 6, kernel 3.2.35-2~bpo60+1, be2net 4.2.220u -> OK

driver: be2net
version: 4.2.220u
firmware-version: 4.0.493.0
bus-info: 0000:02:00.1

driver: be2net
version: 4.2.220u
firmware-version: 3.102.517.701
bus-info: 0000:02:00.1

I found drivers on the Emulex website (http://www.emulex.com/downloads/emulex/linux/debian/drivers.html), but unfortunately I can't test them yet because my servers are already in production.
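
In the six servers listed above, the firmware version does not separate the good hosts from the bad ones (4.0.493.0 appears on both failing and working hosts), while the be2net version does. A sketch of a check that could flag hosts running an older driver, taking the ethtool -i fields as a dict of strings (for example {"driver": "be2net", "version": "2.101.205"}). Using 4.2.220u as the threshold is an assumption drawn from these six servers only, not Emulex guidance.

```python
import re

# The be2net version reported as OK in the post above. Treating it as a
# threshold is an assumption based on six servers, not vendor guidance.
KNOWN_GOOD = "4.2.220u"


def version_key(version):
    """Turn '2.101.205' or '4.2.220u' into a comparable tuple of ints.

    Each dot-separated part contributes its leading digits, so the 'u'
    suffix in '4.2.220u' is ignored; parsing stops at a non-numeric part.
    """
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
    return tuple(parts)


def older_than_known_good(info, known_good=KNOWN_GOOD):
    """True if an 'ethtool -i' dict reports be2net older than known_good."""
    if info.get("driver") != "be2net" or "version" not in info:
        return False
    return version_key(info["version"]) < version_key(known_good)
```

With this check, both 2.101.205 and the 2.102.435 build mentioned earlier in the thread would be flagged, while 4.2.220u would not.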