
Re: Dropped packets with an NC550SFP card in a ProLiant DL360 G7

 
Mick Ryan
Advisor

Dropped packets with an NC550SFP card in a ProLiant DL360 G7

Hi

We've got a ProLiant DL360 G7 running SLES11 SP1 which has an NC550SFP card. The driver we're running on the card is be2net v2.103.269.28

The problem is that the card is dropping packets:
Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0  1500   0 14463278      0      0      0  3120723      0      0      0 BMmRU
eth4   1500   0 23200257      0 133497      0     2276      0      0      0 BMRU
eth5   1500   0 22764816      0 125211      0     2239      0      0      0 BMRU

Normally, this is resolved by increasing the ring buffer size. However, with this card, ethtool cannot change the ring size:
# ethtool -G eth4 rx 2048
Cannot set device ring parameters: Operation not supported
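
Before trying to set the ring size, it can help to read back what the driver reports as its current and maximum RX ring sizes. The sketch below parses `ethtool -g` output; it assumes the usual layout with a "Pre-set maximums:" section followed by a "Current hardware settings:" section, and the function name ring_rx_limits is made up for illustration.

```shell
# Print "<max_rx> <current_rx>" from `ethtool -g` output read on stdin.
# Assumes the usual two-section layout, each section containing an "RX:" line.
# "RX Mini:" and "RX Jumbo:" lines are skipped because their first field is "RX".
ring_rx_limits() {
    awk '
        /^Pre-set maximums:/            { section = "max" }
        /^Current hardware settings:/   { section = "cur" }
        $1 == "RX:" && section == "max" { max = $2 }
        $1 == "RX:" && section == "cur" { cur = $2 }
        END { print max, cur }
    '
}
```

Something like `ethtool -g eth4 | ring_rx_limits` would then print the two numbers side by side. If they are already equal, or if the driver rejects `ethtool -G` outright as it does here, there is no headroom to gain this way.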

So, the suppliers suggested passing the value as a parameter to the insmod command, like this:
# insmod ./be2net.ko rx_frag_size=8192

That appears to have been accepted:
# cat /sys/bus/pci/drivers/be2net/module/parameters/rx_frag_size
8192

However, the drops are still occurring.
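
One way to put a number on "still occurring" is to sample the drop counter twice and divide by the interval. This is a minimal sketch, assuming the kernel exposes the counter at /sys/class/net/<iface>/statistics/rx_dropped (it may not match the RX-DRP column exactly on every kernel). The function names and the optional sysfs-root argument are made up for illustration.

```shell
# Read the kernel's cumulative RX drop counter for an interface.
# The second argument (sysfs root) exists only so the function can be
# pointed at a copy of the tree; it defaults to the real /sys/class/net.
rx_dropped() (
    iface=$1
    root=${2:-/sys/class/net}
    cat "$root/$iface/statistics/rx_dropped"
)

# Integer drops per second between two counter samples.
drops_per_sec() (
    before=$1 after=$2 seconds=$3
    echo $(( (after - before) / seconds ))
)

# Sample an interface twice, <seconds> apart, and print the drop rate.
rx_drop_rate() (
    iface=$1
    seconds=${2:-10}
    before=$(rx_dropped "$iface")
    sleep "$seconds"
    after=$(rx_dropped "$iface")
    drops_per_sec "$before" "$after" "$seconds"
)
```

For example, `rx_drop_rate eth4 10` would print the average drops per second on eth4 over ten seconds. Running it during a known burst and again during a quiet period shows whether the drops follow the bursts.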

Has anyone else seen this behaviour or had any issues with this card running the be2net driver?

Cheers

Mike
rick jones
Honored Contributor
Solution

Re: Dropped packets with an NC550SFP card in a ProLiant DL360 G7

Broadly speaking, increasing something like the RX ring size only helps if the drops are the result of intermittent bursts of traffic beyond the ability of the NIC/host to keep up. If the traffic rate is sustained at a level beyond the NIC/host's ability to keep up, increasing the RX ring count won't really help.

Also, is rx_frag_size actually increasing the number of buffers posted to the card, or their size? The "OneConnect Software Installation Guide" has a Table 3 that describes rx_frag_size as "The size of fragments used to DMA received data". If you have increased that to 8192 bytes from 2048 bytes, and "all" the traffic you are receiving is small, I doubt it will have really done much but consume more memory.

Of course, there are always further questions:

What is the nature of the traffic arriving at the NIC/host? TCP? UDP? Size? Rate?

What is the rate of drops over a given time interval?

What does the per-CPU utilization look like on your server? Are there any cores/hwthreads showing close to saturation?

How many irqs appear in /proc/interrupts when you grep for eth[45]?
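
The IRQ question can be answered with a couple of small helpers around grep. This is only a sketch: the function names are invented, and the optional file argument is there so the helpers can be run against a saved copy of /proc/interrupts.

```shell
# Count the interrupt lines whose text matches a pattern.
# The file argument defaults to /proc/interrupts.
count_nic_irqs() (
    pattern=$1
    file=${2:-/proc/interrupts}
    grep -cE "$pattern" "$file"
)

# Show the matching lines themselves so the per-CPU columns can be compared.
show_nic_irqs() (
    pattern=$1
    file=${2:-/proc/interrupts}
    grep -E "$pattern" "$file"
)
```

`count_nic_irqs 'eth[45]'` gives the number of vectors, and `show_nic_irqs 'eth[45]'` shows which CPU columns are actually incrementing. A single busy column suggests one core is doing all the receive work.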
there is no rest for the wicked yet the virtuous have no pillows
Mick Ryan
Advisor

Re: Dropped packets with an NC550SFP card in a ProLiant DL360 G7

Rick

You're spot-on. Overnight we had a reply from the supplier which echoed what you're saying. The rx_frag_size is only changing the size of the RX fragments, and not changing the Ring Buffer size. To quote:
"ethtool -g denotes the size of the RX/TX rings. i.e the number of entries in the ring. We do not allow that to be changed. It is a constant for us."

Just for a bit of background, the traffic is multicast and it's very bursty.

Cheers

Mike
rick jones
Honored Contributor

Re: Dropped packets with an NC550SFP card in a ProLiant DL360 G7

I cannot recall the last time I encountered a NIC where one could not alter the RX ring size, and that issue didn't come up when I was doing some quick-and-dirty netperf testing on NC550s.

So, that suggests only a few avenues. First would be to try to speed up reception somehow, perhaps by bringing multiple cores to the party. Not sure if multiple RX IRQs would help with multicast traffic; it depends on whether or not the NIC hashes multicast. Have you checked whether multiple IRQs are happening already?

I don't recall what RPS (Receive Packet Steering) would do for IP multicast, but then I don't know whether that is in SLES11 SP1 anyway. I can never keep the Linux kernel development timeline in my head :(
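
For reference, RPS went into the mainline kernel in 2.6.35, while SLES11 SP1 is based on 2.6.32, so it is probably absent there unless it was backported. On kernels that do have it, RPS is configured per receive queue by writing a hexadecimal CPU bitmask to rps_cpus in sysfs. The sketch below builds such a mask; the function names are invented, and it assumes CPU numbers below 32 (larger systems use comma-separated 32-bit groups).

```shell
# Build the hexadecimal CPU bitmask that rps_cpus expects from a list of
# CPU numbers. Assumes every CPU number is below 32.
cpu_mask() (
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
)

# Write the mask for one receive queue, if the kernel exposes RPS at all.
# Usage: set_rps <iface> <queue> <cpu>...
set_rps() (
    iface=$1; queue=$2; shift 2
    path=/sys/class/net/$iface/queues/rx-$queue/rps_cpus
    # `exit` here only leaves the function's own subshell.
    [ -w "$path" ] || { echo "RPS not available at $path" >&2; exit 1; }
    cpu_mask "$@" > "$path"
)
```

As root, `set_rps eth4 0 0 1 2 3` would ask the kernel to spread queue 0 of eth4 across CPUs 0 to 3. Whether that helps with multicast depends on how the flows hash, so it is something to measure, not assume.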

But the broad idea is to a) verify that there is indeed a core bottlenecking on inbound packet processing and b) spread that across more than one core.
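
One rough way to check step a) is to compare per-CPU receive counters in /proc/net/softnet_stat. The sketch below converts the first three hexadecimal columns (packets processed, backlog drops, time_squeeze) to decimal. The function name is invented, and the row index only matches the CPU number when all CPUs are online. Note that the second column counts drops from the kernel's backlog queue, not drops in the NIC's own ring.

```shell
# Print "row processed dropped time_squeeze" in decimal for each row of
# /proc/net/softnet_stat (one row per online CPU, values in hexadecimal).
softnet_summary() (
    file=${1:-/proc/net/softnet_stat}
    cpu=0
    while read -r processed dropped squeezed _rest; do
        echo "$cpu $((0x$processed)) $((0x$dropped)) $((0x$squeezed))"
        cpu=$((cpu + 1))
    done < "$file"
)
```

If one row's processed count dwarfs the others and its time_squeeze value keeps growing between runs, that core is likely the one bottlenecking on inbound packet processing. A per-CPU view from a tool such as mpstat would be a useful cross-check.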

The second idea is to ask for path length reduction in the be2net driver - if one cannot raise the bridge, lower the river. Of course if the "issue" is total path length...

The third idea is to go with a different NIC. As for which of the many HP-branded 10GbE NICs to try... I have to profess that I'm not sure. Before this issue I probably would have suggested the NC550 over the NC522 :) The NC523SFP replaces (?) the NC522SFP but I have no performance experience or information with that one. There is a "stand-up" version of the NC542m that may be a good one to try, alas the part number eludes me at the moment but it was based on a Mellanox chip - one that could, IIRC, "switch-hit" between IB and 10GbE.

And there are a number of "converged" 10GbE/FCoE NICs now.

I know, too many choices :)
Mick Ryan
Advisor

Re: Dropped packets with an NC550SFP card in a ProLiant DL360 G7

Rick

Thanks again for the thoughtful response. We've decided to take another tack and use a couple of blades with the onboard Broadcom 10Gb NICs, as I know we can change the ring buffer size on those.

Just need to find something useful to do with the NC550's now 8-)

Cheers

Mike
rick jones
Honored Contributor

Re: Dropped packets with an NC550SFP card in a ProLiant DL360 G7

Blades with on-board Broadcom 10GbE - BL460 G6's with NC532i LOMs? Those can indeed alter their ring sizes.

Out of curiosity, what is the sustained packet rate you expect?
Mick Ryan
Advisor

Re: Dropped packets with an NC550SFP card in a ProLiant DL360 G7

Well, it's more the micro-bursts that are the problem. They could be in the multiple-gigabit range, which certainly wouldn't be too much for the card to handle, just too much for the ring buffer.
humeau
New Member

Re: Dropped packets with an NC550SFP card in a ProLiant DL360 G7

Hi Mick,

We also have a strange and severe issue with the be2net driver. When we look at the server's RX ring settings with ethtool, the values just keep changing. Yep, I also had some trouble believing this.

Check yours:
watch --interval=1 ethtool -g eth0

The values seem to change each time we query them. Schrödinger card??? ethtool bug?

If we try to set the value:
ethtool -G eth0 rx 1024 tx 1024

we get "Cannot set device ring parameters: Operation not supported".

I'm quite speechless on this one.
That said, the device is reporting a huge number of frame errors.