StoreVirtual Storage

Poor throughput using Etherchannel and 802.3ad bond

 
JohnMurrayUK
Advisor

Poor throughput using Etherchannel and 802.3ad bond

We are seeing appalling throughput with our SANiQ 9.5 (with all patches) P4500 4-node cluster. Each node is configured to use an 802.3ad bond and is connected to a pair of Cisco 3750G switches configured as below (which is a fairly standard setup AFAIK). HP are being excruciatingly slow in responding to our support ticket, although they now have some perf counter info.

 

I have a 600GB NRAID10 LUN presented to a vSphere 4.1 U1 host which will only write at approx 4MB/s using round robin (RR). Taking a snapshot of a 24GB RAM VM takes just over an hour!

 

EDIT:  This appears to be a VMware vSphere 4.1 issue regarding the new LazySave background snapshot processing.  See: http://communities.vmware.com/message/1866667#1866667

 

To diagnose, I have created a new 100GB NRAID5 test LUN presented to a physical DL360 G7 using the MS iSCSI initiator with two NICs, but no DSM, as this server also connects to VMFS LUNs to provide backup services overnight. The first data written ran at 92MB/s, 350 IOPS, and a latency of 40ms, but after 3 minutes this tailed off to 44MB/s, 158 IOPS, and a latency of 400ms. All subsequent writes fail to run higher than 40MB/s. The backup server and the vSphere iSCSI switch ports are not configured for Etherchannel.

 

Any suggestions are welcome.

 

 

P4500 port config:

 

interface Port-channel1
description P4500-01 iSCSI Etherchannel
switchport access vlan 1000
switchport mode access
flowcontrol receive desired
!
interface GigabitEthernet1/0/5
description P4500-01
switchport access vlan 1000
switchport mode access
flowcontrol receive desired
channel-group 1 mode active
spanning-tree portfast
!
interface GigabitEthernet2/0/5
description P4500-01
switchport access vlan 1000
switchport mode access
flowcontrol receive desired
channel-group 1 mode active
spanning-tree portfast

!

 

Backup Server Port Config:

 

!
interface GigabitEthernet1/0/9
description BACKUPSRV
switchport access vlan 1000
switchport mode access
flowcontrol receive desired
spanning-tree portfast

!

interface GigabitEthernet2/0/9
description BACKUPSRV
switchport access vlan 1000
switchport mode access
flowcontrol receive desired
spanning-tree portfast
!

 

vSphere Host Port Config:

 

!
interface GigabitEthernet1/0/1
description ESXi1
switchport access vlan 1000
switchport mode access
flowcontrol receive desired
spanning-tree portfast
!

interface GigabitEthernet2/0/1
description ESXi1
switchport access vlan 1000
switchport mode access
flowcontrol receive desired
spanning-tree portfast
!
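
For anyone comparing configs, it's also worth confirming that the channel actually bundled and that LACP (not PAgP or a static channel) negotiated it. A sketch of the standard 3750 checks, using the port-channel number from the config above:

show etherchannel summary
show etherchannel 1 port-channel
show lacp neighbor

In the summary output, Po1 should show (SU) with both Gig ports flagged (P); if a member shows (s) or (I) it hasn't bundled and you're effectively running on one link without knowing it.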

Davy Priem
Regular Advisor

Re: Poor throughput using Etherchannel and 802.3ad bond

To exclude problems with the bonds, try using one link on each node.

WayneH
New Member

Re: Poor throughput using Etherchannel and 802.3ad bond

Recently we upgraded to 9.5 (inc. all patches) and installed 2 additional nodes in our P4500 cluster taking the number to 4. Since completing this upgrade, we have noticed a marked drop in performance. I don't have any solid figures yet, but our virtual machines are very sluggish. I will get some more details so you can compare.

JohnMurrayUK
Advisor

Re: Poor throughput using Etherchannel and 802.3ad bond

I'd be very interested to hear more about your throughput on 9.5, Wayne. Our NSMs are running 9.5.00.1215.0 with the following patches: 10054-05, 10106-00, and 10111-00.

 

We have shut down a switch to make sure we are running on just one NIC, and have iperf running between two of the nodes, which shows an average of just 235Mbits/sec from one shelf to the next on our 1Gbps interface.

 

I'm not sure why the throughput is so bad, but HP are now refusing to support me as they claim this must be an issue with the network.

 

The way I see it, I have a simple and standard Etherchannel LACP config, a SAN which supports this (indeed recommends this in the vSphere 5 Best Practice doc), but it doesn't perform.  If I had to guess whether Cisco had a bug in their IOS for the 3750G or whether HP had a bug in SANiQ, I'm gonna suspect HP every time.

 

The HP engineer informed me yesterday very clearly that there weren't any modifications 'at all' to the network stack on the P4500 as a result of upgrading from SANiQ 9.0 to 9.5 (I don't have NIC driver versions to hand for SANiQ 9.0 to check when the 3.4.0 bond driver and the 2.1.0-k2 NIC driver were introduced - anyone?).

 

It would appear that although HP make the best practice recommendation to use LACP, the HP support team does not recommend it, and suggested I use ALB instead.  I'm obviously reluctant to do so as I will only get transmit load balancing and be restricted to 1Gbps for the receive data path (even though the support guy was adamant I would get 2Gbps in each direction with ALB).  Can anyone confirm I am correct here?

 

Here are some iperf stats for other P4500G2 systems for comparison:

P4500G2 5.4TB NSMs

SANiQ 8.5.00.0319.0 (no patches)

Intel 82576 NIC FW=1.7-2, Driver=igb Version=1.2.45-k2

Bond driver version 3.3.0

ALB using £1000 Linksys 1Gbps switches

iperf reports 911Mbits/sec

 

P4500G2 5.4TB NSMs

SANiQ 8.5.00.0319.0 (patch 10070-00, 10078-00)

Intel 82576 NIC FW=1.7-2, Driver=igb Version=2.1.9

Bond driver version 3.3.0

ALB using £4000 Nortel 1Gbps switches

iperf reports 728Mbits/sec

 

P4500G2 5.4TB NSMs

SANiQ 9.5.00.1215.0 (patch 10054-05, 10106-00, 10111-00)

Intel 82576 NIC FW=1.7-2, Driver=igb Version=2.1.0-k2

Bond driver version 3.4.0

802.3ad LACP using £6000 Cisco 3750G 1Gbps switches

iperf reports 235Mbits/sec

 

I wish I had the same switches in each cluster, but I don't. It's not massively scientific, but I would have hoped to see the more recent SANiQ getting faster and faster on better and better switches, and that isn't what I see.

 

Anyone got any more stats for comparison?

 

Here's how I gathered my stats: PuTTY in over SSH to port 16022 on two nodes, log in with Management Group credentials, decide which will be the server for the test and which the client, and run these commands:

On the server:

CLIQ>utility run="iperf -s login=[IP of server NSM] -P 0 -i 5" 

(say yes to confirmation to run the command when prompted)

 

On the client:

CLIQ>utility run="iperf -c [IP of NSM above] -t 60 -i 5 -m 164"

(say yes to confirmation to run the command when prompted)

 

After 60 seconds the client will report the average throughput as the last line.

 

 

Jay Cardin
Frequent Advisor

Re: Poor throughput using Etherchannel and 802.3ad bond

I was planning to switch my ALB bonds (1Gb from each NSM to 2 separate switches) to LACP using the DT-LACP feature in the HP E3500YL48G switches.

 

P4500G2 6.5TB NSM

SANiQ 9.5.00.1215.0 (patch 10054-05, 10106-00)

Intel 82576 NIC FW=1.7-2, Driver=igb Version=2.1.0-k2

Bond driver version 3.4.0

ALB bond to two $3000 HP E3500YL48G switches

iperf reports 763Mbits/sec

 

When I installed the E3500s, performance was horrible (worse than what you were seeing). After a month of troubleshooting, an HP tech pointed out that the inter-switch links between the two E3500s had flow control enabled. They said that would cause high latency and low throughput. I disabled flow control (only on the ISL) and the speed came up to what I posted above.
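
Since flow control keeps coming up in this thread: on the Cisco 3750G side (as in the configs earlier in the thread) you can see what each port actually negotiated, as opposed to what is configured. A sketch using the port numbers from the original post:

show flowcontrol
show flowcontrol interface GigabitEthernet1/0/5

The admin columns show the configured desired/on/off setting, the oper columns show what was actually agreed with the far end, and the RxPause/TxPause counters are a quick way to see whether pause frames are flying during the slow periods.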

 

I'm surprised the LH tech told you that you can get 2Gbps in both directions. I thought it was pretty well documented that you can't.

 

Great thread!

5y53ng
Regular Advisor

Re: Poor throughput using Etherchannel and 802.3ad bond

Hi John,

 

I have worked with the same Cisco 3750G switches in a stacked configuration, and during load testing I have found the switches will begin dropping a lot of packets. I'm not sure these particular switches meet the buffer recommendations specified by HP (512K per port, I believe). I notice that a virtual machine will exhibit much higher disk throughput if it happens to be running on the same host as the VSA acting as its gateway connection to shared storage. In this scenario the traffic never leaves the vSwitch.

 

Generate some disk traffic and watch the switch interfaces (the individual interfaces, not the port-channel) and you'll see output drops. Afterwards, try the same test with the virtual machine running on the same host as the gateway connection and you'll see a nice boost in throughput.
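
A sketch of how to do that on the 3750s from the original post, using standard IOS commands (substitute your own member ports). Clear the counters, generate the disk traffic, then check each physical member individually:

clear counters GigabitEthernet1/0/5
show interfaces GigabitEthernet1/0/5 | include Total output drops
show interfaces GigabitEthernet1/0/5 counters errors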

 

 

ChristmasGT
Occasional Contributor

Re: Poor throughput using Etherchannel and 802.3ad bond

Hey Guys, thought I might chime in on this as well.

 

Currently my company is using 5 P4500 G2 nodes with 2x 1Gb NICs in 802.3ad bonding on version 9.5. Jumbo frames are not enabled; however, flow control is. Unfortunately my switch only supports asymmetric flow control.

 

For switching we're using a single isolated Brocade FWS48G, with STP disabled.

 

There are 3 servers connecting to the SAN, each with dedicated 4x 1Gb NICs in 802.3ad bonding and the HP DSM installed. I've been banging my head against a wall for the last few weeks trying to track down the poor performance we're having. Tried ALB, no luck; tried single NICs to connect to the SAN, no luck there either. I've tried pretty much every combination of things to diagnose this issue.

 

Tonight I came across this post and tried John's method of using iperf to test the speed between the SAN nodes themselves. Running iperf on the nodes seems to give really odd results.

 

For instance, when running the test between 2 nodes (one acting as a server, the other as a client, with the exact iperf commands John posted) I get 940Mbits/sec consistently on every test, which is pretty good, but lower than I would expect from 802.3ad bonding since all NICs on all nodes are active/active.

 

Here's where it gets weird, however. Running the test the next 4 times, performance plummets drastically, down to as low as 80Mbits/sec on each test. Run again immediately afterwards, it jumps straight back up to 940Mbits/sec on the dot, then back and forth, so there's a lot of fluctuation going on.

 

Anyone have any idea what's causing that? My only guess would be the switch port buffers backing up, but it's becoming really frustrating. I've uploaded a text document that contains the switch's running config as a reference. The HP SAN is occupying ports 39 to 48.

 

Anyone see any reason for such low performance? The server side shows the same low-throughput issues as well.

JohnMurrayUK
Advisor

Re: Poor throughput using Etherchannel and 802.3ad bond

Well, first of all, your iperf stat of 940Mb/s is very good. Don't forget that with LACP you don't actually get 2Gbit for each session; the switch will direct a flow to one of the ports, but not both. Hence each shelf can receive at 2Gbps, but only if it talks to multiple clients, or to a client with multiple NICs.

When the iperf runs, it uses one NIC only for the same reason.
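
For anyone who wants to see, or change, how the 3750 picks the member link, a sketch of the relevant commands (the default hash is src-mac as far as I remember; no hash will ever split a single flow across both links, but src-dst-ip at least spreads different host pairs more evenly):

show etherchannel load-balance
configure terminal
 port-channel load-balance src-dst-ip
 end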

 

Good post on this here:

http://blog.open-e.com/bonding-versus-mpio-explained/

 

My guess for the drop in performance you are seeing is that you are using Network-Raid5, yes?

Hence there is a background process which takes snapshots and creates parity every time the changed data delta reaches 10% of the volume size. This has a large overhead and can really skew the iperf results. Unfortunately I'm not aware of a way to expose this or change its schedule.

 

Let us know if you are using Network Raid 5 at all on your cluster.

ChristmasGT
Occasional Contributor

Re: Poor throughput using Etherchannel and 802.3ad bond

Hey John,

 

Indeed, we are using Network Raid 5 throughout. Also, the nodes each use hardware RAID 5 as well. I'd like to use Network Raid 10 but can't afford the space it requires.

JohnMurrayUK
Advisor

Re: Poor throughput using Etherchannel and 802.3ad bond

So my Cisco switches were set to receive flow control only, and my NSMs were set to auto send and receive.

I had to break the bond, disable receive flow control on each adapter, then re-create the bond.

Running happily on LACP Etherchannel now, with good throughput between shelves via iperf (except when a restripe is occurring).
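
For reference, if you would rather turn receive flow control off on the switch side than break and re-make the bond on the NSMs, the 3750 counterpart would be something like this sketch (per member port; if IOS objects because the ports are in a channel group, you may have to remove and re-add the channel-group, much like re-creating the bond on the NSM side):

configure terminal
 interface range GigabitEthernet1/0/5 , GigabitEthernet2/0/5
  flowcontrol receive off
 end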

 

That said, when writing a large 20GB file to a Network-Raid5 volume I average just 20MB/s throughput, with 750ms write latencies and an average Queue Depth Total of 68.

 

The net result is that my host is not able to push data through at the maximum rate and averages 14% network utilisation over its iSCSI interface. Maybe the gateway NSM is saturated by the process of distributing the restripe to the other NSMs.

 

Does anyone else have some Network-Raid5 throughput stats to share?

 

 
