BladeSystem - General

Network performance and bonding questions:

 
chuckk281
Trusted Contributor


Ray had some VC Flex-10 performance questions:

 

****************

 

  I've outlined the issue we're having below. Any assistance would be greatly appreciated. I am trying to improve the network throughput that I am sending through Flex-10s that are in active/active mode. There are 2 uplinks on each Flex-10 in the enclosure. The 2 uplinks from flexA are plugged into switch 1 and the 2 uplinks from flexB are plugged into switch 2. All of the uplinks are configured into one port channel on the stacked switches and configured for LACP/EtherChannel. The 2 uplinks from each Flex are configured in a shared uplink set (SUS) and the links are active/active. An interface from each SUS has been made available to the Linux server and the NICs have been bonded for transmit performance. However, we are not seeing any improved performance - in fact, for every bonding mode we see exactly the same performance. The server was rebooted after each bonding mode switch.

 

There is also switch output for each interface showing that data is only being transferred down one interface. So if the NICs are bonded correctly at the host and the uplinks on the switch are in a correctly configured port group, then the next logical step is to check the Flex-10 settings.
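
A quick host-side check can confirm what the switch counters show, i.e. that only one slave is actually transmitting. A minimal sketch, assuming the bond is named bond0 with slaves eth0 and eth1 (the names are illustrative):

    cat /proc/net/bonding/bond0    # bonding mode, active slave(s), link status
    # Compare per-slave transmit counters while the benchmark runs;
    # if only one counter climbs, only one path is being used.
    watch -n1 'cat /sys/class/net/eth0/statistics/tx_bytes \
                   /sys/class/net/eth1/statistics/tx_bytes'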

 

I look forward to hearing from you and resolving this issue.

 

***************

 

Discussion from Dan and Richard:

 

***************

Dan's comments are quoted (prefixed with ">"); Richard's replies follow:

 

> Tim, the other thing I would ask is how are they performance benching?
> If memory serves me right, things like TLB and even LACP don't help
> for individual streams between 2 nodes.  You need multiple
> conversations for them to load balance.  Perhaps Mr Jones can clarify
> here.

 

That is correct.  The only bonding mode which will increase the performance of a single stream is mode-rr (balance-rr, mode 0), which will round-robin the outbound traffic.  Of course, unless the recipient is using either a single link faster than the "bond", or a bond that itself spreads the traffic of a single connection, even mode-rr will do little good.
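
For reference, a minimal sketch of setting the bonding mode on the Linux host, assuming the in-kernel bonding driver and two slave NICs named eth0 and eth1 (the names and the IP address are illustrative):

    # Load the bonding driver in round-robin mode (balance-rr / mode 0),
    # the only mode that stripes a single TCP stream across both slaves.
    modprobe bonding mode=balance-rr miimon=100
    ifconfig eth0 down
    ifconfig eth1 down
    echo +eth0 > /sys/class/net/bond0/bonding/slaves
    echo +eth1 > /sys/class/net/bond0/bonding/slaves
    ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
    # Verify the active mode and slave state:
    cat /proc/net/bonding/bond0

For LACP towards the upstream port channel, the equivalent would be mode=802.3ad (mode 4), but as noted that balances per conversation, not per packet.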

 

My personal opinion of mode-rr has been a dim one.  With mode-rr there is going to be packet reordering.  Depending on the breaks, perhaps enough to trigger spurious TCP fast retransmissions.
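
If mode-rr is tried anyway, one mitigation sometimes suggested is raising the TCP reordering threshold on the receiving side so that out-of-order segments are less likely to be mistaken for loss. A sketch (the value 10 is illustrative, not a tuned recommendation):

    # Allow more out-of-order segments before TCP triggers a fast retransmit.
    sysctl -w net.ipv4.tcp_reordering=10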

 

> An easy way to test is if they are using something like NetPerf, try
> opening 2 copies with different destination endpoints so you have at
> least 2 distinct conversations going to see if that increases the
> overall outbound performance.

 

For a quick and easy test, that should work fine.
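
A sketch of that quick test, assuming netperf is installed on the bonded host and netserver is running on two separate targets (the hostnames remote1 and remote2 are illustrative):

    # Two concurrent TCP_STREAM tests to two different endpoints give the
    # bond two distinct conversations to spread across its slaves.
    netperf -H remote1 -t TCP_STREAM -l 60 &
    netperf -H remote2 -t TCP_STREAM -l 60 &
    wait
    # Compare the summed throughput against a single-stream run.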

 

The runemomniaggdemo.sh script in doc/examples/ of a netperf source tree will run those multiple streams for you semi-automagically.  One would need to edit the remote_nodes file, and the post_proc.py script will be needed to post-process the results and produce the pretty pictures.  Normally that script will try to run inbound, outbound, bidirectional and aggregate request/response tests, but it is a small matter of editing to disable one or more of those.  (The post_proc.py script depends on python-rrdtool being installed.)

 

Happy benchmarking.

 

*****************

 

And input from Lee on NIC teaming:

 

**************

 

Now to start the discussion about how network teaming/link aggregation works.  Based on the information that you provided, it seems that you are running a single process to test network performance.  Most link aggregation methods will have a single TCP connection use a single path.  Since you are using a single process, which is going to use one of the 1Gb uplinks of the port channel to connect to the switch, you will only see the performance of that single path.

 

The purpose of link aggregation is to allow multiple connections through the port channel/link aggregation to use all of the available paths, thereby increasing overall throughput.  Any single connection will still be limited by the maximum throughput of a single path.
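
To illustrate the per-connection behaviour: with 802.3ad/LACP bonding, each flow is hashed to one slave, and the transmit hash policy only changes how different flows are distributed, never how a single flow is carried. A sketch, reusing the bond0 naming from earlier (changing the policy generally requires the bond to be down):

    # layer3+4 hashes on IP addresses and TCP/UDP ports, so many flows between
    # the same two hosts can spread across slaves; a single flow still cannot.
    ifconfig bond0 down
    echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy
    ifconfig bond0 up
    cat /sys/class/net/bond0/bonding/xmit_hash_policy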

 

***************

 

Any other comments, suggestions, or questions?