BladeSystem Virtual Connect

Network utilization in VC-FLEX10

Trusted Contributor

Network utilization in VC-FLEX10

Avi had a customer who was saturating 1Gb LAN pipes, so he moved to 10Gb pipes and wondered why he was not getting 100% utilization:



When copying a large file from one server to another, I get utilization of ~25% of the 10Gb configured in the profile.

The 25% utilization is an average, not constant; there are some peaks of 100%.


When the profile is configured to 1Gb, we get 100% utilization, nice and constant.


What is the max utilization I can get with 10Gb? If 100% is not the answer, then why?


What are the differences between 1Gb and 10Gb?



We got some input from Paul & Mark K. & Mark E.:




I think that your ability to flood a 1Gb pipe but not a 10Gb pipe is not related to limitations of the network, but to the TCP/IP stack and application.

Mark E.:

Also, the fact you’re copying from a file means the file has got to be read in from a disk infrastructure (along with other OS demands on the disk). For example, if that disk resides on a 4Gb/s fibre, then there’s your answer. If, however, it’s on a 15K SAS spindle set (of multiple RAID10 or RAID0 drives), with no other demands on the SmartArray, then things could be different.


Or add more memory to the server and have a virtual disk drive in memory to do the test with.
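Before blaming the network, it is worth measuring whether the disk can even feed a 10Gb link. A minimal sketch of such a sanity check (a hypothetical helper, not an HPE tool) times a sequential read of a scratch file:

```python
import os
import tempfile
import time

def disk_read_gbps(size_mb=64, chunk_mb=1):
    """Write a scratch file of size_mb megabytes, then time a sequential
    read of it and return the observed read rate in Gb/s.

    Caveat: the OS page cache will make this wildly optimistic; for a
    realistic number use a file much larger than RAM (or direct I/O),
    exactly as the RAM-disk suggestion above implies."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb // chunk_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually hit the disk
        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            while True:
                buf = f.read(chunk_mb * 1024 * 1024)
                if not buf:
                    break
                total += len(buf)
        elapsed = time.perf_counter() - start
        return (total * 8) / elapsed / 1e9  # bytes -> bits -> Gb/s
    finally:
        os.remove(path)
```

If the number that comes back is well under 10, the file copy was never going to saturate the Flex-10 pipe no matter what the NIC does.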

Mark K.:

This appears to be a Windows system.  One factor to consider is the use of SMB vs SMB2.  We have worked several escalations where SMB is unable to fully utilize a 10Gb link.  This is due to limitations of the implementation of SMB in Windows 2003 and earlier.  Moving to SMB2 (Windows 2008) will show a dramatic increase in performance on 10Gb links.




Any other input? Are there other factors to be considered?

Trusted Contributor

Re: Network utilization in VC-FLEX10

More people have weighed in on this subject:

Richard explained a lot about issues with the TCP stack and what can be happening there:



Still, broadly speaking, a single TCP connection cannot make use of more than one core - perhaps up to two cores if you get really lucky with placement relative to the interrupts.  That means we are at the mercy of how much CPU is required to move data through the NICs, which brings us to 10 Gigabit Ethernet's pink elephant in the middle of the room:


From the standpoint of the defining IEEE specifications, there is nothing in 10 Gigabit Ethernet that makes data transmission any easier on the host than it was for 1 Gigabit Ethernet, or for that matter 100 or 10 Megabit Ethernet.  It takes just as many CPU cycles to send a frame through the interface for all of those.


Now, as time has passed, NIC vendors have learned, or been taught by system vendors :) how to do things *beyond* the IEEE specifications.


In the time of 100 Megabit Ethernet, it became possible to have fewer than one interrupt per packet.


In the time of 1 Gigabit Ethernet, various interrupt coalescing schemes took things farther than they went with 100 Megabit Ethernet.

Also, the mass-market NICs started to support ChecKsum Offload (CKO - something first done by the then-major systems vendors with their FDDI NICs in the early 1990s) and some started supporting maximum frame sizes of 9000 bytes - what Alteon dubbed "Jumbo Frames" - a name that has stuck to this day.


Any 10 Gigabit Ethernet NIC worth its silicon will have all those features, plus support for directing interrupts to multiple cores and using multiple packet queues.  This can spread the work across multiple cores - but only when there are multiple "flows" (e.g. TCP connections).  The 10 Gigabit Ethernet NICs also provide support for TCP Segmentation Offload (TSO) and newer ones also include Large Receive Offload (LRO - sometimes called Transparent Packet Aggregation).


Those stateless offloads can very dramatically lower the CPU overhead of data transfer.  However...


Stateless offloads such as CKO, TSO and LRO really only come into play when the traffic is a "bulk transfer" - when the application(s) involved send rather more than an MSS's worth of data at one time.

There are many applications which do so, but not all applications do.

CKO, TSO, and LRO do little or nothing for applications making discrete, small sends - those applications are basically back to 10 Megabit Ethernet days when it comes to how much CPU will be consumed sending/receiving their traffic through the NIC.
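The small-send vs. bulk-transfer distinction above can be made concrete with a toy loopback experiment (a rough sketch, not a benchmark - the function name and setup are made up for illustration). Pushing the same number of bytes with small sends requires far more send calls, and it is that per-call/per-packet work that the stateless offloads cannot remove:

```python
import socket
import threading
import time

def transfer(total_bytes, send_size):
    """Push total_bytes over a loopback TCP connection using sends of
    send_size bytes each.  Returns (elapsed_seconds, send_calls,
    bytes_received).  Small send_size means many more system calls and
    more per-packet work on both ends - the overhead that CKO/TSO/LRO
    only help with when the application hands over big chunks."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]
    received = []

    def sink():
        conn, _ = srv.accept()
        n = 0
        while True:
            buf = conn.recv(65536)
            if not buf:
                break
            n += len(buf)
        conn.close()
        received.append(n)

    t = threading.Thread(target=sink)
    t.start()
    cli = socket.create_connection(("127.0.0.1", port))
    payload = b"x" * send_size
    calls = total_bytes // send_size
    start = time.perf_counter()
    for _ in range(calls):
        cli.sendall(payload)
    cli.close()
    t.join()
    srv.close()
    return time.perf_counter() - start, calls, received[0]
```

Comparing, say, `transfer(1 << 20, 512)` against `transfer(1 << 20, 1 << 16)` moves the same megabyte, but the first needs 2048 send calls versus 16 - a rough stand-in for why chatty applications stay CPU-bound no matter how fast the link is.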



Greg also had some words of wisdom:



Last year I had a customer ask the same question.  He could not get his Win2k8 servers to drive the 10Gb available to him via Flex10.


I suggested changing the testing mechanism: use a multi-threaded load generator instead of copying a file across to a remote filesystem. The results were dramatic.


With the load generator he was able to saturate the Flex10 LOMs, even though each individual thread could not sustain even 400Mb/s.


So this is a limitation of the host OS. Different OSes implement different IP stacks.
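The shape of such a multi-threaded load generator can be sketched in a few lines (a hypothetical toy over loopback, not the tool Greg used - real tests would use something like iperf or netperf against a remote host). The key idea is simply many concurrent TCP flows, so the NIC's multiple queues and the kernel can spread the work across cores:

```python
import socket
import threading

def parallel_load(n_threads=4, bytes_per_thread=1 << 20):
    """Toy multi-threaded load generator: n_threads TCP connections each
    push bytes_per_thread to a loopback sink.  Multiple independent
    flows are what let a 10Gb NIC fan interrupts and queue work out
    across cores - something a single-stream file copy cannot do.
    Returns the total bytes the sinks received."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(n_threads)
    port = srv.getsockname()[1]
    totals = []
    lock = threading.Lock()

    def sink():
        conn, _ = srv.accept()
        n = 0
        while True:
            buf = conn.recv(65536)
            if not buf:
                break
            n += len(buf)
        conn.close()
        with lock:
            totals.append(n)

    def source():
        cli = socket.create_connection(("127.0.0.1", port))
        cli.sendall(b"x" * bytes_per_thread)
        cli.close()

    # Start all sinks first so every source has an acceptor waiting.
    threads = [threading.Thread(target=sink) for _ in range(n_threads)]
    threads += [threading.Thread(target=source) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    srv.close()
    return sum(totals)
```

Measuring aggregate throughput with 1, 2, 4, 8 threads should show the same pattern Greg saw: the aggregate climbs toward link speed while each individual flow stays far below it.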



If you have solved some of the TCP or application or OS issues let us know.