StoreVirtual Storage

Re: Performance issues when using VSA on ESX with VMXNET3 driver

 
Rufo
Member

Re: Performance issues when using VSA on ESX with VMXNET3 driver

I know that this is a very old thread... but I seem to be having this issue with ESX 5.x and VSA 11.0. 

 

Was this ever resolved? Also, I do not see a way to change the VMXNET3 adapter to E1000 (at least not during the install).

 

Please advise. Thanks.

douglascountyit
Occasional Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

FYI: I am also having the same problem: very high cluster write latency with ESXi 5.5 (build 1623387) and VSA 11.0, with the latest patches on everything.

 

Write latency on each node is fine; only the cluster write latency is bad, 50 ms to 150 ms.

 

It seems to be vSwitch-related, as this thread says. I will try to split my VSA and my software iSCSI ports onto separate vSwitches as suggested. What a waste of NICs.

Rufo
Member

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Another (lame?) solution is to disable MPIO for the unit, i.e. do not select Round Robin as the path selection policy.
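
For reference, the path selection policy can be switched per device from the ESXi shell; a rough sketch only, with naa.xxxxxxxx as a placeholder for the VSA volume's actual device ID:

    # show devices and their current path selection policy
    esxcli storage nmp device list

    # switch the VSA volume away from Round Robin to the Fixed policy
    esxcli storage nmp device set --device naa.xxxxxxxx --psp VMW_PSP_FIXED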

douglascountyit
Occasional Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Turning off Round Robin for the ESXi software iSCSI path selection policy also fixes the problem?

douglascountyit
Occasional Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Has anyone tried modifying any of these advanced network settings within ESXi?
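
For reference, ESXi's advanced network parameters can be inspected and changed from the shell; a generic sketch (Net.TcpipHeapMax is just one example of such a setting, not necessarily one of those referred to above):

    # show one advanced network setting and its current value
    esxcli system settings advanced list --option /Net/TcpipHeapMax

    # change it (example value only; use --default to revert)
    esxcli system settings advanced set --option /Net/TcpipHeapMax --int-value 512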

 

 

Rufo
Member

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Turning Round Robin off in my lab fixed the issue.

douglascountyit
Occasional Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

I am having the write latency issue on 1 production cluster and 1 lab cluster. The production cluster is using Round Robin with two paths to each datastore.

 

The lab cluster is currently using the Fixed path policy, and it only has 1 iSCSI NIC and 1 available path.

 

 

Write latency is very poor on both clusters (esxtop shows 200 ms+ for write latency), so I don't think the path policy is the deciding factor in my situation.
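
For reference, esxtop can break that latency down to show whether the time is being spent at the device or inside the hypervisor stack; a quick sketch of how to look at it:

    # interactive: run esxtop, then press 'u' for the per-device (LUN) view
    #   DAVG/cmd = latency at the device/array
    #   KAVG/cmd = time spent in the VMkernel
    #   GAVG/cmd = total guest-visible latency (roughly DAVG + KAVG)
    esxtop

    # batch-mode alternative for capturing the same counters over time
    esxtop -b -d 5 -n 60 > /tmp/esxtop-capture.csv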

Princes
Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

I feel compelled to contribute since we have experienced the same latency issues using VSA 11.5 / ESXi 5.5 u2. Similar to other contributors' experiences in this discussion, the latency seems to occur whenever the VSA cluster is accessed via a local gateway VSA node, which requires the iSCSI traffic to pass through the local ESXi vSwitch network stack. Accessing the cluster via a remote VSA gateway on another host shows good performance in contrast. The issue would seem to be that having your VSA node share the same local vSwitch as your iSCSI vmk ports introduces the latency whenever you access a VSA-presented datastore that the VSA cluster has decided should be presented by that same local VSA node on the same vSwitch.

 

This implies that it is as likely to be a hypervisor network stack performance issue as a VSA cluster issue.
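
One way to check this on a given host is to look at the target portal address of each iSCSI session, which shows whether a volume is being served by the local gateway VSA or a remote one; a sketch, with vmhba33 standing in for the software iSCSI adapter name:

    # list iSCSI sessions and the target IP each one is connected to;
    # comparing that IP with the local VSA's iSCSI address tells you whether
    # the gateway for that volume is local or remote
    esxcli iscsi session connection list --adapter=vmhba33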

 

Our set-up:

 

2 x HP DL380 Gen8s; local 15K SAS HDD storage array; vSphere ESXi 5.5 u2 (HP build)

2 x HP VSA 10TB v11.5; Software iSCSI Adapter; standard twin-path iSCSI initiator configuration.

Network is 10GbE with jumbo frames (9000 MTU). Throughput to the non-local VSA node is around 300-400 MB/s at <20 ms latency. Throughput to the local VSA node is around 100-200 MB/s with >1000 ms latency spikes.
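
As a side note, with jumbo frames in play it is worth confirming that 9000-byte frames actually pass end to end on the iSCSI vmks; a quick check, where vmk1 and the target address are placeholders:

    # send a non-fragmented 8972-byte payload (9000 minus IP/ICMP headers)
    # from the iSCSI vmkernel port to the VSA's iSCSI address
    vmkping -I vmk1 -d -s 8972 10.0.0.10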

 

The VSA paths tend to settle on a pattern where one particular volume / datastore presented by the cluster VIP is always mapped to a local VSA on a particular host. This is desirable since it offers load balancing between VSAs. However, this often means that VSA Datastore 1 is being accessed by ESXi Host 1 via its local VSA, and VSA Datastore 2 is being accessed by ESXi Host 2 via its local VSA. Storage degradation is then experienced by ESXi Host 1 on VSA Datastore 1 (local) but not on VSA Datastore 2 (remotely accessed via pNIC / switch), and vice versa.
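
The mapping described above can be confirmed per datastore by listing the NMP paths for its device and looking at the iSCSI target behind each path; a sketch with a placeholder device ID:

    # show every path for a VSA-presented device, including the target
    # IQN / portal behind each one
    esxcli storage nmp path list --device naa.xxxxxxxx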

 

Running various storage performance tools, it seems that the throughput / latency to the local VSA node begins acceptably, but as you ramp up the test data it suddenly becomes saturated, whereby latency goes through the roof. Using the Round Robin path policy at an IOPS limit of 1 or the default of 1000 gives very good storage performance on the non-local VSA, but abysmal performance on the local VSA. Defaulting to the Most Recently Used path policy gives poorer but acceptable performance on the non-local VSA, and poor performance on the local VSA; however, latency seems to remain just within acceptable tolerances, still spiking occasionally to several hundred ms but averaging between 20-30 ms. The inference perhaps is that the lower throughput / path switching reduces how often the local hypervisor network stack becomes saturated with iSCSI traffic passing between a local target and initiator.
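
Both of the settings mentioned there are per-device tweaks from the ESXi shell; a sketch with a placeholder device ID:

    # drop the Round Robin path-switch threshold to 1 I/O per path
    esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxx --type iops --iops 1

    # or fall back to the Most Recently Used policy for the device
    esxcli storage nmp device set --device naa.xxxxxxxx --psp VMW_PSP_MRU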

 

As already suggested in this discussion, the solution would seem to be to separate out the VSA and the iSCSI Software Initiator vmks; however, we have no more pNICs to offer each ESXi node at the moment, and 10GbE cards and switch modules are expensive!

 

Hope all this helps someone.

douglascountyit
Occasional Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

I agree with your conclusions, but getting VMware to resolve the bug will only happen if a very large customer of theirs complains about this. Any customer large enough to have the clout needed will probably not be using HP VSA.

Princes
Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Quite likely, I have to agree. I certainly doubt my organisation has the clout....

 

One solution I have come up with since last posting that I'll share in case it's of use to anyone:

 

As I suggested in my last post, the issue described can potentially be negated by separating out the iSCSI target and initiator interfaces used by the VSA. The goal is to avoid sending iSCSI traffic through the ESXi hypervisor network stack locally on a host, which seems to introduce a lag with certain pathing / load balancing configurations. Adding additional 10GbE adapters, however, is expensive and adds more cabling complexity to the solution. One way around this is to use an HP FlexFabric 10GbE adapter that supports NPAR, such as the 533FLR-T. This allows partitioning of the two physical 10GbE interfaces into additional virtual adapters in ESXi. My idea is to split each physical 10GbE port into 2 x virtual adapters in ESXi, then distribute these virtual adapters between 2 x vSwitches for iSCSI traffic rather than a single vSwitch with 2 x pNICs as before (with the usual sharing of a virtual adapter from each physical port in order to retain port-failure resiliency). If we then split the target / initiator interfaces between the two vSwitches, this forces all iSCSI traffic to leave the hypervisor via a physical port to reach its destination. It does mean that adapter bandwidth is halved, i.e. 10GbE could be partitioned into two virtual ports at 5GbE each. However, this should still be more than acceptable for your average iSCSI setup, especially when using MTU 9000.
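
To give an idea of what the split looks like in practice, here is a rough sketch of building the second iSCSI vSwitch out of two of the NPAR virtual adapters; the vmnic names, portgroup name, vmk number, addressing and adapter name are placeholders for whatever the real environment uses:

    # create a second vSwitch from two NPAR virtual adapters (one per physical port)
    esxcli network vswitch standard add --vswitch-name vSwitch2
    esxcli network vswitch standard set --vswitch-name vSwitch2 --mtu 9000
    esxcli network vswitch standard uplink add --vswitch-name vSwitch2 --uplink-name vmnic4
    esxcli network vswitch standard uplink add --vswitch-name vSwitch2 --uplink-name vmnic5

    # move the software iSCSI initiator vmk onto the new vSwitch
    esxcli network vswitch standard portgroup add --vswitch-name vSwitch2 --portgroup-name iSCSI-Initiator
    esxcli network ip interface add --interface-name vmk2 --portgroup-name iSCSI-Initiator --mtu 9000
    esxcli network ip interface ipv4 set --interface-name vmk2 --type static --ipaddr 10.0.0.21 --netmask 255.255.255.0

    # bind the new vmk to the software iSCSI adapter (vmhba33 is a placeholder)
    esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2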

 

I'll post the results of testing this when I next have an opportunity to build the same sort of rig again.