
Re: Performance issues when using VSA on ESX with VMXNET3 driver

 
5y53ng
Regular Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

I wanted to update this thread as I have been able to pinpoint the exact cause of the latency problem after repeated testing in two different environments. If some of you could try to duplicate my findings, that would be great.

I found that the high latency is caused by setting the IOPS per path lower than the default of 1000 with this command:

esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxx --iops 1 --type iops

Watching the device latency in esxtop shows that, after applying this setting, the latency to my SANiQ volumes increases dramatically for any virtual machine that happens to be running on the same host as the gateway VSA. In some cases the latency is in the thousands of milliseconds.

By changing the IOPS setting on the fly and watching esxtop, I can reproduce or eliminate the high latency at will.
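For anyone trying to reproduce this, this is just how I watch it, using the standard esxtop views (the output file name is only an example):

# Interactive: press 'u' for the disk device view and watch DAVG/cmd and GAVG/cmd while flipping the IOPS setting
esxtop

# Or capture samples to a file for a before/after comparison (5-second interval, 60 samples)
esxtop -b -d 5 -n 60 > iops-test.csv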

If someone else could try this and verify they see similar behavior that would be awesome.

My configuration mimics what HP recommends: a separate vSwitch and four NICs for iSCSI, iSCSI port bindings, the PSP set to round robin, etc.
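If it helps anyone compare setups, these are the sorts of commands involved; naa.xxx is a placeholder for the device ID and vmhba33 is only an example adapter name:

# Show the current round-robin settings for a volume
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxx

# Put the IOPS per path back to the default of 1000
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxx --iops=1000 --type=iops

# Confirm the path selection policy and list the vmkernel ports bound to the software iSCSI adapter
esxcli storage nmp device list --device=naa.xxx
esxcli iscsi networkportal list --adapter=vmhba33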

Hope to hear back from some of you.
RonsDavis
Frequent Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Just to clarify: when the default is used you see low latency, and when you set it to 1 you see high latency?

Have you tried somewhere down the middle, say 100?

 

5y53ng
Regular Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Hi Rons, that is correct. I haven't tried anything in between; the difference is very noticeable, however. I still see occasional blips in latency with the IOPS at 1000, but there's a vast improvement overall.

I think the root cause of the latency is having a VSA portgroup sharing vmnics that are used for the iSCSI port bindings. I am testing to prove that theory now and will report my findings.
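A quick way to check for that overlap is to compare the uplinks on each vSwitch with the portgroup the VSA is attached to:

# Show each standard vSwitch with its uplinks and portgroups
esxcli network vswitch standard list

# Show which vSwitch each portgroup (including the VSA's) lives on
esxcli network vswitch standard portgroup list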
hack-the-planet
New Member

Re: Performance issues when using VSA on ESX with VMXNET3 driver

For those of you reporting performance issues, how are your vSwitches configured? I ran into a huge latency issue (ESX 5.0 software iSCSI, SANiQ 9.5) when my vmk bound to iSCSI was sharing the same vSwitch as the VSA node, and this was with the system largely idle. By simply moving the VSA to its own vSwitch and forcing the iSCSI traffic out through the physical switch, we saw a dramatic improvement. The latency was only seen by the ESX host local to the VSA, and the conditions were reproducible with other ESX hosts: each time, the second ESX host in the cluster (remote to the first VSA) saw no latency issues when attached to the first VSA. A single physical switch with separate VLANs for iSCSI and data was used to connect the ESX hosts.
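If it helps anyone, the separation can be done from the CLI along these lines; vSwitch2, vmnic4 and the portgroup name are only examples, so adjust them for your own hardware:

# Create a dedicated vSwitch for the VSA with its own uplink and portgroup
esxcli network vswitch standard add --vswitch-name=vSwitch2
esxcli network vswitch standard uplink add --vswitch-name=vSwitch2 --uplink-name=vmnic4
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch2 --portgroup-name=VSA-Data

# Then edit the VSA VM's network adapter in the vSphere Client and point it at the new portgroup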

 

I've only deployed the VSA using the flexible adapter, always keeping the VMXNET3 adapter disconnected in vSphere. The iSCSI VLAN is also always routed to allow the VSAs to communicate with email, NTP, CMC, etc. on the data network. The HP VSA is a great piece of software IMO; I will be testing ESX 5.1 with SANiQ 10 in the coming weeks.

 

HTH

thanks,
5y53ng
Regular Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

"or those of you reporting performance issues, how are your vSwitches configured? I ran into a huge latency issue (ESX 5.0 software iSCSI, SANiQ 9.5) when my vmk bound to iSCSI was sharing the same vSwitch as the VSA node... this was with the system largely idle. By simply moving the VSA to its own vSwitch, and forcing iSCSI traffic out through the physical switch we saw dramatic improvement."

 

This is exactly what I am seeing. I wrote in my previous post that having the iSCSI port bindings and the VSA on the same vSwitch seems to be the root cause of the majority of the latency. Setting the IOPS to 1 makes the problem much worse.

 

After separating the VSA from the iSCSI vSwitch, the latency improved dramatically. Changing the IOPS doesn't seem to make any difference with this configuration.

 

In some cases I see a VSA max out a 2 Gbps EtherChannel (three-node cluster). I would imagine there must be extreme resource contention when that volume of traffic is mixed with iSCSI traffic.
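If anyone wants to see that from the host side while it's happening, esxtop's network view shows the per-vmnic throughput:

# In esxtop, press 'n' for the network view and watch MbTX/s, MbRX/s and the %DRP columns per vmnic
esxtop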

 

It seems separating the VSA and iSCSI initiator is the way to go.

 

M.Braak
Frequent Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

That is indeed a workaround HP provided me. The downside is that you need additional NICs and switch ports.

I still find it very hard to believe that HP hasn't fixed this issue yet.
RonsDavis
Frequent Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

You could do something like this: http://blog.davidwarburton.net/2010/10/25/rdm-mapping-of-local-sata-storage-for-esxi/

But this is NOT supported by VMware. I personally won't run my production storage on an unsupported solution.

Some local storage can use RDMs out of the box; if you have that set up, then great, run with it. I would, because I also feel like RDMs have to be at least a little bit faster, but I have never seen any documentation to show that they are. VMware will tell you there is no performance improvement, and a former director at LeftHand also told me there was no improvement.
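For what it's worth, once a local device does show up with a usable identifier, the mapping itself is just a vmkfstools call; the device ID and paths below are placeholders, and as discussed this is unsupported for most local controllers:

# Create a pass-through (physical compatibility) RDM pointer on an existing datastore
vmkfstools -z /vmfs/devices/disks/naa.xxx /vmfs/volumes/datastore1/vsa1/vsa1-rdm1.vmdk

# Or use -r instead of -z for a virtual compatibility RDM
vmkfstools -r /vmfs/devices/disks/naa.xxx /vmfs/volumes/datastore1/vsa1/vsa1-rdm1.vmdk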

5y53ng
Regular Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

I tried to configure my VSAs to use RDMs and even carved out five additional LUNs on my RAID controller to do it. Unfortunately, the RAID controller didn't allow me to use them; I think the exact reason was that the RAID controller doesn't report a unique NAA number for each LUN. I read on these forums that RDMs cut down on the latency, which made me eager to give it a shot.
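If anyone wants to check whether their controller behaves the same way, the identifiers the host actually sees can be listed like this; local devices that don't report a unique NAA ID typically show up with an mpx. name instead:

# List all storage devices the host sees, including their naa./mpx. identifiers
esxcli storage core device list

# Just the friendly names, which include the identifier
esxcli storage core device list | grep "Display Name"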

 

Now that I have eliminated as much latency as possible via the network and iSCSI settings, I notice the read latency is still a little high for the VSA. With very low IO to the SAN, in the dozens of IOPS according to CMC, the VSA read latency hovers around 20 ms. The write latency is fine, likely due to the RAID controller cache. I see this same behavior on systems with 8 spindles and on systems with 25 spindles. I guess this is more or less normal, as even the best predictive read-ahead-and-cache algorithm won't help with random reads.

 

When you factor in how the VSA operates, it makes sense that the latency is going to be a little higher than usual: the gateway VSA may have to request blocks from the other VSAs, wait on that server's seek times, and then transfer the IO back through the gateway VSA to the initiator.

 

If HP would create their own NMP SATP/PSP for ESXi that functions similarly to how the DSM for Windows works, that would probably help with the performance. If I understand correctly, the DSM for Windows has a gateway connection to each VSA and accesses the appropriate VSA for any given block. Someone had a good post on here recently that called out the differences.
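There is no HP-specific plugin on the host today as far as I can tell; the SATPs and PSPs that are registered can be listed with:

# List the storage array type plugins and path selection plugins available on the host
esxcli storage nmp satp list
esxcli storage nmp psp list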

 

I can live with the latency because of what the VSA allows me to do. If space is a concern and you have extremely high consolidation ratios, the VSA is the best option out there.
adiare
New Member

Re: Performance issues when using VSA on ESX with VMXNET3 driver

5y53ng

 

I am using VMware ESXi 5.0.0 build-469512 (ESXi 5.0 base). I have a single-node 9.5 VSA deployed on this ESX server, and I have a vSwitch configured so that my iSCSI adapter vmhba33 is bound to the vmkernel port on vSwitch1. This vSwitch is also where all VSA iSCSI traffic is configured to be.

 

I have an HP DL370 G6 with the integrated NC375i (quad-port network card) and a P410i Smart Array controller.

 

I created a volume in CMC and presented it to my ESXi 5.0 server via iSCSI. I created a datastore on this volume and use it for IO tests to try to duplicate the high latency issues.

 

I am unable to duplicate the issue you encountered and was hoping you could share more configuration details.

 

On the config above, a ~1 GB file create using dd takes about 10 seconds, as I am able to get over 100 MB/s of write throughput with latency in the sub-1 ms range.

 

/vmfs/volumes/52180571-4a9dfce4-fab5-0025b3a8ec7a # time dd if=/dev/zero of=testfile bs=1024 count=1024000
1024000+0 records in
1024000+0 records out
real 0m 12.07s

 

 

For a ~10 GB file create, similar latencies are observed and I get:

/vmfs/volumes/52180571-4a9dfce4-fab5-0025b3a8ec7a # time dd if=/dev/zero of=testfile bs=1024 count=10240000
10240000+0 records in
10240000+0 records out
real 1m 41.95s
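For reference, those timings work out to roughly 1,024,000 x 1 KB ≈ 1.05 GB in 12.07 s, or about 87 MB/s, and 10,240,000 x 1 KB ≈ 10.5 GB in 101.95 s, or about 103 MB/s, so both runs are in the neighborhood of what a single gigabit path can deliver.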

 

So I think I may be missing a key configuration item to duplicate these high latency issues.

 

For my test, I decided to use a single vSwitch and keep everything (VSA, management network, and iSCSI traffic) on that same switch. So vmhba33 (the iSCSI initiator) is bound only to vmk0.

 

/vmfs/volumes/52180571-4a9dfce4-fab5-0025b3a8ec7a # esxcli iscsi logicalnetworkportal list -A vmhba33
Adapter  Vmknic  MAC Address        MAC Address Valid  Compliant
-------  ------  -----------------  -----------------  ---------
vmhba33  vmk0    00:25:b3:a8:ec:78  true               true

 

 

The vSwitch info looks like this:

vSwitch0
Name: vSwitch0
Class: etherswitch
Num Ports: 128
Used Ports: 5
Configured Ports: 128
MTU: 1500
CDP Status: listen
Beacon Enabled: false
Beacon Interval: 1
Beacon Threshold: 3
Beacon Required By:
Uplinks: vmnic0
Portgroups: VM Network, Management Network
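From the earlier posts it sounds like the configurations that do show the problem bind several dedicated vmkernel ports to the software iSCSI adapter on the same vSwitch as the VSA, rather than just vmk0. If that is the missing piece, the binding itself would look roughly like this (vmk1 and vmk2 are just example names, assuming they already exist on the iSCSI vSwitch):

# Bind dedicated iSCSI vmkernel ports to the software iSCSI adapter, then verify
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
esxcli iscsi networkportal list --adapter=vmhba33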

 

 

Thanks in advance.

Sbrown
Valued Contributor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Try doing two svMotions at the same time: one from DAS over the network to the VSA, and one from iSCSI to DAS.

 

Then run CDM (pure random mode) on three clients at the same time, 100% random read/write.

 

Watch the network stack crumble and the hosts doing the svMotions skew time.