StoreVirtual Storage

M.Braak
Frequent Advisor

Performance issues when using VSA on ESX with VMXNET3 driver

Hi,

 

I want to share a big performance issue with you.

 

Currently there is a big problem when using HP P4000 VSAs on VMware with the VMXNET3 driver.

When the VSA is co-located on an ESX server with other VMs and the gateway node of a SAN volume is the locally hosted VSA node, there is a huge performance problem whenever the ESX server itself uses the volume (for example when deleting a snapshot).

Latency of the volume goes sky high (300+ ms) and I/Os are very slow.

 

VMware also acknowledges this problem. There seems to be an issue with TSO in the VMXNET3 driver being bypassed by the ESX server, which causes severe performance degradation.

 

When you change the VSA's virtual NIC from VMXNET3 to E1000 the problem is solved, but I'm still waiting on a reply from HP as to whether using the E1000 is supported.
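
For those who want to try the E1000 workaround themselves: as far as I know you can either remove the VMXNET3 NIC and add an E1000 NIC in the vSphere Client, or power off the VSA and edit its .vmx file (the ethernet0 name is just an example, check your own .vmx):

ethernet0.virtualDev = "e1000"   (was "vmxnet3")

Then power the VSA back on.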

 

I'll keep you updated.

86 REPLIES
Wvd
Occasional Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Please keep us posted on this issue.


We are experiencing something similar.

Config is two VSA 9.5 nodes with Flexible NICs on DL380 G7s with ESXi 4.1 U1.

 

The cluster can perform normally for some time, but then suddenly one of the ESX hosts experiences heavy write latency (150 ms+) to its local disk. Due to the Network RAID-10 this affects the whole P4000 cluster.

The only way to restore performance is to shut down the badly performing VSA node and reboot the ESXi server.

 

What is strange in our case is that local disk performance is affected even after shutting down the node.

Adding a disk located on the VSA's datastore to a virtual machine still shows bad write latency.

This leads me to believe that the write cache got disabled for some reason, but the hardware status makes no mention of this.

 

Sounds like a hardware issue but we have seen the local write latency happen on both servers.

Firmware level is up to date with the latest firmware DVD (9.30).

 

Your story makes me reconsider the vmxnet3 driver as a suspect.

The NIC is configured as Flexible, but the vmxnet3 driver is mentioned in the VSA's kernel.log:

 

Jan 16 09:19:09 vsa2 kernel: VMware vmxnet virtual NIC driver
Jan 16 09:19:09 vsa2 kernel: GSI 18 sharing vector 0xB9 and IRQ 18
Jan 16 09:19:09 vsa2 kernel: ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 19 (level, low) -> IRQ 185
Jan 16 09:19:09 vsa2 kernel: Found vmxnet/PCI at 0x14a4, irq 185.
Jan 16 09:19:09 vsa2 kernel: features: ipCsum zeroCopy partialHeaderCopy
Jan 16 09:19:09 vsa2 kernel: numRxBuffers = 100, numRxBuffers2 = 1
Jan 16 09:19:09 vsa2 kernel: VMware vmxnet3 virtual NIC driver - version 1.0.11.1-NAPI
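
If you can get to a shell on the VSA (and assuming ethtool is available there), you could check which driver is actually bound to the interface, something like:

# ethtool -i eth0

With a Flexible NIC and VMware Tools installed I would expect this to report the vmxnet driver rather than vmxnet3, but I haven't verified that on the VSA appliance itself.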

M.Braak
Frequent Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Just got off the phone with HP support.

The E1000 driver is officially not supported by HP, but HP support advises me to use it if it performs better in our case ?!?!

 

They won't investigate the problem further because in their opinion it's a VMware problem and VMware should fix it.

 

I'm awaiting further information from VMware.

 

In the meantime my opinion is that the LeftHand VSA is crippled for the moment and should not be used on ESX servers with VMs locally hosted on the same server until this problem is fixed.

 

Using Flexible interfaces doesn't show the same extreme behaviour as VMXNET3, but it also shows weird latencies sometimes.

 

I have tested this on several different hardware platforms, all with the same problem.

 

VMware also mentions the following possible workaround: create a separate vSwitch to which you connect only the VSA, although this option needs additional hardware NICs. This way the TSO of the VMXNET3 driver won't be bypassed.
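
On ESX(i) 4.1 you could set something like that up from the shell as well (the vSwitch name, port group name and vmnic2 are just examples, adjust to your own config; I haven't tried this workaround myself):

# esxcfg-vswitch -a vSwitch2
# esxcfg-vswitch -L vmnic2 vSwitch2
# esxcfg-vswitch -A "VSA Network" vSwitch2

Then attach the VSA's NIC to the new "VSA Network" port group.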

 

Tedh256
Frequent Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

"This way the TSO of the VMXNET3 driver wont be bypassed."

 

What is the "TSO"?

 

Also - best practice already dictates that a separate vSwitch be used for the VSA/iSCSI traffic - that should be no burden! If you are planning a virtual host, you need to incorporate enough interfaces for the host storage access and guest communication, but ...

 

I am not certain that I understand why/how having a separate vSwitch for the VSAs prevents the VMXNET3 "TSO bypass" - could you help me understand what's going on?

 

 

M.Braak
Frequent Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

TSO = TCP Segmentation Offload: large TCP segments are split up (and checksummed) by the hardware NIC instead of the CPU.
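
If you want to see whether TSO is actually in use you can check it on both sides (I'm going from memory here, so double-check the option names):

On the ESX host:
# esxcfg-advcfg -g /Net/UseHwTSO

Inside a Linux guest (if ethtool is available):
# ethtool -k eth0 | grep tcp-segmentation-offload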

 

iSCSI traffic should indeed always be on a separate vSwitch. But VMware meant a separate vSwitch for the VSA and a separate vSwitch for the VMkernel iSCSI network. So when you also want redundancy you need 4 physical NICs this way, 2 for each vSwitch.

 

When using two vSwitches, VMware uses a different path internally to communicate, and this way TSO in the VMXNET3 driver could function properly. (I haven't tested this possible workaround however!)

 

Tedh256
Frequent Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

huh

 

but this problem only applies to situations where you are running VMs (other than the VSA VMs themselves, I presume?) on local storage?

 

Why would you want to do that - if these hosts are running VSAs wouldn't you simply use up all local storage so that it can be presented as shared storage?

M.Braak
Frequent Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

No, all local storage is used by the HP VSA and is provided as an iSCSI volume/datastore to the ESX servers. (Used in small enterprises.)

When you have VMs hosted on the same ESX node as the VSA (which is used as gateway node for the volume), then you have this problem when, for example, deleting a VMware snapshot. (All cases in which the ESX node itself communicates with the datastore.)

 

Traffic from within VMs to the datastore doesn't suffer from this problem.

 

So local storage of the server is only being used by the HP VSA.

 

Just test it for yourself:

Deploy a VSA (with VMXNET3 nic) on a single ESXi 4.1 server

Create a volume on the VSA and create a vmware datastore on it

Now let the ESXi server generate some traffic on the datastore by committing a snapshot, or use a much easier way:

 

Execute the following commands from an SSH shell on the ESXi node:

# cd /vmfs/volumes/[datastorename goes here]

# time dd if=/dev/zero of=testfile count=102400 bs=1024

 

These commands create a 100 MB test file on the datastore. Creating a 100 MB file should be a matter of 1-2 seconds, but times can go up to even minutes!

Also check the datastore read and write latency from the vSphere Client/vCenter (200+ ms as soon as you start creating the file!).
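
While the dd is running you can also watch the latency live with esxtop from the same SSH shell (press u for the disk device view; if I remember correctly the DAVG/cmd and GAVG/cmd columns are the ones to watch):

# esxtop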

 

 

 

RonsDavis
Frequent Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

On my 9.0 VSAs the NICs are set to Flexible. Why are you using VMXNET3 anyway? Does it come standard on newer OVFs?

 

virtualmatrix
Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

 

FWIW --

 

We saw similar symptoms a year or so ago, but it was reproducible with any virtual NIC device and on both 10 GigE and 1 GigE networks. With that said, perhaps it could be more prevalent with vmxnet3, or perhaps it was just a different issue altogether.

 

Do you see this problem with vmxnet2?

 

In our case, the cause was thought to be due to vmkernel race and locking issues across the multiple VMDK layers. It was most easily triggered with operations such as cloning, snapshots, and zeroing... but it wasn't reproducible on demand. We changed all of our VSAs to use RDMs to the local storage instead of VMDKs-on-VMFS and the problems immediately disappeared.

 

That was back with ESXi 4.x and VSAs at SAN/iQ 8.x. We're now running ESXi 5.0 and SAN/iQ 9.5. Some VSAs are using vmxnet2 -- no issues. We haven't tried vmxnet3.

 

Using RDMs removes an "unnecessary" layer, since the only thing on the datastore is the data VMDKs for the VSA anyway. It may be quicker and easier for new administrators to set up a VSA by just putting VMDKs on a VMFS, but it sounds like you're quite comfortable getting around the ESXi shell. To create the RDMs, we used vmkfstools.
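
In case it helps anyone, the RDM mapping files can be created roughly like this (the device name and paths are just examples, look up your own device with esxcfg-scsidevs -l; use -r instead of -z if you want a virtual-mode mapping):

# vmkfstools -z /vmfs/devices/disks/naa.600508b1001c0000111122223333 /vmfs/volumes/[local datastore]/[VSA name]/vsa_data_rdm.vmdk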

 

HTH

5y53ng
Regular Advisor

Re: Performance issues when using VSA on ESX with VMXNET3 driver

This is very interesting. I experienced this behavior as well, but I was unaware of the root cause. I witnessed extremely high latency numbers and poor throughput when I used the VMXNET3 adapter on my VSA. I could only clear the symptoms by rebooting the host. Since I was unable to explain the cause of the latency problems, I abandoned further testing with the VMXNET3. When the VMXNET3 was working properly, I did not see a significant performance increase over the Flexible adapter anyway.