VSA High Latency

joe89
Occasional Contributor

VSA High Latency

I'm new to StoreVirtual VSA and I'm seeing some strange latency issues in a newly installed VSA environment.

Setup: 2x DL380 Gen9, 3x 480 GB SSD, 5x 900 GB SAS, ESXi 5.5 U3 with latest patches, 10 Gbit/s iSCSI network

Two VSAs with Adaptive Optimization (AO) enabled for all volumes, the latest version of the Multipathing Extension Module (MEM) for 5.5, Delayed ACK disabled

My problem: the read and write latencies on ESX server 2 / VSA2 are always very high.

The average is more than 10 ms, with spikes above 400 ms.

If I migrate the virtual servers from ESX2 to ESX1, the latency drops to an average of 1-2 ms, with spikes of at most 10-20 ms.

Any help would be appreciated. Thanks.

4 REPLIES
Stor_Mort
HPE Pro

Re: VSA High Latency

Hi Joe89, 

It sounds like you are doing a lot of things right. We generally don't worry about peak latency numbers if they occur less than 5% of the time; the average latency is a much better indication of the user experience. For SSD, expect less than 5 ms; SAS should be less than 20 ms under heavy load, as monitored in the CMC. So it sounds like things are not too bad.
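If you save latency samples off the performance monitor, a quick sanity check of that rule of thumb might look like this (a sketch; the one-value-per-line input format and the helper name are my assumptions, not a CMC feature):

```shell
# Sketch: summarize a file of latency samples (one value in ms per line):
# print the average and the fraction of samples above a spike threshold.
latency_summary() {  # $1 = sample file, $2 = spike threshold in ms
    awk -v t="$2" '{ sum += $1; if ($1 > t) spikes++ }
        END { printf "avg=%.1fms spikes=%.1f%%\n", sum / NR, 100 * spikes / NR }' "$1"
}
# usage: latency_summary samples.txt 20
# spikes in well under 5% of samples are usually nothing to worry about
```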

The difference between the two host environments is interesting. You can try to isolate this by looking at the CMC performance monitor and comparing the cluster latency (which is the average for all volumes from when an iSCSI request is received to when the response is sent) to the storage node latency (which is a better indication of the speed of the underlying storage raid sets.)

Workload characteristics like transaction size, r/w ratio and address randomness make a big difference. Moving VMs around or just a different time of day causes workload changes that can make results difficult to interpret. Isolating this to a repeatable factor which can be fixed is usually tricky and time-consuming. 

Because load is created by VMs, not volumes, it's useful to break out all of your VMs (or at least small groups of them) into individual SAN volumes. This reduces contention and makes it easier to see if a particular VM is causing a problem. StoreVirtual performs better with large numbers of volumes rather than a few big ones. (You may need to un-install the MEM driver if this creates too many iSCSI sessions for VMware to manage.) It doesn't improve performance to use more than a few dozen volumes, however. The CMC performance monitor can provide latency readings for each volume.

I am an HPE employee - HPE StoreVirtual Support
joe89
Occasional Contributor

Re: VSA High Latency

Hi Stor_Mort,

Many thanks for your tips. I'm not sure, but I think I've found the issue.

I'm getting a lot of these errors in the vmkernel log of ESX2:

cpu14:32852)<3>[bnx2x_alloc_rx_sge:734(vmnic9)]Can't alloc sge
cpu14:32852)<3>[bnx2x_esx_init_rx_ring:1944(vmnic9)]was only able to allocate 0 rx sges
cpu14:32852)<3>[bnx2x_esx_init_rx_ring:1946(vmnic9)]disabling TPA for queue[4]
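A quick way to count these per NIC from a saved copy of the log (the helper name and log file name are just placeholders):

```shell
# Count bnx2x SGE-allocation errors per vmnic in a saved vmkernel log.
count_sge_errors() {  # $1 = path to a copy of /var/log/vmkernel.log
    grep -o 'bnx2x[^]]*(vmnic[0-9]*)' "$1" | grep -o 'vmnic[0-9]*' | sort | uniq -c
}
# usage: count_sge_errors vmkernel.log
```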

ESX1 has none of these errors.

Looks like a problem with the network card.... 

Stor_Mort
HPE Pro

Re: VSA High Latency

Good eye! Something going on with vmnic9. Could even be the cable. I'd check the bnx2 driver and NIC firmware versions, too. Or maybe just swap it out if you have a spare.
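On ESXi 5.5 the driver and firmware versions can be read from the host shell, roughly like this (diagnostic commands to run on the affected host; vmnic9 is the NIC named in the log):

```shell
# Run in the ESXi shell on the affected host:
esxcli network nic get -n vmnic9          # driver name/version and firmware version
esxcli software vib list | grep -i bnx2x  # installed bnx2x driver package
```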

I am an HPE employee - HPE StoreVirtual Support
joe89
Occasional Contributor

Re: VSA High Latency

Update: I have updated the NIC driver and firmware to the latest version.

The latency on ESX node 2 is slightly lower than before the update, but still worse than on ESX node 1.

At the moment there are only 4 virtual servers running on the two ESX servers, and every VM has its own datastore/volume.

If the VMs are running on ESX1 the latency is very low; if they are running on ESX2 the latency shoots up.

(Creating a 50 GB Oracle dump file takes 1.5 hours on ESX1 versus 4 hours on ESX2.)

I have since uninstalled the MEM and reverted to Round Robin...

Now the latency on ESX2 is the same as on ESX1, but the ESX servers do not really load balance between the two VSA nodes...

Load balancing is much better with MEM, but the latency on ESX2 is too high with it... ?

I would be grateful for any suggestions.
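When falling back to native Round Robin, one commonly suggested tweak for iSCSI arrays like StoreVirtual is lowering the path-switch threshold to one I/O per path, which spreads load across the nodes' paths more evenly. A sketch (the device ID is a placeholder for your volume's actual naa identifier):

```shell
# Run in the ESXi shell; replace the placeholder device ID with your
# volume's real naa identifier (see: esxcli storage nmp device list).
DEV="naa.6000eb3xxxxxxxxx"   # placeholder
esxcli storage nmp device set --device "$DEV" --psp VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set --device "$DEV" --type iops --iops 1
```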