StoreVirtual Storage


 
richa3312
Advisor

Storevirtual 3200 Latency Issue

Hi All,

I'm wondering if anyone can help or has experienced a similar problem. It's a long post, so please bear with me!

We have recently purchased a Storevirtual 3200 SFF 10GB unit.

It's currently configured with 7 x 10K SAS drives in 2 x 3-disk RAID 5 sets, along with a spare. We have exported a single volume (Network RAID 0) to a single host, which uses the Microsoft iSCSI initiator with multipathing enabled (4 x 1Gb connections).

On this volume we are running a single VM using Hyper-V 2012 R2. The VM has nothing running on it; it doesn't even have its network connected.

There is no other workload currently on the unit.

With this setup we are experiencing high write latency to the storage.

The VM host reports 40-50ms latency on average to the exported volume. Intermittently the latency will drop to what we consider normal, i.e. 1-2ms, and will remain at this level for several hours before jumping back to the 40-50ms level. Within the VM the latency is slightly higher, and although it doesn't seem to cause any issues, the VM does seem sluggish and would probably implode if any load was placed on it.

When we look at IOPS on the datastore we see 1-2 IOPS regardless of latency.

We have also tried connecting from other hosts and the same issue persists.

At first we thought we had a networking issue; however, we've realised that the latency is present in the performance charts on the StoreVirtual itself, so this seems unlikely. Also, when copying large files to the unit, throughput is good and easily saturates 1Gb Ethernet.

We have also noticed some strange behaviour when failing over between storage controllers. If we fail over to either storage controller the latency disappears; when we fail back the latency returns.

Oddly, if we leave the controller failed over for a long period of time, at some point the latency will return.

Out of interest we have run Microsoft's Diskspd program to check the IOPS on the unit and compared this to the drives on the host server.

On the host server, which has 2 x 10K SAS SFF in RAID 1 with a 2GB FBWC (ar440), we see very high IOPS and throughput. If we disable the FBWC using HP SSA, things look far more as we'd expect and vaguely in line with the performance suggested by a RAID calculator for 2 10K disks in RAID 1.

The StoreVirtual on the same test doesn't behave as if it has a write cache at all and performs in line with what a RAID calculator suggests for two RAID 5 arrays with a stripe.
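
As a rough illustration of the kind of estimate a RAID calculator produces, here is a minimal Python sketch. The 140 IOPS per 10K disk figure and the write penalties (2 for RAID 1, 4 for RAID 5) are common rule-of-thumb planning values, not measurements from this unit, and treating the two 3-disk RAID 5 sets as one 6-disk pool is a simplifying assumption. The function name is illustrative only.

```python
# Rule-of-thumb RAID IOPS estimate with no controller write cache.
# All constants are assumed planning figures, not measured values.

WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid5": 4, "raid6": 6}

PER_DISK_IOPS_10K = 140  # assumed typical figure for a 10K SAS drive


def estimated_iops(disks, per_disk_iops, raid_level, write_fraction):
    """Estimate host-visible IOPS for a given read/write mix.

    Each host write costs `penalty` back-end IOs; each read costs one.
    """
    raw = disks * per_disk_iops
    penalty = WRITE_PENALTY[raid_level]
    backend_cost_per_io = (1 - write_fraction) + write_fraction * penalty
    return raw / backend_cost_per_io


# 100% random write, cache disabled
host_raid1 = estimated_iops(2, PER_DISK_IOPS_10K, "raid1", 1.0)
sv_raid5 = estimated_iops(6, PER_DISK_IOPS_10K, "raid5", 1.0)

print(f"Host, 2 x 10K RAID 1:        ~{host_raid1:.0f} write IOPS")
print(f"SV3200, 2 x 3-disk RAID 5:   ~{sv_raid5:.0f} write IOPS")
```

If measured random-write IOPS sit close to these uncached estimates, that is consistent with the writes not being absorbed by a write-back cache, although it does not prove it.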

Even without the cache I wouldn't expect to see this latency when there is no load on the unit. In fact, I wouldn't expect this on a single SATA drive!

Has anyone seen this before? Does anyone have a similar setup? Am I expecting too much? It just doesn't seem right to me.

I do have a case open with support but it's slow and we keep going around in circles.


Thanks for looking!

PS: I have graphs and screenshots which I will upload if I can work out how to!

84 REPLIES
HPE_Help
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Hello richa3312 - can you provide me with the support case ID so I can track down what is taking place on that side?

Also what type of drives are involved in your set up?

Thank you,

Karl

richa3312
Advisor

Re: Storevirtual 3200 Latency Issue

Thanks Karl,

The case reference is 5317502427

I had another support session on Friday and they have now escalated it to 3rd line after running some Iometer tests and seeing high write latency. I'm expecting them to come back to me at some point tomorrow.

The unit has 7 x SFF 10K 1.8TB SAS drives in it.

Regards,

 

 

referencepoint
Advisor

Re: Storevirtual 3200 Latency Issue

We also have a StoreVirtual 3200 unit suffering from write latency issues. Ours is running 10GbE iSCSI to 3 ESXi hosts. The unit is home to 4 SSDs, 21 10k SAS drives and 12 7.2k SAS drives - performance is awful on every tier.

I'd be really keen to know your resolution if and when you get one!

SVprodmgr
Frequent Advisor

Re: Storevirtual 3200 Latency Issue

Regarding support case 5317502427, this was resolved by reviewing how the SV3200 gives ACKs to the host. When there is no IO being done by the VM, there is no return IO for the host and the ACK gets sent after a timeout. This is what caused the high latency in this case.
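
To illustrate why a timeout-based ACK would dominate the reported latency on an idle volume, here is a minimal Python sketch. It is a toy model, not the SV3200's actual behaviour: the 40 ms timeout and 1 ms service time are hypothetical values picked to resemble the figures reported earlier in the thread, and `mean_latency_ms` is an illustrative helper.

```python
# Toy model: an IO whose ACK cannot ride on return traffic waits for a timeout.
# ACK_TIMEOUT_MS and SERVICE_MS are hypothetical values, not documented ones.

ACK_TIMEOUT_MS = 40.0  # assumed timeout before a lone ACK is sent
SERVICE_MS = 1.0       # assumed time to actually service one IO


def mean_latency_ms(ios, service_ms, ack_timeout_ms, delayed_fraction):
    """Average latency when a fraction of IOs also wait for the ACK timeout."""
    delayed = ios * delayed_fraction
    prompt = ios - delayed
    total = prompt * service_ms + delayed * (service_ms + ack_timeout_ms)
    return total / ios


# Idle volume: 2 IOs per second, no return traffic, so every ACK waits.
idle = mean_latency_ms(2, SERVICE_MS, ACK_TIMEOUT_MS, delayed_fraction=1.0)

# Busy volume: 1000 IOs per second, assume only 1% of ACKs hit the timeout.
busy = mean_latency_ms(1000, SERVICE_MS, ACK_TIMEOUT_MS, delayed_fraction=0.01)

print(f"idle average latency: {idle:.1f} ms")
print(f"busy average latency: {busy:.1f} ms")
```

Under these assumptions the idle case averages far higher than the busy case even though the disks do the same work per IO, which matches the pattern of high latency at 1-2 IOPS described in the thread.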

I'm an HPE employee working in product management
GuillaumeRainer
Advisor

Re: Storevirtual 3200 Latency Issue

Hi,

If I may butt in... We also happen to have a 3200/10G iSCSI (port-channeled LACP, hosting 4 SSDs and 71 SAS drives), attached to a pair of 5700 ProCurves and half a dozen DL360 G9 servers (running ESXi 6.0U3 and one on 6.5 for tests).
The intriguing thing is we get ~20 MB/sec on a VMware host running the regular iSCSI software initiator; when doing iSCSI within a virtual machine on the same host, we hit ~200 to 300 MB/sec throughput.
We do get a lot of TASK_SET_FULL (0x28) within VMware and it seems the array or ESXi host is throttling for whatever reason, but we didn't succeed in finding out why.

What did you tune within the ACKs? We already tried delayed ACK and checked e.g. the iSCSI TOE settings, but to no avail.

Bart_Heungens
Honored Contributor

Re: Storevirtual 3200 Latency Issue

So what is the solution when seeing high latency? Host side or SV side? Will there be a software update? Or is it an OS tweak?

richa3312
Advisor

Re: Storevirtual 3200 Latency Issue

Hi,

Yes, we never got a fix as such. The answer seems to be that when there is no IO the ACKs are delayed, which skews the latency we see on the storage. This is completely different to any SAS/FC MSA I've ever used, where no IO means 0ms response. We also still see periods where the IO doesn't change but the latency drops from 25ms to 1-2ms for a period of a few hours before jumping back up again.

See the pics below; it's certainly not what I would expect. How does this compare with your unit?

[Image: Latency] [Image: IO]

 

Performance-wise, testing comes back with the expected results for the number, type and RAID setup, so once loaded it seems to perform as expected.

The only thing that seems odd is that the 3rd line chap I spoke to said that the SV3200 won't cache random writes, only sequential writes, which I can't quite get my head around. I always thought that the cache was there to buffer writes and help even out performance. All the performance testing we have done with random writes seems to bear out that they are not cached by the device, i.e. it appears to operate in write-through mode. As mentioned in my original post, we compared it to a DL360 with the write cache enabled on the RAID card and the results are vastly different. It would be nice to fully understand how the cache works, as it seems different to other storage devices we have used in the past.

We're planning on monitoring it and seeing how it goes, as we've had to bring the unit into production. We are also planning on upgrading the unit with flash and additional spindles in the not too distant future, so it will be interesting to see if that makes any difference.

 

GuillaumeRainer
Advisor

Re: Storevirtual 3200 Latency Issue

Hi, 

Regarding the comparison: we are not in production. It is a test unit, to be used for every kind of virtualized test system we have on premises. It is kind of difficult to pinpoint, but we got some impression from dumping raw numbers (Linux dd, 1MB blocks):
VMware server with LUNs (VMFS), VM on top -> 20 MB/sec
VMware server, VM on some storage, iSCSI from within VM -> 200 MB/sec
VMware server, VM on some storage, RDM pass-through -> 190 MB/sec

I'm not keen on debating whether it is 3 MB/sec more or less; it just seems VMware has some trouble with the SV, as other arrays work fine. Maybe VMFS has some kind of problem with iSCSI on the SV3200 (and not on LH4530, LH4730, NetApp, MSA) ... or perhaps we missed something else...

Regarding the basic array performance, when running Iometer from within a VM or physical machine, we get the following numbers:
nRAID0 8k 100% READ 68MB/s and 8,300 IOPS @ ~10ms (4 Worker, 20 Out IOPS)
nRAID10 8k 100% READ 90MB/s and 11,000 IOPS
nRAID0 2MB 100% READ max 190MB/s
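
As a quick sanity check on the Iometer figures above, here is a minimal Python sketch relating IOPS, block size, throughput and queue depth. It assumes "8k" means 8 KiB blocks, that the IOPS figures are 8,300 and 11,000, and that "4 Worker, 20 Out" means 4 workers with 20 outstanding IOs each; those readings are interpretations of the post, not confirmed by the poster. The helper names are illustrative.

```python
# Consistency check: throughput = IOPS x block size, and
# Little's law: outstanding IOs = IOPS x latency.

BLOCK_BYTES = 8 * 1024  # assumes "8k" means 8 KiB


def throughput_mb_s(iops, block_bytes):
    """Throughput in decimal MB/s for a given IOPS rate and block size."""
    return iops * block_bytes / 1_000_000


def outstanding_ios(iops, latency_ms):
    """Average number of IOs in flight implied by Little's law."""
    return iops * latency_ms / 1000


nraid0_mb_s = throughput_mb_s(8300, BLOCK_BYTES)
nraid10_mb_s = throughput_mb_s(11000, BLOCK_BYTES)
nraid0_in_flight = outstanding_ios(8300, 10)
configured_in_flight = 4 * 20  # assumed: 4 workers x 20 outstanding IOs

print(f"nRAID0:  {nraid0_mb_s:.1f} MB/s (reported 68 MB/s)")
print(f"nRAID10: {nraid10_mb_s:.1f} MB/s (reported 90 MB/s)")
print(f"nRAID0 IOs in flight: {nraid0_in_flight:.0f} (configured {configured_in_flight})")
```

Under those assumptions the reported throughput, IOPS and ~10ms latency agree with each other, which suggests the ~10ms figure mostly reflects the configured queue depth rather than a separate problem.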

The question as to WHY performance sucks with VMware/VMFS is somewhat unclear; the number of wait states and throttles within the VMware log also seems alarming.

/btw: sorry for hijacking your thread, but I feel some comparisons and talking it out could be helpful ^^

SVprodmgr
Frequent Advisor

Re: Storevirtual 3200 Latency Issue

Thanks for the good data and observations.  If you feel that the latency or performance is not right on your SV3200, I would encourage you to contact HPE and open a case to get the experts to investigate.  There isn't enough information on this message thread to determine the root cause and it might be faster to let us look at each specific situation. 

Amy Mitchell

HPE StoreVirtual Product Manager
