StoreVirtual Storage

Re: Storevirtual 3200 Latency Issue

 
jbanger
New Member

Re: Storevirtual 3200 Latency Issue

It is very telling that the SV3200 iSCSI model is not on the VMware HCL.

As a longtime LeftHand user, and someone wanting to jump on the SV3200 platform - please sort this out ASAP, HP!

GuillaumeRainer
Advisor

Re: Storevirtual 3200 Latency Issue

That HCL issue would explain a lot - but then, HP expressly wrote it in their specs and insists on 6.0 U2, so it would not be valid for them to back out now.

Anyway, support is more active than ever. Third level diagnosed that the transport servers on my box are unevenly distributed: all LUNs were patched to one controller. That seems to be an option one cannot set by hand, so I had to reset and fail over the controllers. Furthermore, I should not use or even set the VIP (that came up on the first call, too), and I should not use locations. I'm just wondering how I would ever do 2-site replication, but at the moment that seems pretty far away...

After cleaning out the config and failing over/rebooting the controllers, I got a bit more throughput and the following performance numbers:

VMFS Test#7
read iops 12.616, latency 7,2/63,5 msec    
write iops 3.391, latency 23,3/205,3 msec

Reference RDM/NTFS    
read iops 13.900, latency 5/197 msec
write iops 6.900, latency 11/90 msec

So basically reading is still faster with RDM (I forgot to set it to nraid10 like the other VMFS volumes, which would be even faster), and writing is roughly twice (!) as fast, as well as lower in latency...
My MSA200 flash runs writes at 0.9 msec latency on average, 30 msec max - the SV3200, while running flash, does not even come close to those numbers...
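
For reference, the ratios between the two result sets above can be worked out directly. This is only a sketch of the arithmetic: it assumes the figures use European notation (so "12.616" is read as 12,616 IOPS and "23,3" as 23.3 ms) and takes the first latency value of each pair as the average. The dictionary names are made up for illustration.

```python
# Figures copied from the post above; European notation converted (assumption).
vmfs = {"read_iops": 12616, "write_iops": 3391, "read_ms": 7.2, "write_ms": 23.3}
rdm = {"read_iops": 13900, "write_iops": 6900, "read_ms": 5.0, "write_ms": 11.0}


def ratio(a, b):
    """Return a/b rounded to two decimals."""
    return round(a / b, 2)


# How much faster RDM is than VMFS, by IOPS.
read_speedup = ratio(rdm["read_iops"], vmfs["read_iops"])
write_speedup = ratio(rdm["write_iops"], vmfs["write_iops"])
# How much higher the VMFS write latency is than the RDM write latency.
write_latency_ratio = ratio(vmfs["write_ms"], rdm["write_ms"])

print(f"read speedup (RDM vs VMFS):  {read_speedup}x")
print(f"write speedup (RDM vs VMFS): {write_speedup}x")
print(f"write latency (VMFS vs RDM): {write_latency_ratio}x")
```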

referencepoint
Advisor

Re: Storevirtual 3200 Latency Issue

I've applied these settings to my vSphere 6.5 instance. Some were already set, e.g. the ATS fix (which is mentioned earlier in this thread), and the IOPS limit. Also, the "Maximum Outstanding Disk Requests" setting (Disk.SchedNumReqOutstanding) doesn't appear to exist for me.

Either way, after benchmarking a VM on all flash, I still see crippling write latency:

  • 100/0 R/W = 10k IOPS @ 0.8ms
  • 60/40 R/W = 46 IOPS @ 225ms
  • 0/100 R/W = 900 IOPS @ 11ms

* Tests used DiskSPD, 64k block size.
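
To put those figures in perspective, here is a rough sketch that converts them to throughput and estimates how many I/Os were in flight on average (Little's law: in-flight = IOPS x latency). It assumes "64k" means 64 KiB, "10k" means 10,000, and that the latencies are steady-state averages; the queue depth actually used in the test is not stated in the post, so the in-flight value is only an estimate, and the function names are hypothetical.

```python
BLOCK_BYTES = 64 * 1024  # assumption: "64k" means 64 KiB

# (R/W mix, IOPS, average latency in ms) as reported in the post above.
results = [("100/0", 10_000, 0.8), ("60/40", 46, 225.0), ("0/100", 900, 11.0)]


def throughput_mib_s(iops, block_bytes=BLOCK_BYTES):
    """Throughput in MiB/s for a given IOPS rate and block size."""
    return iops * block_bytes / (1024 * 1024)


def implied_in_flight(iops, latency_ms):
    """Little's law: average number of I/Os in flight = rate x latency."""
    return iops * latency_ms / 1000.0


for mix, iops, latency_ms in results:
    print(
        f"{mix:>6} R/W: {throughput_mib_s(iops):8.2f} MiB/s, "
        f"~{implied_in_flight(iops, latency_ms):.1f} I/Os in flight"
    )
```

Under these assumptions the implied number of in-flight I/Os is similar across all three runs (roughly 8 to 10), which would suggest the load stayed about the same and the IOPS collapse follows from the latency rather than the other way around.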

These numbers are an absolute joke, and I wonder if HPE is just clutching at straws at the moment, offering up 'fixes' that are mostly just iSCSI best practices. The fact of the matter is that this storage does not work properly, and this needs a solution ASAP. My SV3200 is part of a £75k virtual solution which is now 6 months old, and is still yet to see a production workload, as it is incapable of running them properly.

mtroper
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Same for me. Performance is still terrible.

Now HP L2 support wants to do some more IO performance testing ;-) (it's a joke that all customers with open cases regarding SV3200 performance are made to run these tests)

Support would do better to spend their time solving the issue on their own lab setup.

 

I've done some further testing and found that the following messages appear from time to time in the vmkernel.log:

2017-06-03T16:14:20.009Z cpu7:32812)NMP: nmp_ThrottleLogForDevice:2349: Cmd 0x2a (0x412e836d2ac0, 87486) to dev "naa.6000eb31f542be7a0000000000000fea" on path "vmhba33:C2:T0:L3" Failed: H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:NONE

2017-06-03T16:14:20.244Z cpu1:32806)ScsiDeviceIO: 2325: Cmd(0x412e867e6c40) 0x28, CmdSN 0xab from world 87486 to dev "naa.6000eb31f542be7a0000000000000fea" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

 

This seems related to:

VMK_SCSI_DEVICE_QUEUE_FULL (TASK SET FULL) = 0x28

This status is returned when the LUN prevents accepting SCSI commands from initiators due to lack of resources, namely the queue depth on the array.

Adaptive queue depth code was introduced into ESX 3.5 U4 (native in ESX 4.x) that adjusts the LUN queue depth in the VMkernel. If configured, this code will activate when device status TASK SET FULL (0x28) is returned for failed commands and essentially throttles back the I/O until the array stops returning this status.
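
To see how often this happens per LUN, something like the following could be used to count the D:0x28 entries in a log. It is only a sketch: the regular expression is written against the two message formats quoted above and may not cover other variants, and the function and variable names are made up for illustration.

```python
import re
from collections import Counter

TASK_SET_FULL = 0x28  # device status named in the KB text above

# Matches both formats quoted above:
#   ... to dev "naa..." on path "..." Failed: H:0x0 D:0x28 P:0x0 ...
#   ... to dev "naa..." failed H:0x0 D:0x28 P:0x0 ...
STATUS_RE = re.compile(
    r'dev "(?P<dev>naa\.[0-9a-fA-F]+)".*?[Ff]ailed:? '
    r'H:(?P<host>0x[0-9a-fA-F]+) D:(?P<device>0x[0-9a-fA-F]+) '
    r'P:(?P<plugin>0x[0-9a-fA-F]+)'
)


def count_queue_full(lines):
    """Count failed commands with device status TASK SET FULL, per device."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and int(match.group("device"), 16) == TASK_SET_FULL:
            counts[match.group("dev")] += 1
    return counts


# The two log lines from the post, used as sample input.
SAMPLE = [
    '2017-06-03T16:14:20.009Z cpu7:32812)NMP: nmp_ThrottleLogForDevice:2349: '
    'Cmd 0x2a (0x412e836d2ac0, 87486) to dev "naa.6000eb31f542be7a0000000000000fea" '
    'on path "vmhba33:C2:T0:L3" Failed: H:0x0 D:0x28 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0. Act:NONE',
    '2017-06-03T16:14:20.244Z cpu1:32806)ScsiDeviceIO: 2325: Cmd(0x412e867e6c40) 0x28, '
    'CmdSN 0xab from world 87486 to dev "naa.6000eb31f542be7a0000000000000fea" '
    'failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.',
]

for dev, hits in count_queue_full(SAMPLE).items():
    print(f"{dev}: {hits} TASK SET FULL")
```

To run it against a real log, pass an open file instead of SAMPLE, e.g. a copy of vmkernel.log pulled from the host.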

 

https://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1008113

 

 

 

Johannes_we
Frequent Advisor

Re: Storevirtual 3200 Latency Issue

I'd suggest you just return the system and get a suitable replacement.
This thing is not worth trying to optimize for performance as long as there is no major change in the internal design, be it hardware or software.

mtroper
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Good suggestion ;-)

Now the question is how to get HP to replace the unit....?

The storage has already been paid for by our customer. It will not be easy to explain why a brand new high-performance storage system is a piece of crap.... (especially because we advised him to buy the SV3200 instead of an MSA2040)

For me it's clear: if HP cannot sort this issue out, we will switch to another manufacturer (not only for storage).

It's a shame that HP is not able to get this device running properly after several months...

@Amy: maybe it's time for you to take some action! (instead of writing that all customers with issues should file a support case, which only leads to bothering paying customers with useless performance measurements)

I will now contact our distributor to have a little talk on this issue.

regards

Marc

GuillaumeRainer
Advisor

Re: Storevirtual 3200 Latency Issue

A short update:

I got some feedback: HPE is working on a fix (unknown ETA), and some time in summer a new major version should be available. That is pretty vague, but at least something seems to be happening...

I tried working around it by using fully provisioned volumes and eager-zeroed VMDKs - that seems to bring performance up to more or less "normal" values. The downsides are that 1) one loses a lot of space compared to thin provisioning, 2) one has to wait for volumes to be zeroed before use, and 3) performance drops again under concurrent access.

Unfortunately, I don't know of any chassis HP has in store with the same capabilities - so swapping seems a no-go, unless one considers a 3PAR...

referencepoint
Advisor

Re: Storevirtual 3200 Latency Issue

Have you seen that the MSA line has been refreshed with the 2050/2052 recently?

https://www.hpe.com/uk/en/product-catalog/storage/disk-storage/pip.hpe-msa-2050-san-storage.1009949622.html

200k IOPS on flash, >5 GB/s sequential throughput, auto tiering, better scalability. The figures are WAY better than the StoreVirtual 3200.

We are considering this as a replacement, especially as we have a few MSAs here (P2000 G3s) that have performed flawlessly for 5+ years. I've lost faith in the StoreVirtual 3200 being able to handle my production workloads, both now and in the future.

GuillaumeRainer
Advisor

Re: Storevirtual 3200 Latency Issue

Hi,
yes, I've seen it - looks nice.
We build custom software for companies with pretty high HA requirements and thoroughly test the hardware before putting it in place in one of our projects. Thus, the idea behind the 3200 was to use it as a testbed, maybe add a second array for replication, and later use those boxes in the field instead of old NetApps and lower-spec MSAs.

Regarding performance, I have an MSA2040 with 11x1.6TB on the same vSphere cluster - I didn't even bother doing benchmarks; depending on the specific interface config, you are stuck at wire speed... Thing is, I don't do benchmarks and do not like the idea of having to fiddle around to get the last few percent of speed - those arrays either work fine or they don't; it's a pretty binary definition.

@Amy, any news?