Storevirtual 3200 Latency Issue

SVprodmgr
Advisor

Re: Storevirtual 3200 Latency Issue

Here is a quick update on the concerns. The StoreVirtual engineering team is working on the performance-related issues posted here and those fed in through our Technical Support organization. We recently released the StoreVirtual DSM for Microsoft, and this is showing improvement for Windows hosts. We are working on a couple of other items and are anticipating a software update shortly. HPE Technical Support is planning a sweep through open cases starting next week to make customers aware of available fixes and the status of pending items. It is important that customer issues related to SV3200 performance have an open HPE Case Number. This will ensure they get a call during this and subsequent call sweeps. We appreciate your support and patience.

 Amy Mitchell

HPE StoreVirtual Product Manager

 

I'm an HPE employee working in product management

Re: Storevirtual 3200 Latency Issue

In all honesty - we are talking about an SSD tier which should have ~1 msec latency; the SV3200 is running at 27 msec and showing up to 665 msec. It is not about performance, but rather the complete and utter lack thereof in combination with VMware.
Since the array we all bought is under standard warranty (or more) and does not work "as designed" in some decent fashion, every owner should indeed have an open trouble ticket. 


What's bugging me is that tickets are held back at level 2 (hello, India) with some bogus claims of not having enough performance data or the like: trying out RDM, VMFS, RDM+ATS, RDM+ATS+RRqueue, then VMFS+ATS+RRqueue, re-discussing whether VMFS is really needed, ...
So, if the array was really working fine at HP labs, there should easily be a walkthrough, a list of tunables and a performance sheet readily available. Instead, I have been doing benchmarks for three months and the only replies I get in a timely fashion are comments in this thread - at least they are entertaining, much more so than phone support introducing you to yet another round of "let's benchmark the world to be a better place".

So, when is the software update due? The same goes for the DSM (the array should give decent performance without it in the first place, but...), which I already heard of in November.

mtroper
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Hi Amy

Same issues here. Write latency is way too high. Support ticket 5319837770 is logged.

(All possible fixes have already been tried: disable ATM, disable ACK, path policies, etc.)

 

Unfortunately I've had no answer from HP support... (the case has been logged for 4 days, without a single reply)

If you tell people to log a support case, you should at least answer their cases with the actual status of the investigation!

 

Re: Storevirtual 3200 Latency Issue

Dear Amy,

I did 3 hours of benchmarking yesterday (using a Windows virtual machine with Iometer), only to find out that copying data to a VMFS LUN happens at around 9-12 MB/sec.
I can only repeat what I told the support engineer: put a ProLiant, an HP/10G switch, the HP VMware image and an SV3200 together and measure VMFS performance. If it works out at xxxK IOPS / low latency, we could compare the setups and try to figure out what is different. If not, it would be the perfect starting point for finding out what doesn't work...

Since the cases are currently not escalated to L3 and are just kept in the loop waiting for news, my hopes for the big change are pretty much nil. Case 5315737219 (6 months ago) yielded no results since the wrong benchmarks showed acceptable numbers, and the current call chain 5319053973 is stuck somewhere between L2 and L3, waiting for me to provide yet another round of config data on how my servers are doing. Sorry, I want reference numbers and at least some starting point instead of being blocked waiting for Godot.

Re: Storevirtual 3200 Latency Issue

These are the kind of figures I'm seeing too. My cases 5318573642 and 5318573642 aren't going anywhere at the moment, and I'm actively pursuing getting the unit returned and possibly exchanged for an alternative storage appliance. We have now spent over 6 months configuring this new setup and are nowhere. We are wasting a lot of time and money.

jbanger
Occasional Visitor

Re: Storevirtual 3200 Latency Issue

It is very telling that the SV3200 iSCSI model is not on the VMware HCL.

As a longtime LeftHand user, and someone wanting to jump on the SV3200 platform - please sort this ASAP, HP!

Re: Storevirtual 3200 Latency Issue

That HCL issue would explain a lot - but then, HP expressly wrote it in their specs etc. and insists on 6.0U2 etc... So no, it would not be valid to back out now.

Anyway, support is more active than ever - third level diagnosed that the transport servers on my box are unevenly distributed - all LUNs were patched to one controller. That seems to be an option one cannot set by hand, so I had to reset and fail over the controllers. Furthermore, I should not use or even set the VIP (that was on the first call, too), and I should not use locations either. Just wondering how I would ever do 2-site replication, but for now that seems pretty far away...

After cleaning out the config and failing over/rebooting the controllers, I got a bit more throughput and the following performance numbers:

VMFS Test#7
read iops 12.616, latency 7,2/63,5 msec    
write iops 3.391, latency 23,3/205,3 msec

Reference RDM/NTFS    
read iops 13.900, latency 5/197 msec
write iops 6.900, latency 11/90 msec

So basically reading is still much faster with RDM (I forgot to set it to nraid10 like the other VMFS volumes, which would be even faster), and writing is about twice (!) as fast, as well as lower in latency...
My MSA200 flash runs writes at 0.9 msec latency on average, 30 msec max - the SV3200, while running flash, will not even start with those numbers...
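
To put the gap between the two tests into numbers, here is a small sketch that takes the figures above at face value. It assumes that "12.616" means 12,616 IOPS (thousands separator) and that each latency pair is average/maximum in msec; neither is stated explicitly in the post.

```python
# Figures copied from the VMFS Test#7 and RDM/NTFS results above.
# Assumption: "x.yyy" is a thousands separator, latency pairs are (avg, max) msec.
vmfs = {"read_iops": 12616, "write_iops": 3391,
        "read_lat_avg_ms": 7.2, "write_lat_avg_ms": 23.3}
rdm = {"read_iops": 13900, "write_iops": 6900,
       "read_lat_avg_ms": 5.0, "write_lat_avg_ms": 11.0}

# How many times more IOPS RDM delivers than VMFS.
read_iops_ratio = rdm["read_iops"] / vmfs["read_iops"]
write_iops_ratio = rdm["write_iops"] / vmfs["write_iops"]

# How many times higher the average write latency is on VMFS than on RDM.
write_lat_ratio = vmfs["write_lat_avg_ms"] / rdm["write_lat_avg_ms"]

print(f"read IOPS,  RDM vs VMFS: {read_iops_ratio:.2f}x")
print(f"write IOPS, RDM vs VMFS: {write_iops_ratio:.2f}x")
print(f"avg write latency, VMFS vs RDM: {write_lat_ratio:.2f}x")
```

With these inputs the read IOPS ratio is about 1.10, the write IOPS ratio about 2.03, and the average write latency on VMFS is about 2.12 times that of RDM.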

Re: Storevirtual 3200 Latency Issue

I've applied these settings to my vSphere 6.5 instance. Some were already set, e.g. the ATS fix (which is mentioned earlier in this thread), and the IOPS limit. Also, the "Maximum Outstanding Disk Requests" setting (Disk.SchedNumReqOutstanding) doesn't appear to exist for me.

Either way, after benchmarking a VM on all flash, I still see crippling write latency:

  • 100/0 R/W = 10k IOPS @ 0.8ms
  • 60/40 R/W = 46 IOPS @ 225ms
  • 0/100 R/W = 900 IOPS @ 11ms

* Tests used DiskSPD, 64k block size.
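
For context, the three results can be converted into throughput and into the number of I/Os that must have been in flight on average (Little's law: outstanding I/Os = IOPS x latency). This is only a rough sketch: it assumes "64k" means 64 KiB, that the quoted latencies are averages, and that each run was at steady state. None of the DiskSPD thread or queue settings are given in the post.

```python
# Results quoted above: (read/write mix, IOPS, latency in msec).
# Assumptions: 64 KiB blocks, latencies are averages, steady-state runs.
BLOCK_KIB = 64
results = [
    ("100/0 R/W", 10_000, 0.8),
    ("60/40 R/W", 46, 225.0),
    ("0/100 R/W", 900, 11.0),
]

def throughput_mib_s(iops: float, block_kib: int = BLOCK_KIB) -> float:
    """Throughput in MiB/s for a given IOPS rate and block size."""
    return iops * block_kib / 1024

def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Average I/Os in flight per Little's law (IOPS x latency in seconds)."""
    return iops * latency_ms / 1000

summary = {
    mix: (throughput_mib_s(iops), outstanding_ios(iops, lat_ms))
    for mix, iops, lat_ms in results
}

for mix, (mib_s, in_flight) in summary.items():
    print(f"{mix}: {mib_s:.1f} MiB/s, ~{in_flight:.1f} I/Os in flight")
```

Under these assumptions the mixed 60/40 run moves under 3 MiB/s, while the implied concurrency stays in the same range (roughly 8 to 10) for all three runs. That would be consistent with a fixed number of outstanding I/Os on the test side, so the IOPS figure is dictated almost entirely by the latency the array returns.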

These numbers are an absolute joke, and I wonder if HPE is just clutching at straws at the moment, offering up 'fixes' that are mostly just iSCSI best practices. The fact of the matter is that this storage does not work properly, and this needs a solution ASAP. My SV3200 is part of a £75k virtual solution which is now 6 months old, and is still yet to see a production workload, as it is incapable of running them properly.

mtroper
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Same for me. Performance is still terrible.

Now HP L2 support wants to do some more IO performance testing ;-) (it's a joke that all customers with open SV3200 performance cases have to run them)

Support would do better to spend their time solving the issue on their own lab setup.

 

I've done some further testing and found that the following messages appear from time to time in the vmkernel.log:

2017-06-03T16:14:20.009Z cpu7:32812)NMP: nmp_ThrottleLogForDevice:2349: Cmd 0x2a (0x412e836d2ac0, 87486) to dev "naa.6000eb31f542be7a0000000000000fea" on path "vmhba33:C2:T0:L3" Failed: H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:NONE

2017-06-03T16:14:20.244Z cpu1:32806)ScsiDeviceIO: 2325: Cmd(0x412e867e6c40) 0x28, CmdSN 0xab from world 87486 to dev "naa.6000eb31f542be7a0000000000000fea" failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.

 

Seems related to:

VMK_SCSI_DEVICE_QUEUE_FULL (TASK SET FULL) = 0x28

This status is returned when the LUN prevents accepting SCSI commands from initiators due to lack of resources, namely the queue depth on the array.

Adaptive queue depth code was introduced into ESX 3.5 U4 (native in ESX 4.x) that adjusts the LUN queue depth in the VMkernel. If configured, this code will activate when device status TASK SET FULL (0x28) is returned for failed commands and essentially throttles back the I/O until the array stops returning this status.

 

https://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1008113
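
To see how often this happens and on which LUNs, the log can be scanned for lines whose device status field (the `D:` value) is 0x28. Below is a minimal sketch that only assumes the line layout of the two messages quoted above; the function name and the regular expression are illustrative and not part of any VMware tooling. Note that 0x28 also appears in the second message as the SCSI opcode (READ(10)), so the match has to be anchored on the `D:` field rather than on the bare value.

```python
import re
from collections import Counter

TASK_SET_FULL = 0x28  # device status value quoted from the KB text above

# Matches the device name and the H:/D:/P: status triple in one log line.
STATUS_RE = re.compile(
    r'dev "(?P<dev>[^"]+)".*?'
    r'H:(?P<host>0x[0-9a-fA-F]+) '
    r'D:(?P<device>0x[0-9a-fA-F]+) '
    r'P:(?P<plugin>0x[0-9a-fA-F]+)'
)

def count_task_set_full(lines):
    """Count, per device, the log lines reporting device status TASK SET FULL."""
    hits = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and int(match.group("device"), 16) == TASK_SET_FULL:
            hits[match.group("dev")] += 1
    return hits

# The two messages quoted above, used as sample input.
sample = [
    '2017-06-03T16:14:20.009Z cpu7:32812)NMP: nmp_ThrottleLogForDevice:2349: '
    'Cmd 0x2a (0x412e836d2ac0, 87486) to dev "naa.6000eb31f542be7a0000000000000fea" '
    'on path "vmhba33:C2:T0:L3" Failed: H:0x0 D:0x28 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0. Act:NONE',
    '2017-06-03T16:14:20.244Z cpu1:32806)ScsiDeviceIO: 2325: Cmd(0x412e867e6c40) 0x28, '
    'CmdSN 0xab from world 87486 to dev "naa.6000eb31f542be7a0000000000000fea" '
    'failed H:0x0 D:0x28 P:0x0 Possible sense data: 0x0 0x0 0x0.',
]

hits = count_task_set_full(sample)
for dev, count in hits.items():
    print(f"{dev}: {count} TASK SET FULL responses")
```

To run it against a real log, pass an open file handle instead of `sample`, for example a copy of vmkernel.log pulled from the host.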

 

 

 

Johannes_we
Advisor

Re: Storevirtual 3200 Latency Issue

I'd suggest you just return the system and get a suitable replacement.
This thing is not worth trying to optimize for performance as long as there is no major change in the internal design, be it hardware or software.

mtroper
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Good suggestion ;-)

Now the question is, how to get HP to replace the unit...?

The storage has already been paid for by our customer. It will not be easy to explain why a brand new high-performance storage system is a piece of crap... (especially because we advised him to buy the SV3200 instead of an MSA2040)

For me it's clear: if HP cannot sort this issue out, we will switch to another manufacturer (not only for storage).

It's a shame that HP is not able to get this device running properly after several months...

@Amy: maybe it's time for you to take some action! (instead of writing that all customers with issues should file a support case, which only leads to paying customers being bothered to take useless performance measurements)

I will now contact our distributor to have a little talk on this issue.

regards

Marc

Re: Storevirtual 3200 Latency Issue

some short update:

I got some feedback, HPE is working on some fix (unknown ETA) and some time in summer, a new major version shall be available. Which is pretty vague, but at least something seems to be happening...

I tried working around it by using fully provisioned volumes and VMDKs (eager zeroed) - that seems to bring the performance up to more or less "normal" values. The downsides are that 1, one loses a lot of space compared to thin provisioning, 2, one has to wait for the volumes to be zeroed before use, and 3, the performance drops again with concurrent access.

Unfortunately, I wouldn't know what chassis HP has in store with the same capabilities - so swapping seems a no-go, unless one considers a 3PAR...

Re: Storevirtual 3200 Latency Issue

Have you seen that the MSA line has been refreshed with the 2050/2052 recently?

https://www.hpe.com/uk/en/product-catalog/storage/disk-storage/pip.hpe-msa-2050-san-storage.1009949622.html

200k IOPS on flash, >5GBps sequential throughput, auto tiering, better scalability. The figures are WAY better than the StoreVirtual 3200.

We are considering this as a replacement, especially as we have a few MSAs here (P2000 G3s) that have performed flawlessly for 5+ years. I've lost faith in the StoreVirtual 3200 being able to handle my production workloads, both now and in the future.

Re: Storevirtual 3200 Latency Issue

Hi,
yes, I've seen it - looks nice.
We do custom software for companies with pretty high requirements regarding HA, and thoroughly test the hardware before putting it in place in one of our projects. Thus, the idea behind getting the 3200 was to use it as a testbed, maybe add a second array to do replication, and later on use those boxes in the field instead of old NetApps and lower-spec MSAs.

Regarding performance, I have an MSA2040 with 11x1.6TB on the same vSphere cluster - I didn't even bother doing benchmarks; depending on the specific interface config you are stuck at wire speed... Thing is, I don't do benchmarks and do not like the idea of having to fiddle around to get the last few percent of speed - those arrays work fine or not, it's a pretty binary definition.

@Amy, any news?

SVprodmgr
Advisor

Re: Storevirtual 3200 Latency Issue

Thank you for staying in touch with the StoreVirtual Community.  I am working with the HPE Technical Support team to stay in closer contact with customers.  We are working on two patch sets and a maintenance release that we expect to address the issues reported here.  We are studying the release timelines now and will communicate them as soon as we finish.  Thank you for the continued patience. 

Amy Mitchell

HPE StoreVirtual Product Manager

I'm an HPE employee working in product management
richa3312
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Amy,

Can you help? I've been trying to add our SV3200 to Warranties and Contracts so we can download the DSM (it says entitlement is required); however, it states that our serial number is already linked. We've never linked it, so we're a bit confused. Is there any way of finding out who has linked it or, if you can't help, can you point me in the right direction to someone who can?

Thanks,

Re: Storevirtual 3200 Latency Issue

If you don't mind me asking - do you mean the Windows DSM, or did I overlook some VMware news...?

...I had a major outage yesterday due to the SV3200 not working - the data is on different arrays for the time being, and the failed uplink to those caused around 350 VMs to crash... had it been on the SV as planned, nothing big would have happened ^^

So if there were news, please do share...thanks
SVprodmgr
Advisor

Re: Storevirtual 3200 Latency Issue

I suggest reviewing the help section of the HPE support center about the linking of your warranty to your passport account.  Software support for SV3200 is provided through your warranty on the SV3200 serial number. 

If you need the DSM installer for SV3200, the easiest way is to get it from the upgrades menu on the StoreVirtual Management Console under the clients section.  We treat the DSM installer as a client from the StoreVirtual Management Console. 

Amy Mitchell

StoreVirtual Product Manager

I'm an HPE employee working in product management
richa3312
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

 

Hi,

We run Hyper-V, so it's the Windows DSM, I'm afraid.

Our unit suffered from some random issue whereby controller 2 missed a heartbeat and was subsequently rebooted by controller 1. Apparently this is a known issue and will be fixed in an upcoming patch.

Following this, volumes hosted by controller 2 suffered horrific latency (> 25 seconds). It seems that rebooting the host resolved this, but support want the DSM installed anyway, which I'm happy to do.

Just can't link my serial for some reason!

Rich.

 

 

richa3312
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Thanks Amy,

 

I'll check that out. The site thinks the serial is already linked. Not sure why. Is there support for such things? It'd be nice to know who to contact. The help section mentions unlinking it but that would mean having access to the account which has linked it but we have no idea who that would be. The unit was brand new to us.

Regards,

 

Rich.

 

 

 

Re: Storevirtual 3200 Latency Issue

patch update:

There has been a patch, 135-015-02, which showed promise in that it has some improvements for the console, MPIO/DSM and memory leaks, as well as for load-balancing-related performance issues.
I installed it straight away - it seems the numbers do not really improve.

mtroper
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Installed 135-015-02 and 135-016-00.

After the install, SC2 became unavailable. (Reboot not possible anymore)

In addition, hosts lost connection to the StoreVirtual (and one crashed).

Also, both controllers are now logging excessive packet loss (1-5%), which is not seen on the switches (and no flow control pause frames are logged on the switches anymore).

 

Nor3344
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

I have been following this thread closely, as I have experienced quite a lot of similar issues with our SV3200 multi-site solution. The timing of your post scares me, since I have just started installing the patches myself. Did you check both patches at the same time? The paranoia in me made me choose just 135-15-02 first and wait for it to complete the update on all four controllers and the FOM. All controllers and the accessibility of the volumes from VMware behaved fine shortly after the patching was complete, so I have started on 136-16-00... Hoping I do not run into similar issues to the ones you have seen. Could you please share what kind of setup you are running? I am running 2xSV3200 10Gb iSCSI multi-site. It is presenting storage to vSphere ESXi 6.0U3 hosts.

Please keep us updated on how it goes.

richa3312
Occasional Advisor

Re: Storevirtual 3200 Latency Issue

Hi,

When I installed 136-16-00 it didn't go on properly. We got the following error message:

The Update stopped due to an error. Check the versions of the storage controllers to determine if they were updated. Correct any issues and restart the update if nessesary.

At the same time it sent about 30 email messages to say that every temperature sensor under the sun was faulty followed by confirmation that everything was then ok!

When we checked the storage controller versions one controller had updated but the other hadn't.

We then re-tried the update at which point it did the same again. It seemed to fail at the end of the update waiting for the controller reboot. It now shows both controllers have version 136-16-00. 

We never lost contact with the storage throughout and it now seems to be working ok.

We run Hyper-V, so I'm not sure if it will be different on VMware.

Rich.