StoreVirtual Storage

Storevirtual 3200 Latency Issue

 
GuillaumeRainer
Advisor

Re: Storevirtual 3200 Latency Issue

Be my guest - the initial performance call was 5315737219; it resulted in further testing and a firmware update from 13.0x to 13.1x. The current call chain, 5319053973, started with an upgrade to 13.5x, re-initializing the array, and trying again to set it up in a proper, well-performing manner.

It may seem strange to put it that way, but it's not about a feeling that the performance is not right - it plainly is not. There is no logical explanation for a factor of 10 between native filesystems/iSCSI and VMFS; having virtual machine vMotion average 16 MB/s is just plain sad. And - as I stated initially - the array itself would be fast enough; it's not like I was hoping for performance figures in the all-flash high-end arena...

As a side note, our array is called "left", so the controllers are (automatically) named left-SC1 and left-SC2. There is a bug in the initial setup assistant: when using the DNS name for the interface (e.g. https://left-sc1...), it will start initializing and renaming the controllers and then crash with absurd error messages (e.g. "HAL Error") while creating the RAID groups.

GuillaumeRainer
Advisor

Re: Storevirtual 3200 Latency Issue

hi all,

Did a new set of performance tests with the new firmware (patch release over the weekend, version 135-010-00, 8 May 2017).
The numbers got better; just for comparison:

Initial StoreVirtual OS 13.5:
VMware server with LUNs (VMFS), VM on top -> 20 MB/s
VMware server, VM on some storage, iSCSI from within the VM -> 200 MB/s
VMware server, VM on some storage, RDM pass-through -> 190 MB/s

StoreVirtual OS 13.5.00.0791.0:
VMware server 6.0 with LUNs (VMFS), VM on top -> 35-55 MB/s
VMware server 6.5 with LUNs (VMFS), VM on top -> 127 MB/s
VMware server 6.0, VM on some storage, RDM pass-through -> 93.9 MB/s

All of these tests are sequential writes to disk with no caching (just dd, 20 GB size) and are meant to demonstrate bandwidth. I feel there is still some major issue in the handling of the ESXi iSCSI stack and VMFS, but finding out what exactly is pretty much out of my league. I'll keep you posted.
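
For reference, the kind of dd run behind these numbers would look roughly like the following (a sketch only; the target path and block size are my assumptions, the post just says "dd, 20g size"):

  # ~20 GB sequential write, bypassing the page cache via direct I/O
  dd if=/dev/zero of=/mnt/testvol/ddtest.bin bs=1M count=20480 oflag=direct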

richa3312
Advisor

Re: Storevirtual 3200 Latency Issue

Thanks for pointing out the update.

I installed it last night, so I will monitor it over the next few days and see if anything changes.

Re: Storevirtual 3200 Latency Issue

Hi all,

We have exactly the same problem as described in this thread: very high latency (30-40 ms) and low IOPS (500-1000), and this on a system with 21 SSDs and 44 10K disks - ESXi 6.0 U3 with 10 Gbit/s networking and 5900-series switches.

The performance in general is not great, but when measuring with reads only, the latency drops 10x and the IOPS go up 5x.
We have a similar system at another client running Hyper-V 2016 - there the performance is mind-blowing, with 40K IOPS at 0.2 ms latency - so the problem seems to be with the ESXi implementation.

The case has been handed to the support team, and I will post any findings.

Thomas Nielsen
Team lead and infrastructure architect
GuillaumeRainer
Advisor

Re: Storevirtual 3200 Latency Issue

Hi,

Another batch of tests coming through... ^^
I did some low-level Linux testing; the measurements are not really "valid" since everything happens far too fast, but they give some idea:

SV3200@VMFS:              read ~42K IOPS / 0.003 ms latency
SV3200@VMFS:              write 195 IOPS / ~5 ms latency
SAS RAID on physical host: read 70K IOPS / 0.001 ms latency
SAS RAID on physical host: write 3,349 IOPS / 0.29 ms latency

This was on XFS filesystems (the virtual machine was set up using the same kickstart as the physical one); one may argue that caches etc. matter here, but the write IOPS on the SV3200 give away that something is quite wrong.
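
The post does not name the tool used for these numbers; a 4k random-I/O run that reports IOPS and latency in this way could be reproduced with fio, for example (a sketch; the file path, size and runtime are assumptions):

  # 4k random writes, direct I/O, queue depth 1 - fio reports IOPS and completion latency
  fio --name=randwrite --filename=/mnt/xfs/fio.dat --size=4g --bs=4k \
      --rw=randwrite --ioengine=libaio --direct=1 --iodepth=1 \
      --runtime=60 --time_based --group_reporting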

I also did some new tests with Windows/Iometer using RDM (raw device mappings, 4 pass-through disks); they ran as follows:

Windows raw device mapping, NTFS volumes:
 nRAID10 4k 100% READ: 74 MB/s and 18,259 IOPS (average latency 1.8 ms)
 nRAID10 4k 100% WRITE: 24 MB/s and 5,385 IOPS (average latency 6 ms)

Same Windows machine, with the boot disk on VMFS:
nRAID10 4k 100% READ: 27 MB/s and 6,509 IOPS (average latency 19 ms)
nRAID10 4k 100% WRITE: 8.5 MB/s and 2,000 IOPS (average latency 24 ms)
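
Iometer itself is configured through its GUI; for anyone who wants to reproduce this kind of 4k read/write test from the command line instead, Microsoft's diskspd can produce comparable figures (a sketch; the target file, size and duration are assumptions, not what was used here):

  # 4k random I/O, caching disabled (-Sh), latency statistics (-L); -w0 = 100% read, -w100 = 100% write
  diskspd.exe -c4G -b4K -d60 -o8 -t2 -r -w0 -Sh -L E:\iotest.dat
  diskspd.exe -c4G -b4K -d60 -o8 -t2 -r -w100 -Sh -L E:\iotest.dat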
 
Cheers
Guillaume

referencepoint
Advisor

Re: Storevirtual 3200 Latency Issue

My performance is particularly bad on ESX, but it seems that writes are the main factor. I also have a physical Windows server connected to the array, and like @GuillaumeRainer above I see my Windows iSCSI random SSD IOPS drop from 15,000 @ 2 ms to 190 @ 169 ms.

richa3312
Advisor

Re: Storevirtual 3200 Latency Issue

I notice you have SSDs, SAS and NL-SAS in your box. I didn't think the SV3200 supported three tiers, only two. Could this have something to do with it? My understanding is that the three drive types you have would each go into a different tier.

 

referencepoint
Advisor

Re: Storevirtual 3200 Latency Issue

The initial config wizard puts my drives in 3 tiers: 0 (SSD), 1 (SAS) and 2 (NL-SAS/SATA), and the documentation backs this up.

I've done my performance testing with RAID only created on the tier I want to test, as there is no way to pin a volume to a certain tier (or if there is, I haven't found it!).

Re: Storevirtual 3200 Latency Issue

Hi everyone,

While support is working on the case, this mail came from HPE today:

http://h20566.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-a00009136en_us&hprpt_id=HPGL_ALERTS_1967118&jumpid=em_alerts_us-us_May17_xbu_all_all_1017565_1967118_StorageOptions_critical__/

I have not tested the solution yet, but will do this asap.

Hope this helps some of you.

Best regards

Thomas

Thomas Nielsen
Team lead and infrastructure architect
GuillaumeRainer
Advisor

Re: Storevirtual 3200 Latency Issue

Hi all,

Thomas, thanks for sharing - I did a number of tests this morning (just before getting news from support), with mixed feelings. Basically, since I am still running tests and unable to use this array in any productive way, I just have a single node attached to a given volume.
So, running my usual benchmarks (e.g. dumping 20 GB straight in, then reading it back in 4k blocks) showed no real change - maybe the figures would look different with VMware clustering in play.
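
For completeness, the read-back half of that benchmark would be something along these lines (again a sketch; the path is an assumption):

  # read the 20 GB dump back in 4k blocks, bypassing the page cache
  dd if=/mnt/testvol/ddtest.bin of=/dev/null bs=4k iflag=direct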

Support, in contrast, urged me to try the aforementioned ATS heartbeat fix; at the same time, they wanted another round of good old Windows raw-device-mapping Iometer tests (which I feel have been done more than enough already).
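
For anyone following along: assuming the advisory refers to the well-known VMFS ATS heartbeat setting, checking and disabling it on an ESXi host would look like this (a sketch; please verify against the advisory itself before applying it):

  # show the current value (1 = ATS is used for the VMFS5 heartbeat)
  esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5
  # fall back to plain SCSI reads/writes for the heartbeat
  esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5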

Now, since HPE is reading this anyway: the thing is, I do not care anymore whether Windows or BSD or Plan 9 gets some IOPS numbers when running InfiniBand over USB-1 or FireWire; that array was put in place to provide storage for ESXi, and it seems to be buggy in exactly this combination.
No problem, HPE - fess up, build your own test rig, check it, fix it, or advise and show how it's done, and be done with it. But don't have me running benchmarks for half a year on a 60k€ array and then console me with "Customers need to do performance tuning with help of any performance consultant at their end, based on requirement".