Array Performance and Data Protection

tshipp118
Occasional Contributor

Performance question

Hello,

Over the past several months the performance of our "main" array has slowly changed. I am seeing more latency and a lot more issues. One issue is that we hit a bug in the NOS code that is causing frequent service restarts, which is supposed to be resolved in 4.x. We also moved all workloads to the Nimble about four months ago.

I come from a traditional storage background, so I guess my question is: should similar write workloads reside on the same LUNs? Meaning file server VMs, SQL, Exchange, etc.? I wasn't totally sure if CASL and all that jazz would simply not care and order the writes in cache regardless of write block size, or whether it would help to organize VMs in that traditional format.

Thoughts?

3 REPLIES
dbauder92
Valued Contributor
Solution

Re: Performance question

Tim,

My thoughts as a Nimble customer for the last two years. This is what I've experienced and your mileage may vary; some settling of contents may occur during shipping . . .

When you create a volume, pick the performance policy for how that volume will be used; this sets the appropriate block size for the volume. I've had to unlearn most of my traditional old-school storage rules of thumb. Let CASL do its "magic", it really works. When I run into "performance issues" it is usually the OS and not Nimble. Just because I can create a monster volume and assign it to VMware doesn't mean it's a good idea: once I get more than 30 VMs on a datastore I start seeing performance issues, and that has more to do with ESXi and how it handles simultaneous access than with our Nimble hardware being able to handle the throughput. The one caveat is storage vMotion of large VMs, where the move time is limited by the Nimble's I/O; as best I can tell it is neither a network nor an ESXi issue. I'm happy with the throughput, just understand where the limiting factor occurs.
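Not from the thread, but a quick way to sanity-check the VM-per-datastore count mentioned above: a minimal pyVmomi sketch (vCenter hostname and credentials are placeholders) that connects to vCenter and prints how many VMs sit on each datastore.

```python
# Sketch only: list VM count per datastore to spot overloaded datastores.
# Replace the host/user/pwd placeholders with your own vCenter details.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab convenience; use real certs in prod
si = SmartConnect(host="vcenter.example.com",   # placeholder vCenter
                  user="administrator@vsphere.local",
                  pwd="changeme",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    for ds in view.view:
        # ds.vm is the list of VMs that live on this datastore
        print(f"{ds.name}: {len(ds.vm)} VMs")
    view.Destroy()
finally:
    Disconnect(si)
```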

Organize your volumes in whatever way best enables you to handle your operations and backup software requirements. Apply the best practices you already had in place and you should be good to go.

theta1242
New Member

Re: Performance question

If latency is becoming an issue, you might also want to make sure you're not enabling cache on volumes that house SQL logs or other log volumes that do nothing but sequential writes. This piggybacks on what Dan said above: if you pick the correct performance policy, it should automatically disable caching for log volumes (there are separate policies for Exchange/SQL data and Exchange/SQL logs). That also raises a design question: have you split your SQL/Exchange logs out into separate volumes, or are you running everything on one volume? If storage is your bottleneck, latency is almost always going to come from a saturated cache. Have you looked at InfoSight to get reports on what could be causing the issue on the array side? There's a lot of useful analytics out there to help with this kind of thing.
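If you want to audit this across the array rather than clicking through every volume, here is a rough sketch against the NimbleOS REST API. The endpoint paths and field names (/v1/tokens, /v1/volumes/detail, perfpolicy_name, caching_enabled) are assumptions from memory, so check them against the REST API reference for your NimbleOS version; the array address and credentials are placeholders.

```python
# Rough sketch: flag log volumes that still appear to have caching enabled.
# Endpoints and field names are assumptions; verify against your NimbleOS REST docs.
import requests

ARRAY = "https://nimble-array.example.com:5392"   # placeholder management address
AUTH  = {"data": {"username": "admin", "password": "changeme"}}

requests.packages.urllib3.disable_warnings()

# Assumed token endpoint: POST /v1/tokens returns a session token.
tok = requests.post(f"{ARRAY}/v1/tokens", json=AUTH, verify=False).json()
headers = {"X-Auth-Token": tok["data"]["session_token"]}

# Assumed detail endpoint for full volume objects.
vols = requests.get(f"{ARRAY}/v1/volumes/detail", headers=headers, verify=False).json()
for v in vols.get("data", []):
    policy = v.get("perfpolicy_name", "")
    cached = v.get("caching_enabled", True)
    if "log" in policy.lower() and cached:
        print(f"{v['name']}: policy={policy}, caching still enabled -- worth a look")
```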

jstinson105
Advisor

Re: Performance question

We're coming up on two years of Nimble usage and we're still seeing great latency, though it is very slowly creeping up. Dan and Justin are right about specifying your workload on each volume as much as possible, but we all know that the more loaded a storage system is, the higher its latency will be. The software wizardry of NimbleOS does a lot to mitigate this (going from NOS 2.3 to NOS 3.4 actually improved our latency system-wide by about 10%). But even if your storage array were nothing but RAM, you would still see latency slowly creep up as you add more data and I/O to it.

The real question is not whether latency will increase, but whether you're seeing it increase more than it should. What is your typical latency for reads and writes? What is your cache hit rate? Is your latency higher than you want all day, or only during specific workloads (nightly backups skew our average latency pretty hard, but they don't affect anybody's productivity)? Have you looked at the stability of your network switches (retransmits will sink even the best storage system)?
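One way to answer the "only during specific workloads" question: export per-interval latency samples (from InfoSight or whatever monitoring you have) and compare the all-day average against the average outside the backup window. A small sketch, assuming a CSV with timestamp and write_latency_ms columns (the file name, column names, and backup window are placeholders):

```python
# Sketch: compare all-day average write latency with the average outside the
# assumed backup window, to see whether nightly jobs are skewing the number.
import csv
from datetime import datetime
from statistics import mean

BACKUP_HOURS = set(range(0, 5))   # placeholder: assume backups run 00:00-05:00

all_day, outside_backup = [], []
with open("volume_latency.csv", newline="") as f:
    for row in csv.DictReader(f):
        ts = datetime.fromisoformat(row["timestamp"])
        lat = float(row["write_latency_ms"])
        all_day.append(lat)
        if ts.hour not in BACKUP_HOURS:
            outside_backup.append(lat)

print(f"all-day avg write latency : {mean(all_day):.2f} ms")
print(f"excluding backup window   : {mean(outside_backup):.2f} ms")
```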