StoreVirtual Storage
M. N.
Advisor

Using Performance Monitor to check performance

Imagine a VMware vSphere infrastructure based on 3 ESXi hosts connected via 10Gbit iSCSI connections to 2 HPE StoreVirtual P4xxx clusters.

Using the Performance Monitor I see an average "Throughput" value ranging between 4,000,000 B/s and 8,000,000 B/s between each ESXi host and each StoreVirtual cluster.
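
For scale, here is a quick back-of-the-envelope conversion (my own arithmetic, not something the CMC reports) putting those readings in MB/s and against the nominal capacity of a single 10GbE link:

```python
# Back-of-the-envelope only: convert the observed averages to MB/s and
# compare them with the nominal line rate of one 10GbE link.
observed_bps = (4_000_000, 8_000_000)       # average "Throughput" readings, bytes/s
link_capacity_bps = 10 * 10**9 / 8          # 10 Gbit/s expressed as bytes/s

for bps in observed_bps:
    print(f"{bps:,} B/s = {bps / 1e6:.1f} MB/s "
          f"(~{bps / link_capacity_bps:.2%} of one 10GbE link)")
```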

How can I tell whether that value is acceptable?

More generally, how can I tell whether the other values fall within an acceptable range, or whether there is a performance issue?

Regards

M.N.

Tom Lyczko
Super Advisor

Re: Using Performance Monitor to check performance

I would like to know this too... explanations of how to interpret the performance data as we see it in the graphs, and so forth.

M. N.
Advisor

Re: Using Performance Monitor to check performance

Is there any documentation to start from?

Regards

marius

Stor_Mort
HPE Pro

Re: Using Performance Monitor to check performance

Here is a good writeup about the CMC performance monitor.

http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04863498

I am an HPE employee - HPE SimpliVity Support


M. N.
Advisor

Re: Using Performance Monitor to check performance

Thank you, it's an excellent document (I had already read it), but it does not provide any reference values.

Let me give a simple example: if I report a blood pressure of 120 to my doctor, I want to know whether it is low, high or normal, whether I am at risk, and whether I should take any action.

The same goes for the values I get from the CMC or any other performance monitor: how can I tell the customer whether they are high, low or normal, and whether they should take any action or be aware of any risk?

Regards

M.N.

Stor_Mort
HPE Pro

Re: Using Performance Monitor to check performance

When looking at storage performance, keep in mind that the storage system is a responder. It does not make any transactions happen by itself. The storage system will only provide throughput and IOPS to the extent demanded by the initiators. That's fine if you are determining a baseline production load to better understand what's normal. If you are trying to determine the extent of peak performance, you need to use load generators to push the storage to the limit. Do not do this during production.

As you add load to a system, bottlenecks that restrict performance begin to become apparent. Here are a few to evaluate:

  • disk access time, which is usually related to rotational speed
  • disk raid configuration
  • bus speed
  • cache size
  • cache r/w ratio
  • network speed and bonding configuration
  • network raid configuration
  • host initiator multipath configuration

This is too big of a subject to go into here - there are many good blog articles, and Wikipedia of course. But let me give you a few rules of thumb for detecting when the load is getting too high for the storage system to handle. Look at the Average column in the performance monitor to get the most consistent picture and one that best correlates with user experience. There are no hard limits. You should see performance gradually degrade with heavier load in a properly working system.

Latency - allow up to 50 ms for 7200 rpm disk drives, 30 ms for 15K rpm enterprise SAS drives, 10 ms for SSD. Anything less than that is great. If readings are consistently over that, you may be putting more load on the system than it can handle.
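
Purely as an illustration (these are just the rules of thumb above encoded as numbers, not hard limits from HPE), a small sketch that flags an average latency reading against the suggested ceiling for each drive type:

```python
# Illustrative only: the rule-of-thumb latency ceilings from the paragraph above.
LATENCY_CEILING_MS = {"7200rpm": 50, "15k_sas": 30, "ssd": 10}

def latency_within_ceiling(drive_type: str, avg_latency_ms: float) -> bool:
    """True if the average latency is at or below the rule-of-thumb ceiling."""
    return avg_latency_ms <= LATENCY_CEILING_MS[drive_type]

print(latency_within_ceiling("15k_sas", 42))   # False: consistently over 30 ms
print(latency_within_ceiling("7200rpm", 18))   # True: well under 50 ms
```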

Queue depth - For the 10K and 15K drives, anything above roughly twice the number of disks in the cluster is considered a heavy load. (That corresponds to 2 active IO per disk.) Due to the design of 7200 rpm disks, those should only have one active IO, i.e. queue depth should not be higher than roughly the number of disks in the cluster. A high queue depth should correlate with readings at the high end of expected latency.
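
Again only as a sketch of the rule of thumb above (the cluster size and drive type are hypothetical inputs, not CMC values), the queue-depth ceiling scales with the number of disks:

```python
# Illustrative only: ~2 outstanding IOs per 10K/15K disk, ~1 per 7200 rpm disk.
ACTIVE_IO_PER_DISK = {"10k": 2, "15k": 2, "7200rpm": 1}

def queue_depth_is_heavy(drive_type: str, disks_in_cluster: int,
                         avg_queue_depth: float) -> bool:
    """True if the average queue depth exceeds the rule-of-thumb ceiling."""
    return avg_queue_depth > disks_in_cluster * ACTIVE_IO_PER_DISK[drive_type]

# A 24-disk 15K cluster averaging a queue depth of 60 is above 2 * 24 = 48,
# so that would count as a heavy load.
print(queue_depth_is_heavy("15k", 24, 60))   # True
```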

Throughput and IOPS - these depend on how much load is being applied to the storage system. Throughput and IOPS are inversely related by the transaction size. Large transactions (greater than 32KB, for instance) are best represented by throughput. Small transactions (smaller than 8KB, for instance) should be represented by IOPS.  Both throughput and IOPS have little meaning in a relative performance context without knowing the average transaction size.
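
To make that relationship concrete (the readings below are made up for illustration), throughput, IOPS and average transaction size are tied together as throughput = IOPS x transaction size, so any two of them imply the third:

```python
# throughput (B/s) = IOPS * average transaction size (bytes),
# so the average transaction size follows from two readings.
def avg_transaction_size_kb(throughput_bps: float, iops: float) -> float:
    """Average transaction size in KB implied by throughput and IOPS."""
    return throughput_bps / iops / 1024

# The same 8 MB/s of throughput can mean very different workloads:
print(avg_transaction_size_kb(8_000_000, 250))    # ~31 KB: large, throughput-style IO
print(avg_transaction_size_kb(8_000_000, 2_000))  # ~4 KB: small, IOPS-style IO
```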

 

I am an HPE employee - HPE SimpliVity Support


Tom Lyczko
Super Advisor

Re: Using Performance Monitor to check performance

Hello @StorMorty, that was a very helpful reply.

I was going to comment that what M.N. et al wanted was how to understand and interpret the statistical results.

I think in my particular case it's easiest to focus on disk latency and queue depth; those are a little easier to explain to people.

Regarding your latency and queue depth figures, I'll assume, until you say otherwise, that these refer to the averages, even though one should also look at how often either one peaks and how high the peaks are.

IOPS is good to look at, but obviously interpreting it is case-by-case, which is why I asked in another thread about how to calculate IOPS for an HP VSA... I'll respond in that other thread. I also don't know at ALL how to figure out average transaction size. :)

What I plan to do is this: if both latency and queue depth are mostly at the "high" end for the array, and IOPS are often close to or above the theoretical IOPS calculations I'm able to do... then it becomes clearer what to do.

Thank you, Tom

Stor_Mort
HPE Pro

Re: Using Performance Monitor to check performance

Hi Tom, 

Averages are well correlated with user experience. A momentary spike in latency rarely bothers anyone, but as you point out, it's good to be aware of those characteristics as they can help form a more complete story about what's going on. For instance, if you have regular spikes every 15 minutes, it could indicate a regularly scheduled maintenance job which might not need to run as often during production hours.

The average transaction size is recorded and displayed by the CMC performance monitor along with the other metrics. It also helps tell a more complete story. If you have a period of time where the transaction size has become fairly large, it may be due to a backup process running. The IOPS metric will often be depressed during that time, as larger transactions dominate and push up the throughput measurement. Conversely, if you have a period where most activity is 4KB SQL transactions, your IOPS will be high and your throughput will be fairly low. Because the characteristics of the data have such an effect on IOPS and throughput, the theoretical peak IOPS of an array is usually only of theoretical interest - it is never encountered in production.
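
Purely as an indicative sketch (the thresholds are mine, not anything the CMC or HPE defines), this is the kind of rough labelling the recorded average transaction size allows:

```python
# Indicative only: rough workload labels based on average transaction size.
def label_window(avg_transaction_kb: float) -> str:
    if avg_transaction_kb >= 64:
        return "throughput-dominated (e.g. backup or sequential copy)"
    if avg_transaction_kb <= 8:
        return "IOPS-dominated (e.g. small-block transactional, ~4KB SQL)"
    return "mixed workload"

print(label_window(256))   # throughput-dominated ...
print(label_window(4))     # IOPS-dominated ...
```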

 

I am an HPE employee - HPE SimpliVity Support


Tom Lyczko
Super Advisor

Re: Using Performance Monitor to check performance

Thank you for the additional explanation(s).

They are very good and they make my head want to explode. :) :)