Trying to understand queue depth

BBloksma
Occasional Visitor

Trying to understand queue depth

Hi,

 

I’m trying to find out if our SAN should perform better or not.

 

We have 2 Management groups each with one cluster. We use the CMC version 9.5 and the nodes are currently at version 8.5 and 9.0. We will bring them all to the 9.5 version in a few weeks.

 

Looking at the performance monitor:
The cluster with 2 P4500 G2 4TB SAS ** nodes is performing well with:
- Average throughput of 10+ MB/s and peaks of 15+MB/s
- Average IOPS are 150+ and peaks are 500+
- The queue depth is most of the time somewhere around 1 or 2 with peaks of 20+
The cluster with 4 P4300 G2 6TB MDL SAS *** nodes is not performing so well, we think:
- Average throughput of 11+ MB/s and peaks of 45+ MB/s
- Average IOPS are 200+ and peaks are 500+
- The queue depth is always 600,000+ (yes, that is correct) with only slight variation (10 at most)

 

We had an e-mail migration a few weeks ago: the P4500 cluster was the source for the data, and the (then 2-node) P4300 cluster was the target. During the migration we would see a queue depth of 500,000+. In the weeks since, we have only seen the queue depth increase.

 

Because we needed more space we added two more nodes, so we now have plenty of disk space and should also have more performance. Restriping of all volumes completed yesterday, but the queue depth did not change.

 

So queue depth is probably not what I think it is: the number of items in the queue waiting to be processed. ;-)

Before the mail migration we never looked at the performance monitor, so we do not know what a normal value was before that weekend.

 

It seems there is something wrong with the queue depth on the P4300 cluster, but what, and why? How can I find out more? The SAN does not seem to be performing at its peak, so maybe there is nothing wrong, but how do I know?

 

**
P4500 G2 has 12 500GB disks in each node, 420GB disks according to the CMC.
Disks are 2 sets of 6 disks in RAID5.
***
P4300 G2 has 8 1TB disks in each node, 1024GB according to the CMC.
Disks are 1 set of 8 disks in RAID5.

 

See attachment for the performance screen; nice to see what happens on Friday around 5 PM :-)

 

4 REPLIES
RonsDavis
Frequent Advisor

Re: Trying to understand queue depth

Is it the 8.5 nodes that show the high queue?

It's actually a bug that is fixed in newer SAN/iQ versions. The number really can't be trusted.

 

BBloksma
Occasional Visitor

Re: Trying to understand queue depth

Actually, the high numbers are on the P4300 nodes, which run the 9.0 software. But we will be upgrading all nodes to version 9.5 in a few weeks, so I'll have a look at those numbers again after that.

 

But just so I understand: the queue depth SHOULD represent the number of SCSI commands in the queue, right?

Emilo
Trusted Contributor

Re: Trying to understand queue depth

Queue depth is the amount of outstanding I/O waiting for processing by the SAN. In other words, it's the count of how many pieces of data are stacked up waiting to get written to or read from the SAN.

 

4 P4300 G2 6TB MDL SAS ***


- The queue depth is always 600,000+ (yes, that is correct) with only slight variation (10 at most)

 

For SAS drives the queue depth ideally should be about 2x the number of disks in the cluster. In your case that would be:

4 nodes x 8 disks = 32 disks; 32 x 2 = 64
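The rule of thumb above can be sketched out like this (a rough sizing heuristic from this thread, not an official HPE formula; the function name and the 2-per-disk factor are assumptions for illustration):

```python
# Rule-of-thumb target queue depth for a SAS cluster:
# roughly 2 outstanding I/Os per physical disk across all nodes.
# (Heuristic from the advice above, not an official HPE sizing formula.)

def target_queue_depth(nodes: int, disks_per_node: int, per_disk: int = 2) -> int:
    """Return the rough ideal aggregate queue depth for the cluster."""
    return nodes * disks_per_node * per_disk

# The P4300 cluster in question: 4 nodes with 8 disks each
print(target_queue_depth(4, 8))  # 64
```

A sustained queue depth of 600,000+ is four orders of magnitude above this target, which by itself suggests the counter is not reporting a real queue.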

 

Why don't you run the performance monitor against each node individually to see which node (if any) is showing the most I/O waiting for processing?

You could also run this for each volume.

 

If the numbers you are showing are correct, you have a problem with latency.

If queue depth is high, there are outstanding I/Os waiting to be serviced by the SAN. This increases IOPS, but adds latency, because each I/O is waiting to be serviced instead of being serviced immediately. The SAN performs optimally when there are enough I/Os outstanding to keep the SAN busy, but not so many that each I/O has to wait longer than desired to get serviced.
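One way to sanity-check a reported queue depth is Little's Law (average queue length = arrival rate x average latency), so the implied latency is queue depth divided by IOPS. A minimal sketch using the numbers from the original post (the function name is mine; the figures are the averages reported above):

```python
# Little's Law: avg_queue = iops * avg_latency_seconds
# => latency implied by a reported queue depth = queue_depth / iops

def implied_latency_s(queue_depth: float, iops: float) -> float:
    """Average latency (seconds) implied by a queue depth and IOPS rate."""
    return queue_depth / iops

# Healthy P4500 cluster: queue depth ~2 at ~150 IOPS
print(implied_latency_s(2, 150))        # ~0.013 s -> plausible

# P4300 counter: queue depth ~600,000 at ~200 IOPS
print(implied_latency_s(600_000, 200))  # 3000 s per I/O -> clearly not a real queue
```

A real queue of 600,000 at 200 IOPS would mean each I/O waits close to an hour, which no working server could tolerate; that arithmetic supports the buggy-counter explanation rather than an actual latency problem.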


The bug the previous post mentioned was actually in the 9.0 CMC, a very early release.

 

M.Braak
Frequent Advisor

Re: Trying to understand queue depth

I see this kind of behaviour with SAN/iQ 9.5 as well. Don't always trust the performance counters: since they were implemented in version 8 there have been a lot of bugs in them, and some still aren't fixed. The counters are also not always what you think they are, and don't even think of adding them together; you can get very odd results that way.