StoreVirtual Storage

Re: Trying to understand queue depth

 
BBloksma
Occasional Advisor

Trying to understand queue depth

Hi,

 

I'm trying to find out whether our SAN should be performing better or not.

 

We have 2 management groups, each with one cluster. We use CMC version 9.5, and the nodes are currently at versions 8.5 and 9.0. We will bring them all up to 9.5 in a few weeks.

 

Looking at the performance monitor:
The cluster with 2 P4500 G2 4TB SAS ** nodes is performing well, with:
- Average throughput of 10+ MB/s and peaks of 15+ MB/s
- Average IOPS of 150+ and peaks of 500+
- A queue depth that is around 1 or 2 most of the time, with peaks of 20+
The cluster with 4 P4300 G2 6TB MDL SAS *** nodes is, we think, not performing so well:
- Average throughput of 11+ MB/s and peaks of 45+ MB/s
- Average IOPS of 200+ and peaks of 500+
- A queue depth that is ALWAYS 600,000+ (yes, that is correct) with only a slight (10 max) variation

 

We had an e-mail migration a few weeks ago; the P4500 cluster was the source for the data and the (then 2-node) P4300 cluster was the target. During the migration we would see a queue depth of 500,000+. In the weeks since, we have only seen the queue depth increase.

 

Because we needed more space we added 2 more nodes, so we now have plenty of disk space and should have more performance. Restriping of all volumes completed yesterday, but the queue depth did not change.

 

So queue depth is probably not what I think it is: the number of items in the queue waiting to be processed. ;-)

Before the mail migration we never looked at the performance monitor, so we do not know what a normal value was before that weekend.

 

It seems there is something wrong with the queue depth on the P4300 cluster, but what, and why? How can I find out more? The SAN does not seem to be performing at its peak, so maybe there is nothing wrong, but how do I know?

 

**
P4500 G2 has 12 x 500GB disks in each node (420GB disks according to the CMC).
Disks are 2 sets of 6 disks in RAID5.
***
P4300 G2 has 8 x 1TB disks in each node (1024GB according to the CMC).
Disks are 1 set of 8 disks in RAID5.

 

See the attachment for the performance screen; nice to see what happens on Friday around 5 PM :-)

 

4 REPLIES
RonsDavis
Frequent Advisor

Re: Trying to understand queue depth

Is it the 8.5 nodes that show the high queue?

It's actually a bug that is fixed in newer SAN/iQ versions. The number really can't be trusted.

 

BBloksma
Occasional Advisor

Re: Trying to understand queue depth

Actually the high numbers are on the P4300 nodes, which are on the 9.0 software. But we will be upgrading all nodes to version 9.5 in a few weeks, so I'll have a look at those numbers again after that.

 

But just so I understand: the queue depth SHOULD represent the number of SCSI commands in the queue, right?
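As a side note, a host also has its own notion of queue depth: the number of SCSI commands the initiator will keep outstanding per device. This is a minimal sketch for checking that host-side value, assuming a Linux host where sysfs exposes it as /sys/block/<dev>/device/queue_depth; it is not the same counter the CMC reports for the cluster.

```python
#!/usr/bin/env python3
"""Sketch: print the configured host-side SCSI queue depth per block device.

Assumes a Linux initiator with a standard sysfs layout; devices without a
SCSI 'device/queue_depth' attribute are skipped by the glob automatically.
"""
import glob

for path in sorted(glob.glob("/sys/block/*/device/queue_depth")):
    dev = path.split("/")[3]  # e.g. "sda" from /sys/block/sda/device/queue_depth
    with open(path) as f:
        depth = f.read().strip()
    print(f"{dev}: queue_depth={depth}")
```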

Emilo
Trusted Contributor

Re: Trying to understand queue depth

Queue depth is the amount of outstanding I/O waiting to be processed by the SAN. In other words, it's the count of how many pieces of data are stacked up waiting to be written to or read from the SAN.

 

4 P4300 G2 6TB MDL SAS ***

- A queue depth that is ALWAYS 600,000+ (yes, that is correct) with only a slight (10 max) variation

 

For SAS drives the queue depth ideally should be about 2x the number of disks in the cluster. In your case that would be:

4 nodes x 8 disks = 32 disks, and 32 x 2 = 64
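As a quick sketch of that rule of thumb (the 2x figure is the guideline above, not an exact specification), using the node and disk counts from the original post:

```python
# Rule of thumb from above: ideal queue depth ~= 2x the number of SAS
# disks in the cluster. Node/disk counts come from the original post.
def ideal_queue_depth(nodes: int, disks_per_node: int) -> int:
    return 2 * nodes * disks_per_node

print(ideal_queue_depth(4, 8))   # P4300 G2 cluster: 4 nodes x 8 disks -> 64
print(ideal_queue_depth(2, 12))  # P4500 G2 cluster: 2 nodes x 12 disks -> 48
```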

 

Why don't you run the performance monitor against each node individually to see which node (if any) is showing the most I/O waiting for processing?

You could also run this for each volume.

 

If the numbers you are showing are correct, you have a problem with latency.

If queue depth is high, there are outstanding I/Os waiting to be serviced by the SAN. This increases IOPS, but adds latency, because each I/O waits its turn instead of being serviced immediately. The SAN performs optimally when there are enough I/Os outstanding to keep the SAN busy, but not so many that each I/O has to wait longer than desired to be serviced.
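One way to sanity-check the reported counter is a back-of-the-envelope application of Little's Law (outstanding I/Os ≈ IOPS x latency), using the figures from the original post. This is my own rough check, not an official CMC calculation:

```python
# Little's Law sanity check: queue depth ~= IOPS x latency, so the
# implied per-I/O latency is queue depth / IOPS. Figures are from the
# original post.
reported_queue_depth = 600_000
peak_iops = 500

implied_latency_s = reported_queue_depth / peak_iops
print(f"Implied latency per I/O: {implied_latency_s:.0f} s "
      f"(~{implied_latency_s / 60:.0f} minutes)")
# ~1200 s (20 minutes) per I/O is physically implausible for a SAN that
# is still serving hosts, which supports the theory that the counter
# itself is buggy rather than the cluster being that far behind.
```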


The bug the previous post mentioned was actually in the 9.0 CMC, which was a very early release.

 

M.Braak
Frequent Advisor

Re: Trying to understand queue depth

I see this kind of behaviour with SAN/iQ 9.5 as well. Don't always trust the performance counters: since they were implemented in version 8 there have been a lot of bugs in them, and some still aren't fixed. The counters also are not always what you think they are, and don't even think of adding them together! You can get very odd results that way.