HPE EVA Storage

Re: High disk i/o wait time, but pretty good i/o service time?

 
Adam Garsha
Valued Contributor

High disk i/o wait time, but pretty good i/o service time?

RHEL 5.3 host accessing SAS disk from an MSA. collectl shows me high I/O wait time for one of my disks (on the order of minutes), but the service time doesn't look bad. I am trying to wrap my mind around what this means. Overwhelming bandwidth? (I've never seen that before.)

Here is a screen shot of the type of thing I see at times:

# CPU[HYPER] SUMMARY (INTR, CTXSW & PROC /sec)
# User Nice Sys Wait IRQ Soft Steal Idle Intr Ctxsw Proc RunQ Run Avg1 Avg5 Avg15
0 0 1 11 0 3 0 82 23K 49K 1 823 0 8.47 5.14 3.1405
# DISK STATISTICS (/sec)
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
cciss/c0d0 36 0 9 4 1601 28 372 4 4 0 2 0 8
sda 0 0 0 0 0 0 0 0 0 0 0 0 0
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
sdc 2 0 0 4 62564 16090 139 448 446 119 886 6 88
sdd 0 0 0 0 0 0 0 0 0 0 0 0 0
sde 0 0 0 0 0 0 0 0 0 0 0 0 0
sdf 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-0 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-1 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-2 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-3 0 0 0 0 0 0 0 0 0 0 0 0 0

Notice the high I/O wait, but the service time doesn't look so bad. The writes look really big. Help me understand what this describes (best guess). Usually I see high I/O wait and high service time going hand-in-hand.
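For what it's worth, the sdc row can be sanity-checked with Little's law (average wait = queue length / arrival rate). A minimal awk sketch, using only the QLen and IOs values copied from the snippet above (nothing measured live):

```shell
# Little's law: W = L / lambda
# L      = average queue length (collectl QLen for sdc)  = 119
# lambda = request completion rate (collectl write IOs)  = 139/sec
qlen=119
iops=139
awk -v q="$qlen" -v r="$iops" \
    'BEGIN { printf "predicted wait: %.0f ms\n", q / r * 1000 }'
# prints "predicted wait: 856 ms" -- close to the 886 ms collectl reports,
# so the long wait is consistent with queueing, not with slow service
```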
MarkSeger
Frequent Advisor

Re: High disk i/o wait time, but pretty good i/o service time?

The wait time is in msec, so the value for sdc looks to be about 0.886 seconds, which is still on the high side.
-mark
Adam Garsha
Valued Contributor

Re: High disk i/o wait time, but pretty good i/o service time?

Yes, I am saying that the wait time is high (very high if you ask me). But look at SvcTim... only 6 ms. My question is: would this perhaps indicate a bandwidth issue (vs. an IOPS issue)?
MarkSeger
Frequent Advisor

Re: High disk i/o wait time, but pretty good i/o service time?

I think trying to draw conclusions by just looking at disk times is a mistake. I'd look at everything! Maybe let collectl run at a short sampling interval for a half hour, or until you're sure you've caught some spikes in the wait times, and then plot everything with colplot. It could be almost anything. You can always play back the data and look at the raw numbers as well.
-mark
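The record-then-plot workflow described above might look something like this (flag meanings per the collectl man page; the output directory is illustrative, so adjust to taste):

```shell
# Record CPU, disk detail, and network stats every 10 seconds to raw files:
collectl -sCDN -i 10 -f /var/log/collectl &

# ...after the slow period, stop the recorder and play the data back.
# Replay just the disk detail as text:
collectl -p /var/log/collectl/*.raw.gz -sD

# Or re-emit the data in plot format so colplot can graph everything:
collectl -p /var/log/collectl/*.raw.gz -sCDN -P -f /tmp/plots
```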
Adam Garsha
Valued Contributor

Re: High disk i/o wait time, but pretty good i/o service time?

Yes, of course I've plotted. The above is just a snippet. Assume the trend shown above goes on for minutes on end (10-15 min).
Adam Garsha
Valued Contributor

Re: High disk i/o wait time, but pretty good i/o service time?

Maybe my question could be better focused as:

Can you list some scenarios (on rhel 5 linux) whereby you'd see high average io wait time, but not high average io service time?

That is what I seek to understand. It seems a little paradoxical to me. I think it means requests are queuing up at the OS, but I am not sure I fully understand the distinction between "wait time" and "service time" as derived by collectl.

From the details page they are defined as:

Wait == Average time in msec a request has been waiting in the queue
SvcTim == Average time in msec for a request to be serviced by the device
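Those two definitions explain why the numbers can diverge: wait includes time a request spends queued in the OS, while service time only counts time the device was actually busy. A small awk sketch with hypothetical per-interval deltas (made-up numbers, chosen to resemble the sdc sample, derived the way iostat-style tools do from /proc/diskstats counters):

```shell
# Hypothetical deltas over one sample interval (illustrative numbers only):
ios=139          # requests completed in the interval
total_ms=123000  # sum of per-request times, queue + service ("ticks")
busy_ms=834      # time the device itself was busy (io_ticks)
awk -v n="$ios" -v t="$total_ms" -v b="$busy_ms" \
    'BEGIN { printf "wait: %.0f ms  svctim: %.0f ms\n", t / n, b / n }'
# wait balloons while svctim stays small whenever requests sit in the
# OS queue far longer than the device takes to service each one
```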

chris huys_4
Honored Contributor

Re: High disk i/o wait time, but pretty good i/o service time?

Hi Adam,

Easy.

The MSA can process much more I/O in parallel than the host's FC HBA is giving it.

I.e. the "max_q_depth" parameter of the host's FC HBA, as it would be called on HP-UX 11.31 (I don't know the equivalent parameter name on Red Hat Linux ;), is too small compared to the "max_q_depth" of the MSA.

Increase ("double up") the "max_q_depth" of the host's FC HBA, and the MSA will have to work harder, which in turn should increase the amount of I/O per second, with probably a slightly increased service time but certainly a greatly reduced wait time and queue length.

> Usually, I see high i/o wait and high svc
> time going hand-in-hand.
The analogy is roadworks on a highway. Due to roadworks, 2 of the 4 lanes of a highway are closed down for 1 or 2 kilometers. Where do you see congestion building up? Where the highway is reduced to 2 lanes instead of 4. Where do you see no congestion at all? After the roadworks, where you have access to all 4 lanes again. That's what you see now. The max_q_depth of the host's FC HBA, in normal operation, operates on only 2 lanes; the max_q_depth of the MSA is 4 lanes. Congestion arises at the FC HBA. Increasing the max_q_depth of the host's FC HBA will make the MSA work at its full potential, for that one host...

Greetz,
Chris
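On Linux the per-LUN equivalent of HP-UX's max_q_depth is typically exposed through sysfs. A sketch, assuming the busy disk is sdc (the device name and the value 64 are illustrative; the right ceiling depends on the HBA, the driver, and the array):

```shell
# Inspect the current per-device queue depth (readable without root):
cat /sys/block/sdc/device/queue_depth

# Raise it, e.g. to 64 (needs root; some HBA drivers also expose a
# module parameter that caps what this attribute will accept):
echo 64 > /sys/block/sdc/device/queue_depth
```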
Adam Garsha
Valued Contributor

Re: High disk i/o wait time, but pretty good i/o service time?

This is good info. We have a SAS-based connection. The SAS HBAs are "HP SC08Ge" (488765-B21). Are there queue depth settings for these as well? I'll have to Google around.