Disk I/O Utilisation

 
SOLVED
John Jayaseelan
Super Advisor

Disk I/O Utilisation

Hi,

The attached Glance report shows high disk utilization (100%), and some alarm log entries were generated with Yellow & Red warnings.

Could anyone please tell me how important this warning is for system performance?

Thanks
John Jayaseelan


11 REPLIES
John Jayaseelan
Super Advisor

Re: Disk I/O Utilisation

Hi,

But the average disk utilization seems to be lower.

Thanks
doug mielke
Respected Contributor

Re: Disk I/O Utilisation

I've raised this question before for my own systems: after attaching high-speed SAN storage, Glance reported disk utilization at or near 100%.

sar -d, however, shows good access times and much lower busy figures.

The answer seems to be that Glance may use some metrics that are outdated in the world of fast storage.

I'd check the sar -d numbers. Also look at sar -b while you're at it, and check the cache hit rates.

On our systems, old direct-attached storage has 10-30 ms access times; the SAN is in single digits. Hit rates for us are always over 50%, and as high as 95% on reads on some systems.
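
For reference, something along these lines shows both views over an hour; the 5-minute interval and sample count are just examples, adjust to taste:

# Per-disk busy %, queue length, avwait and avserv: twelve 5-minute samples
sar -d 300 12

# Buffer cache activity; %rcache and %wcache are the read and write hit rates
sar -b 300 12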
Hazem Mahmoud_3
Respected Contributor

Re: Disk I/O Utilisation

The average of 83% is still somewhat high. Do you know what process is causing this? The average on my system at work is usually in the 30% range. The only time it reached as high as what you are seeing was during an Eloquence database update that was running in synchronous mode. We ended up killing the process and re-running it in asynchronous mode, which brought the disk I/O back down to the 40% range.
Jean-Luc Oudart
Honored Contributor

Re: Disk I/O Utilisation

The alarms are set by default in /var/opt/perf/adviser.syntax
(man glance)
You can create your own config file.

We also see disk utilisation peak at 100% from time to time; it all depends on the processing.
I'd say you should have a baseline (for what is acceptable CPU/memory/...) and use it when you have a performance problem on the box.
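
A rough sketch of how you might work from the default file; the adviser.custom name is just an example, and the -syntax option is from memory, so verify it against man glance on your version first:

# Copy the default adviser file and edit the alarm thresholds in the copy
cp /var/opt/perf/adviser.syntax /var/opt/perf/adviser.custom

# See which disk-related alarms are defined by default
grep -i alarm /var/opt/perf/adviser.syntax | grep -i disk

# Point glance at the customised file (check "man glance" for the exact option)
glance -syntax /var/opt/perf/adviser.custom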

Regards,
Jean-Luc
fiat lux
John Jayaseelan
Super Advisor

Re: Disk I/O Utilisation

Hi,

Most of the time the top disk user is the 'vxfsd' process.

Thanks
Hazem Mahmoud_3
Respected Contributor

Re: Disk I/O Utilisation

You might want to check if you have the latest patches installed for vxfsd. One that I found is PHKL_28024. A short description:
Defect Description:
PHKL_28024:
( SR:8606274264 CR:JAGae38341 )
The vxfsd kernel daemon is creating kernel threads that
are able to run for long periods of time.

Resolution:
Changes were made to the vxfsd code to limit the number
of kernel threads doing file syncs.


This may or may not resolve it, but in any case, check to see if you have the latest vxfsd patches installed. Hope this helps.
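
A quick way to check whether it is already on the box (one or the other level works, depending on the SD-UX revision):

# Look for the patch (or a superseding one) among installed patches/products
swlist -l patch   | grep PHKL_28024
swlist -l product | grep PHKL_28024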

-Hazem
Stuart Abramson_2
Honored Contributor

Re: Disk I/O Utilisation

As mentioned above, glance and sar -d use different measurement techniques.

Glance measures the instantaneous SINGLE BUSIEST DISK over the last 20 minutes or so (that's roughly how far back it looks - I forget exactly).

sar -d measures EVERY DISK individually and averages over the period that you set the sar collection daemon to run.

So Glance can show a 100% busy disk for 20 minutes because disk cTtXdY was 100% busy for an instant (some small measurement unit that it uses), then cLtNdQ was 100% busy for some time, then cZtYdG, etc.

sar -d will show you that each individual disk was busy at some % over a 10-minute interval (we use 10-minute "slices"), and it will usually be less than 100% because NO disk can be 100% busy for 10 straight minutes.

I like sar -d for disk studies. It shows you more metrics: disk busy %, queue depth, avserv, avwait, etc. BTW, avserv is the most important: "How long did it take, on average, to complete a disk request?"
Stuart Abramson_2
Honored Contributor
Solution

Re: Disk I/O Utilisation

Okay, here is how you interpret your disk performance from "sar -d". Look for these things as indicators of problems (a rough way to check them against a sar report is sketched after the list):


a. % busy greater than 50%

b. avque greater than 3

c. avwait greater than avserv.

   - This means the I/Os are waiting longer than the time it takes to process them. Bad...

d. However:

   - %busy high and queue length low is okay, because it means the disk is working but is keeping up.

e. avserv < 6 ms is good!

If a disk rotates at 10,000 revolutions per minute, then
how long does one revolution take?

   10,000 RPM = 10,000 revolutions / 60 seconds

   1 revolution = 60 seconds / 10,000
                = 0.006 seconds
                = 6 milliseconds

f. Average rotational latency (waiting for the data to come around under the head) would be 1/2 the rotation time. Also called "latency time".

g. The average time to transfer one block of data on one cylinder would then be about 6 milliseconds.

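As a rough way to apply the checks above to a sar report, something like this sketch works; the thresholds come from the list, but the field handling is an assumption about the sar -d output layout, so adjust it for your own system:

sar -d 600 6 | awk '
    NF == 8 { dev=$2; busy=$3; avque=$4; avwait=$7; avserv=$8 }  # lines with a timestamp
    NF == 7 { dev=$1; busy=$2; avque=$3; avwait=$6; avserv=$7 }  # continuation lines
    NF >= 7 && dev ~ /^c[0-9]/ {
        if (busy+0 > 50)         print dev, "%busy =", busy
        if (avque+0 > 3)         print dev, "avque =", avque
        if (avwait+0 > avserv+0) print dev, "avwait", avwait, "> avserv", avserv
    }'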

John Jayaseelan
Super Advisor

Re: Disk I/O Utilisation

Stuart,

Following is the result of 'sar -d 600 10'. You told us avserv < 6 ms is good. Does this mean there is a potential I/O bottleneck?

11:18:20 device %busy avque r+w/s blks/s avwait avserv
11:28:20 c2t5d0 7.59 0.50 7 63 4.35 15.27
c1t11d0 0.79 0.50 1 5 4.95 10.91
c2t3d0 0.01 0.50 0 0 4.52 13.00
c7t14d0 4.53 0.50 5 54 4.49 10.66
c0t0d0 0.22 0.50 0 6 5.35 12.68
c0t1d0 1.84 0.50 2 7 5.00 10.16
c0t3d0 0.09 0.50 0 2 0.92 12.57
c0t4d0 0.83 0.50 1 3 4.92 11.93
c0t5d0 1.81 0.50 2 15 4.94 7.83
c1t0d0 0.09 0.50 0 2 0.96 13.32
c1t1d0 0.13 0.50 0 3 3.32 11.67
c1t5d0 0.58 0.50 1 2 4.86 12.95
c1t4d0 0.92 0.50 1 3 5.21 16.67
c1t3d0 0.22 0.50 0 7 4.44 13.48
c1t2d0 0.36 0.50 0 2 5.10 11.20
c0t2d0 2.53 0.50 2 10 5.11 13.28
c0t12d0 0.18 0.50 0 2 2.92 15.23
c0t13d0 25.49 0.51 37 589 5.03 7.30
c1t13d0 1.95 0.60 2 36 5.21 14.30
c2t4d0 0.19 0.50 0 2 2.89 13.18
c0t14d0 0.49 0.50 1 6 4.10 14.34
c0t15d0 0.14 0.50 0 2 4.49 18.65
c1t14d0 0.36 0.50 0 5 4.13 12.34
c1t15d0 0.12 0.50 0 1 4.48 17.27


Thanks
Stuart Abramson_2
Honored Contributor

Re: Disk I/O Utilisation

11:18:20  device    %busy   avque   r+w/s   blks/s  avwait  avserv
11:28:20  c2t5d0    7.59    0.50    7       63      4.35    15.27
          c1t11d0   0.79    0.50    1       5       4.95    10.91
          c2t3d0    0.01    0.50    0       0       4.52    13.00
          c7t14d0   4.53    0.50    5       54      4.49    10.66
          c0t0d0    0.22    0.50    0       6       5.35    12.68
          c0t1d0    1.84    0.50    2       7       5.00    10.16
          c0t3d0    0.09    0.50    0       2       0.92    12.57
          c0t4d0    0.83    0.50    1       3       4.92    11.93
          c0t5d0    1.81    0.50    2       15      4.94    7.83
          c1t0d0    0.09    0.50    0       2       0.96    13.32
          c1t1d0    0.13    0.50    0       3       3.32    11.67
          c1t5d0    0.58    0.50    1       2       4.86    12.95
          c1t4d0    0.92    0.50    1       3       5.21    16.67
          c1t3d0    0.22    0.50    0       7       4.44    13.48
          c1t2d0    0.36    0.50    0       2       5.10    11.20
          c0t2d0    2.53    0.50    2       10      5.11    13.28
          c0t12d0   0.18    0.50    0       2       2.92    15.23
          c0t13d0   25.49   0.51    37      589     5.03    7.30
          c1t13d0   1.95    0.60    2       36      5.21    14.30
          c2t4d0    0.19    0.50    0       2       2.89    13.18
          c0t14d0   0.49    0.50    1       6       4.10    14.34
          c0t15d0   0.14    0.50    0       2       4.49    18.65

For this timestamp that you list:

1. % busy: None of your disks is busier than about 25%. So that's not busy.
2. avque is 0.50 for all disks. So that means the disks are not backing up.
3. r+w/s is low. So, again, you're not busy.
4. blks/s, likewise, is low. (This is NOT sufficient load to measure your disks!)
5. avwait is low, given rotation speeds in the 10,000 rpm range.
6. Your avserv is higher than 6 ms. I forget what kind of disks you have, but this probably means that you just have slow disks (compared to 10,000 rpm). The one time avserv does drop below 10 ms is when you do have some activity, on c0t13d0. You probably have some sequential streaming going on.

In short, this timestamp isn't strenuous enough to tell you much.
John Jayaseelan
Super Advisor

Re: Disk I/O Utilisation

Stuart,

Following is the average taken over more than 1 hour. The point about 'sequential activity' is my opinion also. These disks are connected through 2 F/W SCSI controllers. It looks like the bottleneck is the traffic through the controllers, not the actual disks.

dd of a 2 GB disk took 270 seconds, where about 100 seconds would be expected on a 20 MB/sec F/W SCSI controller (see the rough check below). Is there a solution?
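
A rough way to check the raw sequential rate off one spindle; the device name below is just a placeholder, substitute one of your own, and read through the raw /dev/rdsk device so the buffer cache does not get in the way:

# 8192 x 256 KB = 2 GB; ~100 s elapsed would be ~20 MB/s, ~270 s is roughly 7-8 MB/s
timex dd if=/dev/rdsk/c0t13d0 of=/dev/null bs=256k count=8192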

# sar -d 600 10

HP-UX ccprod01 B.11.00 U 9000/898 12/05/03

device %busy avque r+w/s blks/s avwait avserv
Average c2t5d0 8.49 0.55 8 72 4.83 15.82
Average c1t11d0 0.65 0.50 1 4 5.00 10.87
Average c2t3d0 0.02 0.50 0 0 5.53 13.07
Average c7t14d0 5.22 0.55 6 61 4.89 11.44
Average c0t0d0 0.23 0.50 0 5 4.91 10.47
Average c0t1d0 0.99 0.50 2 18 5.06 4.79
Average c0t3d0 0.10 0.50 0 2 1.77 12.52
Average c0t4d0 0.62 0.50 1 9 5.03 8.51
Average c0t5d0 1.81 0.50 6 59 5.08 3.11
Average c1t0d0 0.08 0.50 0 1 1.50 12.76
Average c1t1d0 0.13 0.50 0 3 3.32 13.29
Average c1t5d0 0.36 0.50 0 2 4.95 13.30
Average c1t4d0 0.45 0.50 0 2 4.95 15.05
Average c1t3d0 0.21 0.50 0 6 4.02 13.40
Average c1t2d0 0.19 0.50 0 1 5.08 10.22
Average c0t2d0 1.81 0.50 3 22 5.03 7.87
Average c0t12d0 0.22 0.50 0 2 4.10 13.35
Average c0t13d0 8.81 0.63 12 179 5.65 9.49
Average c1t13d0 1.75 1.20 2 30 8.69 17.32
Average c2t4d0 0.25 0.50 0 3 3.83 11.38
Average c0t14d0 0.79 0.82 1 8 7.15 19.78
Average c0t15d0 0.19 0.50 0 2 4.51 19.31
Average c1t14d0 0.51 0.87 1 6 6.78 17.43
Average c1t15d0 0.13 0.50 0 2 4.41 16.75

Thanks