
disk I/O

 
John Guster
Trusted Contributor

disk I/O

device    %busy   avque   r+w/s   blks/s   avwait   avserv
c4t0d4    28.60   47.49      49     1125   201.68    38.53
c4t0d4    17.84   46.76      28      545   456.07    39.78

This disk seems to be the culprit for the slowness. What is the next step to drill down to the root cause?
9 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: disk I/O

Since this is sar, you don't have nearly the diagnostic power you would have with Glance. You can still do it, but it will be much more difficult. You should next gather more data, letting sar collect snapshots every 5 minutes or so. You are trying to correlate processes with this disk activity. You may simply have a ton of processes waiting for this one device. Because I see d4, that strongly suggests that this is an array LUN. Host-based metrics are very misleading when connected to something other than a conventional disk. First, is this LUN RAID 5, RAID 1/0, ...? Are you running a database?
Often when I see situations like this, the root cause is terrible SQL code -- but there is not enough data for that conclusion yet.
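For example (the output file path is just an illustration), the collection could look like:

  sar -d -o /var/tmp/sar_disk.dat 300 288   # one disk snapshot every 5 minutes for 24 hours
  sar -d -f /var/tmp/sar_disk.dat           # replay the collected samples later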

If I were you and didn't have Glance installed, I would install it (at least the 60-day Trial version). The time lost by the installation will be offset many times over by the granularity of the available data. Glance will be on any Applications CD set.
If it ain't broke, I can fix that.
Bill Hassell
Honored Contributor

Re: disk I/O

Unless this disk is a floppy, those numbers are extremely slow, which indicates a disk failure. Look in syslog.log for error messages.
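For example, something like:

  grep -i c4t0d4 /var/adm/syslog/syslog.log

(the usual HP-UX location for syslog.log) should turn up any driver or hardware errors logged against that device.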


Bill Hassell, sysadmin
John Guster
Trusted Contributor

Re: disk I/O

Data was gathered from Glance, sar (with the d, b, u, and v options), vmstat, swapinfo, ipcs, and iostat while the process was running. There are no errors in syslog or dmesg. The CPU is 50% idle and 30% of RAM is free; the buffer cache %rcache and %wcache are close to 100. Pseudo-swap is 45% used, reserved swap is 400 MB total, and device swap is 12 GB with about 10% of the total used (essentially no device swap in use). The shared memory area is clean, with no orphaned segments (zero processes attached). Two processes are running, none blocked, and vmstat shows no page-outs. The disk is one CLARiiON RAID 5 metaLUN with multiple paths under PowerPath (the other path shows a similar sar -d pattern to the one above). The system tables show very low values for each kernel parameter. There is no database, just flat UNIX files.

Within 14 seconds there are over 1,200,000 logical reads but only 8 physical reads (writes: 4,000 logical / 24 physical over the same period). These 1,200,000 reads are satisfied from the buffer cache (HP says so). With such low physical read/write I/O, why is the avwait time so much higher than the avserv time shown by sar -d? Any suggestions?
Steven E. Protter
Exalted Contributor

Re: disk I/O

Shalom,

Taking a shot in the dark here, I'd say you have a write-intensive application writing its data to a RAID 5 disk.

RAID 5 writes the data and its parity to several places on disk to maintain data integrity, which makes it a drag on write performance.
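To put a rough number on it: each small random write to RAID 5 typically costs four back-end disk operations (read old data, read old parity, write new data, write new parity):

  1 host write ~= 2 reads + 2 writes = 4 disk I/Os

versus 2 disk I/Os for the same write on a RAID 1/0 mirror.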

You could use Glance to identify the ugly write processes. Normally these are database processes and such. The cause can come from many areas, including bad programmer code, bad system configuration, and a lack of patches.

What do your I/O patch situation and general patch situation look like?

A little toy to make sar more manageable.

http://www.hpux.ws/system.perf.sh

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
John Guster
Trusted Contributor

Re: disk I/O

The system is patched to the latest level. What system configuration should be scrutinized? The programmer's code is the last thing an SA wants to yell about.
Bill Hassell
Honored Contributor

Re: disk I/O

Well, the buffer cache figures tell the story. 100% read *and* write cache usage, plus 1.2 million logical reads in 14 seconds, is about 85,000 reads/sec!! That means the data set is very small and fits entirely in the cache -- which means the disk is unimportant. There is less than 1 physical I/O per second.

So if the application is "slow" then you'll need to double or triple the processor speed, or have the programmer rewrite the program to use parallel threads.


Bill Hassell, sysadmin
John Guster
Trusted Contributor

Re: disk I/O

Examining the sar -b data closely, it shows that whenever physical reads and writes start, %wcache drops from 99-100% to around 60%. What does this tell us?

          bread/s  lread/s  %rcache  bwrit/s  lwrit/s  %wcache  pread/s  pwrit/s
12:50:58        0    70787      100        1      262      100        0        0
12:51:03        0    73716      100        2      189       99        0        0

12:49:08     1136    83742       99     1141     3167       64        0        0
12:49:13     1025    85818       99     1382     3367       59        0        0
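(For reference, sar derives %wcache as (lwrit/s - bwrit/s) / lwrit/s * 100, so the 12:49:08 sample works out to (3167 - 1141) / 3167 = ~64%: roughly a third of the logical writes are going to disk instead of being absorbed by the cache.)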
A. Clay Stephenson
Acclaimed Contributor

Re: disk I/O

Not all that much. If you consider that essentially all the I/O fits into a relatively small number of physical reads, then missing only a few blocks of data in the cache can result in large changes in the statistics when viewed over short periods of time. I was able to pretty much mimic your statistics by writing a very small C program which read tons of 2-byte records, interspersed with a random write within the large file for every 1000 reads. I intentionally used low-level system calls (read(), write(), lseek()) rather than fread(), fwrite(), and fseek() to avoid application buffering --- and the performance was terrible --- just as I expected.
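For what it's worth, here is a minimal sketch of that kind of test program (the file name is hypothetical and error handling is pared down):

/* Sequential 2-byte reads via read(), with one 2-byte write at a
 * random offset (via lseek() and write()) for every 1000 reads.
 * Any large existing file opened read/write would do. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

int main(void)
{
    const char *path = "bigfile.dat";    /* hypothetical test file */
    char rec[2];
    long nreads = 0;
    off_t size, here, there;
    int fd;

    if ((fd = open(path, O_RDWR)) < 0) {
        perror("open");
        return 1;
    }
    size = lseek(fd, 0, SEEK_END);       /* learn the file size... */
    lseek(fd, 0, SEEK_SET);              /* ...and rewind */

    /* One system call per 2-byte record -- no stdio buffering, so
     * every record is a separate trip through the buffer cache. */
    while (read(fd, rec, sizeof(rec)) == (ssize_t)sizeof(rec)) {
        if (++nreads % 1000 == 0) {
            here  = lseek(fd, 0, SEEK_CUR);      /* remember position */
            there = (rand() % (size / 2)) * 2;   /* random record offset */
            lseek(fd, there, SEEK_SET);
            write(fd, rec, sizeof(rec));         /* the stray write */
            lseek(fd, here, SEEK_SET);           /* resume reading */
        }
    }
    close(fd);
    return 0;
}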

As I alluded to earlier, I see the same thing in databases which have very high ratios of logical to physical I/O -- and for a similar reason. The application is re-accessing data which it already "knows," or should know.
If it ain't broke, I can fix that.
John Guster
Trusted Contributor

Re: disk I/O

The poor disk I/O average wait time was caused by the RAID group being shared with other systems. Thanks to everyone.