
High Wait IO

 
elimeli
New Member

High Wait IO

I have an Oracle database on an HP-UX Superdome with 12 CPUs.

I see a lot of wait IO on the CPUs (30-40%) but no disk bottlenecks (per sar -d, vmstat, glance, etc.).

How can I drill down into the root cause of the wait IO and find out what my bottleneck is?
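
For reference, this is roughly how I have been sampling things so far (the 5-second interval and counts are just what I happened to use):

# CPU breakdown - %wio is the column in question
sar -u 5 12

# per-disk activity: %busy, queue length (avque), avwait and avserv
sar -d 5 12

# memory and paging picture, plus the blocked-process count in the 'b' column
vmstat 5 12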

9 REPLIES
Andrew Young_2
Honored Contributor

Re: High Wait IO

Hi.

When you were using glance, were you able to see which processes were causing the waits? I am assuming it's the Oracle processes?

If so, what is the state of the SGA? We have had problems with Oracle where the SGA was fragmented, and with other shared memory issues.

HTH

Andrew Y
Si hoc legere scis, nimis eruditionis habes
elimeli
New Member

Re: High Wait IO

Hi.
Thanks for the response.
What do you mean by the "state of the SGA"?
It is clearly not an issue of a fragmented shared pool... do you mean the buffer cache?
If so, what can I check in order to verify that?

Thanks again.
Eli
Andrew Young_2
Honored Contributor

Re: High Wait IO

Hi Eli

The SGA is Oracle's System Global Area (often called the Shared Global Area). When it is too small or badly fragmented you can get performance issues similar to what you are seeing. Also, is your SGA locked in memory, or is it being swapped out? An indication of too much swapping is the swapper process featuring high in the process list when running top; vmstat will also show a lot of page-ins and page-outs.
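
For example, something along these lines will show whether the box is paging and what the SGA's shared memory segments look like (standard HP-UX commands; exact options may differ slightly by release):

# paging activity - sustained non-zero page-outs (po) are the warning sign
vmstat 5 5

# overall swap usage and reservation
swapinfo -tam

# shared memory segments - the large ones are normally the Oracle SGA
ipcs -ma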

When you are running glance, what utilisation is your disk subsystem running at? Is it 100 percent or lower? If it's 100 percent then it's a disk IO bottleneck; otherwise it could be memory or some other kind of IO, such as network, but that's unlikely. The other possible problem is a hot disk - most of the IO being focused on one disk. What disk array are you running? I'm assuming some sort of SAN; if so, the hot-disk problem shouldn't exist if it's properly configured.
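
A rough way to spot a hot disk is simply to watch sar -d for a while and see whether one device stands out:

# look for a single device whose %busy and avwait are much higher than the rest
sar -d 5 6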

HTH

Andrew Y
Si hoc legere scis, nimis eruditionis habes
elimeli
New Member

Re: High Wait IO

Hi Again :)

Here are some more details.

We have 46 GB of RAM, and Oracle is using around 30 GB of it.

No swapping at all.

Glance doesn't show a disk bottleneck.

sar -d doesn't show a disk bottleneck either.

The buffer cache hit ratio is at 99% all the time.

Can wait IO indicate something other than disk IO?
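
In case it helps, I can pull the top Oracle wait events like this (a rough query against the standard v$system_event view, run with sqlplus as SYSDBA) - would that show where the waits are coming from?

# run as the oracle user with the instance environment set
sqlplus -s "/ as sysdba" <<'EOF'
set linesize 120 pagesize 100
-- cumulative waits since instance startup; ignore the idle events near the top
select event, total_waits, time_waited
from v$system_event
order by time_waited desc;
EOF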
Andrew Young_2
Honored Contributor

Re: High Wait IO

Hi.

Just had a few other thoughts:

Run sar -v 1 2 to check whether you are running into any global process limits.

Run sar -q 1 5 to see the status of the system queues.

Other things that can affect it are kernel parameters such as aio_proc_threads, nkthread, nproc and maxuprc being set too low. It might also be a good idea to check whether there are any ulimit restrictions on the oracle user.
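
Something like this will show the current values (kmtune on 11.x - kctune replaces it on later 11i releases; the parameter names are the ones above and may not all exist on your release):

# query the tunables mentioned above
kmtune -q nproc
kmtune -q maxuprc
kmtune -q nkthread

# per-user limits for the oracle account
su - oracle -c "ulimit -a"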

HTH

Andrew Y
Si hoc legere scis, nimis eruditionis habes
elimeli
New Member

Re: High Wait IO

Here is the output of sar -v, and also sar -q, but I don't know how to read them. There was load on the system when I took them.

sar -v :

19:44:35 text-sz ov proc-sz ov inod-sz ov file-sz ov
19:44:36 N/A N/A 772/16404 0 2303/724635 0 12177/248108 0
19:44:37 N/A N/A 774/16404 0 2303/724635 0 12182/248108 0
19:44:38 N/A N/A 772/16404 0 2303/724635 0 12177/248108 0
19:44:39 N/A N/A 773/16404 0 2307/724635 0 12177/248108 0
19:44:40 N/A N/A 772/16404 0 2303/724635 0 12177/248108 0
19:44:41 N/A N/A 773/16404 0 2304/724635 0 12189/248108 0
19:44:42 N/A N/A 772/16404 0 2303/724635 0 12176/248108 0
19:44:43 N/A N/A 774/16404 0 2303/724635 0 12181/248108 0
19:44:44 N/A N/A 773/16404 0 2306/724635 0 12184/248108 0
19:44:45 N/A N/A 772/16404 0 2303/724635 0 12179/248108 0


and here is the sar -q output:

19:45:00 runq-sz %runocc swpq-sz %swpocc
19:45:01 1.0 8 0.0 0
19:45:02 1.0 8 0.0 0
19:45:03 1.0 34 0.0 0
19:45:04 0.0 0 0.0 0
19:45:05 1.0 26 0.0 0
19:45:06 2.0 8 0.0 0
19:45:07 1.0 25 0.0 0
19:45:08 1.8 33 0.0 0
19:45:09 1.0 17 0.0 0
19:45:10 2.5 33 0.0 0


thanks
Andrew Young_2
Honored Contributor

Re: High Wait IO

Hi.

Those all look fine.

When you run uptime during these periods of high IO wait, what do the load averages look like?

Are users complaining about slow responses?

Regards

Andrew Y
Si hoc legere scis, nimis eruditionis habes
likid0
Honored Contributor

Re: High Wait IO

I am having the same problem. I have an 11.11 box with very high wait-for-IO, but the rest looks OK in glance - memory, disk and CPU are all fine.

But in sar I get:

19:47:11 %usr %sys %wio %idle
19:47:12 19 9 41 32
19:47:13 18 9 41 32


and with sar -d it looks OK (columns: device, %busy, avque, r+w/s, blks/s, avwait, avserv):

Average c3t10d0 6.19 0.50 8 53 4.47 10.63
Average c0t10d0 4.20 0.50 7 44 4.75 7.08
Average c33t2d0 30.87 0.50 372 6371 5.01 1.16
Average c28t2d1 22.08 0.50 182 3040 5.27 1.45
Average c28t1d1 40.96 0.50 532 9560 5.20 0.92
Average c28t1d6 16.88 0.50 185 3199 5.24 1.03
Average c28t1d4 36.96 0.50 495 7920 5.28 0.79
Average c28t1d3 32.37 0.50 193 3214 5.32 2.05
Average c25t0d7 12.69 0.50 328 5252 5.07 0.44
Average c25t2d3 30.37 0.59 81 3362 5.39 5.80
Average c25t0d6 13.19 0.50 191 4127 4.99 0.77
Average c25t1d4 15.08 0.53 26 611 5.43 7.69
Average c27t2d2 26.97 0.51 223 4676 5.02 1.78
Average c27t1d7 12.89 0.50 170 2715 5.06 0.78


sar -q and sar -v both look OK, and the process run queue is fine as well:

load average: 0.28, 0.38, 0.37


Any idea what else it could be?

thnx
Windows?, no thanks
Tim Nelson
Honored Contributor

Re: High Wait IO

My favorite subject (see other threads on this).

I am sure I will catch some arguments over this, but here we go...

I personally believe that the sar %WIO metric is archaic. Try comparing sar %WIO with glance's disk metrics and getting them to match.

%WIO can also include any type of IO, not just disk (streams, for example).

These high-level indicators, just like disk utilization, are just that - high-level indicators - and you should then drill down to see whether there is a real issue. An example of this is disk attached to a SAN. 100% utilization does not mean there is a disk issue; it just means that during the interval the disk was doing something 100% of the time. Now you must drill down and look at whether there is any queuing and what the service time is.
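
As a rough illustration of that drill-down (the interval and count are arbitrary):

# 100% busy on its own proves nothing; compare avwait with avserv instead
sar -d 5 6
# avwait consistently higher than avserv, or a large avque, means requests are
# queuing at the device - that is a real bottleneck; high %busy with low avwait is not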

I think glance is a much better tool. Look at the global wait states, then the individual process wait states, and keep drilling down from there.

If you have some busy disks, then drill further, as mentioned above, to identify whether there is a problem there.

You may well have a disk bottleneck, but if the real stats do not show it, then move on to identifying the real issue.