System Administration

high IO WAIT cpu percentage

Doug Wilburn
Advisor

high IO WAIT cpu percentage

I have a GS1280 running Tru64 5.1B with an Oracle 9 database. My DBA says Oracle spends time waiting on I/O, and when I run vmstat -w I see IO WAIT percentages from 20% to as high as 60%. Yet on my EVA, read-miss and write latencies are relatively low, staying around 8 ms. If the EVA is not the problem, what would cause high IO WAIT percentages on the CPU? Any ideas would be greatly appreciated.
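For context, the IO WAIT figure is the "wt" column in vmstat output. A minimal sketch of averaging it over a run, using made-up sample numbers in place of real `vmstat -w <interval>` output:

```shell
#!/bin/sh
# Sketch only: average the "wt" (I/O wait) column of Tru64 vmstat-style
# output. The sample lines below are invented for illustration; on the real
# system you would pipe the output of `vmstat -w 5` in instead.
cat <<'EOF' |
 us sy wt id
 10 15 45 30
  8 12 60 20
 12 10 25 53
EOF
awk 'NR > 1 { sum += $3; n++ } END { printf "average iowait: %.1f%%\n", sum / n }'
```

A sustained average in the 40%+ range over the batch window would match the symptoms described above.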
3 REPLIES
John Manger
Valued Contributor

Re: high IO WAIT cpu percentage

But is there actually a problem? Is Oracle running 'slowly'? Has a user complained? Starting from the user's problem description is preferable to starting with 'a number' ;-)

Is Oracle 'tuned' appropriately for its workload? If it is poorly tuned, it could be flogging the disk subsystem unnecessarily.

Have you examined the load on a per-RAD basis? Is the disk subsystem logging any errors (check the binary error log, binary.errlog)?
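On the binary error log point: a minimal sketch, assuming the DECevent decoder (`dia`) is installed; the command is printed rather than executed here since it needs the real system:

```shell
#!/bin/sh
# Hedged sketch: on Tru64, DECevent's `dia` is commonly used to decode the
# binary error log (/var/adm/binary.errlog by default). Printed for review
# rather than run, since it requires the actual system and tooling.
cat <<'EOF'
dia
EOF
```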

JM
Nobody can serve both God and Money
Doug Wilburn
Advisor

Re: high IO WAIT cpu percentage

I try to be brief, so I left out some big details. Here's what happened: the database became corrupted because of a user error, and we had to restore the entire database from a cold backup. We do batch processing at night, and ever since the restore the batch processing takes significantly longer. Prior to the restore, the entire batch would take 8-9 hours; since the restore, batch takes 10-14 hours. I didn't change anything else, and the DBA swears the same thing. All the database files, before and after the restore, are on one mount point - they used to be on a single volume. When I restored, I renamed the original mount point, created a new mount point with the original name, and restored there, again on a single volume/LUN.
Hein van den Heuvel
Honored Contributor

Re: high IO WAIT cpu percentage

High wait time need not be a real issue. For example, it could be DB-page writes or archive-log writes happening without any end user waiting for them.

But your answer to John's (appropriate) question indicates that there is indeed a real issue, not just a statistics observation.

For an analytic approach, compare Oracle statspack data from before and after the problem.
You _do_ have (samples of) the before picture, right?

How about SAR or COLLECT data before versus after?

With just a current picture - statspack output from snapshots taken around the batch window - you want to look at the wait events for clues.

If that is all new to you, then try attaching a statspack TXT output here (if it does not give away private corporate data), or seek support from an Oracle consultant and/or a Tru64 person.
I suspect that you should use an Oracle specialist to highlight the problem, and a system specialist to solve it.
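For anyone new to statspack, a report covering the batch window is typically pulled like this (a sketch only: the perfstat password is site-specific, spreport.sql prompts for the begin/end snapshot IDs interactively, and the session is printed here rather than run since it needs a live 9i instance):

```shell
#!/bin/sh
# Hedged sketch of generating a statspack report on Oracle 9i. The spreport
# script ships in $ORACLE_HOME/rdbms/admin; the password is a placeholder.
cat <<'EOF'
sqlplus perfstat/PASSWORD <<SQL
-- prompts for begin/end snapshot IDs and a report file name
@?/rdbms/admin/spreport.sql
SQL
EOF
```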

>> Yet, on my EVA, read miss and write latencies are relatively low - staying around 8ms.

8 ms is relatively high for writes; confirm/verify in the statspack details.
Writes should be in the 0.5 - 2 ms range, as the EVA cache should suck them all in.

So that MIGHT be a clue to the underlying problem: an inefficiency in the EVA Cache.
Do a health check on the EVA.

Was a new LUN carved for the restore? Could the available EVA cache now be split between more LUNs? If the old LUN is still there, I would try moving some relatively easy-to-move DB activity to the other (old) LUN.
For example REDO and TEMP can readily be moved without anyone noticing.
An alternative set of targets (to move) would be ARCH and REDO.
Unlike DATA/INDEX, for all of those objects you don't have to copy old data around; just use fresh definitions.
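A sketch of the "fresh definitions" idea for REDO and TEMP - create new copies on the spare LUN, switch over, then drop the old ones. Paths, sizes, and group numbers are hypothetical; the statements are printed for review rather than executed, and should be run in sqlplus as SYSDBA after checking them against the actual layout:

```shell
#!/bin/sh
# Hedged sketch: relocate REDO and TEMP without copying any old data.
# All file names, sizes, and group numbers below are made up.
cat <<'EOF'
-- New redo log group on the old (spare) LUN; drop an original group only
-- after log switches have made it INACTIVE:
ALTER DATABASE ADD LOGFILE GROUP 4 ('/old_lun/oradata/redo04.log') SIZE 100M;
ALTER SYSTEM SWITCH LOGFILE;
ALTER DATABASE DROP LOGFILE GROUP 1;

-- Fresh TEMP tablespace on the spare LUN (Oracle 9i syntax):
CREATE TEMPORARY TABLESPACE temp2
  TEMPFILE '/old_lun/oradata/temp2.dbf' SIZE 500M;
ALTER DATABASE DEFAULT TEMPORARY TABLESPACE temp2;
EOF
```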

Hope this helps some
Hein van den Heuvel ( at gmail )
HvdH Performance Consulting