
CPU awaiting IO metric

 
Andy Zybert
Advisor

CPU awaiting IO metric

I understand that CPU waiting for I/O, as reported by sar, indicates a possible I/O bottleneck and therefore has a negative effect on the performance of any system, but at what level should I worry?

Previous forum responses seemed to suggest that a 5-10% average is OK, but anything above this would constitute a constraint on the system. Is this a recognised industry standard?

A response from an HP performance expert would be appreciated.
DTS beats Dolby Digital 5:1
9 REPLIES
Edward Sedgemore
Trusted Contributor

Re: CPU awaiting IO metric


I've got many years' experience on HP servers, and in my professional opinion the following sar wait-on-I/O percentages apply (note: this is an average over a considerable time period during working hours):

>20%   Completely I/O bound; something needs to be done to fix it (more/faster disks, controllers, striping, etc.).
10-20% Very busy; the system is still I/O bound and I would expect to do something to improve it.
5-10%  The system is busy but acceptable.
<5%    Not I/O bound; I/O performance is fast and responsive.

Someone from HP gave us these. Our new L- and N-class servers, with their exceptionally fast disks and Ultra SCSI controllers, all run in the <5% range, even with big Oracle databases being heavily used on them. Our older K's and D's typically run at 10-20% with their slower disks and controllers.
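
If you want to check your own working-hours average, something like this should do it (an untested sketch; it assumes sar data is being collected to the standard /var/adm/sa/saDD files, and that %wio is the fourth column of sar -u output):

   # average %wio between 08:00 and 18:00 from today's collected sar data
   sar -u -f /var/adm/sa/sa`date +%d` -s 08:00 -e 18:00 |
     awk '$1 ~ /^[0-9][0-9]:/ && $2 ~ /^[0-9]/ { sum += $4; n++ }
          END { if (n) printf "average %%wio = %.1f\n", sum / n }'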


Andy Zybert
Advisor

Re: CPU awaiting IO metric

Edward,

What is your opinion of an N4000/44, 6-CPU, 6GB system averaging 40% WIO but with no user performance problems reported?

The application is essentially an Oracle data warehouse/online data store queried using Business Objects.

Does an acceptable level of WIO depend to some degree on the application?
DTS beats Dolby Digital 5:1
Carlos Fernandez Riera
Honored Contributor

Re: CPU awaiting IO metric

That's the best thing about these machines: 40% WIO without user problems. Imagine how it would run at 10%.

The first step is to rule out swapping. Use swapinfo.
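
To make that check concrete (just a sketch, using standard HP-UX tools):

   swapinfo -tam   # paging space usage in MB, with totals
   vmstat 5 5      # sustained non-zero po (page-outs) means real memory pressure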

Then you must analyze Oracle performance...




unsupported
Edward Sedgemore
Trusted Contributor

Re: CPU awaiting IO metric


To some extent an acceptable level of WIO does depend on the application, but to an even greater extent it depends on how acceptable it is to the users. If your users don't complain because it's always been like that, or because your application is not one demanding a real-time fast response, then that's fine.

From a performance view, a WIO of 40% is very bad. The point is that you could reduce that 40% to 5% or so by improving your I/O throughput, which would make better use of your resources. I'll bet your Oracle DB isn't using striped logical volumes spread across multiple controllers and as many disks as possible? What is your disk subsystem? XP/Jamaica/FC/HC/AutoRAID?
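
For reference, creating a striped logical volume with LVM looks something like this (a sketch only; vgora and lvstripe are made-up names, and the stripe count and size must match your own disk layout):

   # 4GB logical volume striped across 4 disks with a 64KB stripe size
   lvcreate -i 4 -I 64 -L 4096 -n lvstripe /dev/vgora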


Thierry Poels_1
Honored Contributor

Re: CPU awaiting IO metric

Hi,

Running an N4000/44 with 6 CPUs and 6GB averaging 40% WIO ...

... is like continuously driving a Ferrari in second gear. You will probably be a lot faster than any standard car, but imagine if you could switch to 3rd or 4th or 5th gear :)

Tune that I/O !

Thierry.
All unix flavours are exactly the same . . . . . . . . . . for end users anyway.
Tim Malnati
Honored Contributor

Re: CPU awaiting IO metric

It certainly appears that the application is I/O bound, but this may not be the machine's fault. A data warehouse application is, by its very nature, typically I/O bound even when the machine is otherwise properly tuned.

The first line of defense in this situation is to run GlancePlus and study the process in question. Drill down and take particular note of what is happening in the open-files output. How many files and tables are being accessed? How many of these are large files, and how many are being accessed randomly?

To get to the bottom of this you really need to think in terms of what the application process is doing from an I/O standpoint. When you evaluate it, keep in mind that the 'enemy' is every random access to disk. Multiple files being accessed randomly in the same volume set is even worse. The real key to improving this problem is the insight, communication, and cooperation of the three players in the mix: the SA, the DBA, and the PA (and possibly some others, like production control, ops, etc.). Adding a little cache here, moving a 'problem' file there, adding an index, using a temporary work file, sequencing random-access jobs, and so on are all possible improvement methods (there are more).
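
As a starting point for that kind of investigation (a rough sketch; vgora and lvorader are example names), you can see which disks a volume group uses and how each logical volume's extents are spread across them:

   vgdisplay -v /dev/vgora | grep "PV Name"   # disks that make up the volume group
   lvdisplay -v /dev/vgora/lvorader           # per-disk extent distribution for one LV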

The bottom line of what I'm suggesting here is that the WIO is a symptom, and more often than not the correction does not mean rebuilding or retuning your machine.

Andy Zybert
Advisor

Re: CPU awaiting IO metric

Edward,

The disk subsystem is XP256.
DTS beats Dolby Digital 5:1
Roman Dijanosic
Advisor

Re: CPU awaiting IO metric

Just my little, little opinion (or guess). Edward Sedgemore is certainly right about his figures.

To your 40% I/O wait I would just say that your system isn't scalable. If the users say performance is OK now, ask those same users again when the number of concurrent users increases by about 20%. I would guess that it would nearly come to a standstill.

I have just recently had that kind of experience. Before that 20% increase in load I was in the 10-20% I/O wait region. I thought: OK, I/O is maybe a problem, but not so much. And I was badly wrong...

Roman D.
Les Schuettpelz
Frequent Advisor

Re: CPU awaiting IO metric

We need quite a bit more information to evaluate this problem. We know the server CPU/memory configuration and the disk array model, but we don't know any of the application sizing and storage connectivity details:

1. Total size of the database
2. Total number of Physical Devices presented to the host
3. Type and number of Host Bus Adapters used
4. Type of connectivity: direct or SAN
5. If SAN, is it QuickLoop or Fabric

If SAN/Fabric on TachLite, how many TachLites (ioscan -funC fc) are in use, and how many buses (ioscan -funC ext_bus | grep fcp) are you emulating? Are these cards really balanced on the N backplane? Some factory builds we have seen didn't follow the best slot-selection rules.

Overall, does the system seem to be spreading the I/Os as widely as possible across the existing configuration, or is it hitting 1 or 2 Physical Volumes much of the time?
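
One way to answer that question (an untested sketch that assumes the usual HP-UX sar -d output layout, with device names starting with 'c'):

   # sample disk utilisation for 10 minutes, then rank devices by average %busy
   sar -d 60 10 | awk '
       /^Average/ { done = 1 }
       done { next }
       $1 ~ /^[0-9][0-9]:/ && $2 ~ /^c/ { busy[$2] += $3; n[$2]++ }
       $1 ~ /^c/                        { busy[$1] += $2; n[$1]++ }
       END { for (d in busy) printf "%6.1f  %s\n", busy[d] / n[d], d }' |
     sort -rn | head

A device hogging the %busy column while the others sit idle is the signature of the one-or-two-PV hot spot.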

Also, inside the XP256, what size are the real physical disks, and how much attention was given to how the server devices are mapped onto the true physical devices?