PRM and Oracle with higher IO

SOLVED
Niksa Franceschi
Occasional Advisor

At the moment we're testing PRM on our test database server.

The server has 4 CPUs and 5 GB of RAM.
One database (we'll call it TESTDB) uses approximately 1 to 1.5 CPUs 24/7.
There are also other Oracle instances on the same server.
TESTDB is the least important of them, and should be throttled when the other databases on the server need more CPU.

For the test setup, I used the following configuration:
OTHERS:1:90::
TESTDB:2:10::

oracle::::OTHERS,TESTDB

/u01/app/oracle/product/10.2.0.4/bin/oracle::::TESTDB,ora*TESTDB
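
As a rough illustration of what the two group records above ask for, here is a minimal Python sketch that turns the share values into percentage entitlements. It assumes each group record has the layout NAME:PRMID:SHARES:: (which is how the records above read); the function names are made up for this example and are not part of PRM.

```python
def parse_group_records(lines):
    """Return {group_name: cpu_shares} from 'NAME:PRMID:SHARES::' records."""
    shares = {}
    for line in lines:
        fields = line.strip().split(":")
        # Assumed layout: name, PRM ID, CPU shares, then two empty fields.
        # Anything else (user or application records) is skipped.
        if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        shares[fields[0]] = int(fields[2])
    return shares


def entitlements(shares):
    """Convert raw share counts into percentages of total CPU."""
    total = sum(shares.values())
    return {name: 100.0 * value / total for name, value in shares.items()}


config = [
    "OTHERS:1:90::",
    "TESTDB:2:10::",
    "oracle::::OTHERS,TESTDB",
]
print(entitlements(parse_group_records(config)))
```

With the shares 90 and 10 this works out to a 90% entitlement for OTHERS and 10% for TESTDB.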

This setup works correctly, and processes are assigned as appropriate.
However, during testing I noticed one behavior that makes things a bit more tricky.

On all our database servers, the high I/O load tends to produce significant IOWAIT.
With 'sar -u', IOWAIT ranges from about 5% up to about 30% on some of our high I/O load database servers. However, 'top' and 'glance' report that same IOWAIT as IDLE time, since IOWAIT is basically a blocked state for the CPU (from what I understand).
The consequence is that when the load average on the server exceeds 1, 'top' or 'glance' will still show IDLE CPU, while 'sar -u' shows idle as 0.

This also affects PRM.
In the test case shown above, with a load average of around 2 on that server, 'top' shows around 10-15% idle.
prmmonitor shows CPU Used as:
OTHERS - 51.21%
TESTDB - 32.48%

The problem is that TESTDB is actually using more CPU than it should, since PRM does not count IOWAIT time as CPU used by TESTDB, and 'OTHERS' (the default group) is already losing some CPU cycles to it because of this.
For example, if I turn the CPU cap on, prmmonitor shows:
OTHERS - 66.62%
TESTDB - 10.01%

which means the 'OTHERS' group received extra CPU time that PRM should have given it anyway, even with the CPU cap off.

The issue is that our production systems, as I said before, tend to have quite high IOWAIT times, and with PRM not counting IOWAIT as CPU time, capping might not be accurate at all.

Is there any workaround for this issue?
2 REPLIES
Hein van den Heuvel
Honored Contributor
Solution

Re: PRM and Oracle with higher IO

>> However, 'top', or 'glance' see same IOWAIT as IDLE time

Which it is.


>> since basically IOWAIT is a block state for CPU (from what I understand).

No, it is NOT. During IOWAIT the scheduler could not find anything better to do, so the CPU is idle. The processes, however, are blocked, waiting for I/O.
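
To make the difference between the two views concrete, here is a small Python sketch that folds the %wio column of 'sar -u' into %idle, which is how 'top' and 'glance' are described as behaving in this thread. The sample figures are invented for illustration and are not measurements from the server in question; the function name is also made up.

```python
def idle_as_top_reports(usr, sys_, wio, idle):
    """Fold sar's %wio into %idle, as top/glance are described as doing."""
    total = usr + sys_ + wio + idle
    # The four sar -u columns should add up to roughly 100%.
    if abs(total - 100.0) > 0.5:
        raise ValueError(f"sar columns should sum to ~100, got {total}")
    return wio + idle


# Hypothetical sar -u sample: no true idle, but 15% spent waiting on I/O.
sample = {"usr": 60.0, "sys_": 25.0, "wio": 15.0, "idle": 0.0}
print("sar -u idle:", sample["idle"])
print("top idle:   ", idle_as_top_reports(**sample))
```

In this made-up sample 'sar -u' reports no idle time while the 'top' view reports 15%, yet in both cases the CPU is not executing anything during that 15%.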

>> prmmonitor shows CPU Used as:
OTHERS - 51.21%
TESTDB - 32.48%

That's legit according to the first PRM doc I found: http://sysdoc.doors.ch/HP/B3834-90016.pdf
"Normally, processes running in a PRM group may consume more of the CPU resource than the specified group entitlement whenever any group is not fully consuming its share of resources"

So OTHERS did not claim its CPU (because it was waiting for IO) and TESTDB got it.

>> Problem is, TESTDB is actually using more CPU resources than it should, since PRM does not see IOWAIT time as CPU used by TESTDB

That interpretation is not correct, but it points to the core problem: PRM controls CPU and MEMORY resources, but it can NOT control what the controlled processes do with that CPU.
If TESTDB uses its constrained CPU resources to 'quickly' issue a bunch of I/Os, then those I/Os will slow down any other I/Os to the same targets. This makes the IOWAIT time in OTHERS even longer, which allows TESTDB to issue a few more I/Os, which makes OTHERS slower still, which... :-).

>> As an example, if I set CPU cap on, prmmonitor will show:
OTHERS - 66.62%
TESTDB - 10.01%

So now TESTDB is no longer allowed to use that 25% idle time to issue more IO which would have caused OTHERS to slow down.
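
The difference between cap off and cap on can be sketched with a toy model in Python. This is not PRM's actual scheduling algorithm, just an assumed simplification: each group first gets the smaller of its demand and its entitlement, and with the cap off any spare CPU is handed to groups that still want more, in proportion to their shares. The demand figures are invented, and entitlements are assumed to be positive.

```python
def allocate(entitlement, demand, cap=False):
    """Toy model: CPU % used per group, with or without a hard cap."""
    used = {g: min(demand[g], entitlement[g]) for g in entitlement}
    if cap:
        return used  # capped: nobody may exceed its entitlement
    spare = 100.0 - sum(used.values())
    # Cap off: redistribute spare cycles to groups with unmet demand.
    while spare > 1e-9:
        hungry = [g for g in used if demand[g] - used[g] > 1e-9]
        if not hungry:
            break
        weight = sum(entitlement[g] for g in hungry)
        handed_out = 0.0
        for g in hungry:
            extra = min(spare * entitlement[g] / weight, demand[g] - used[g])
            used[g] += extra
            handed_out += extra
        spare -= handed_out
    return used


entitlement = {"OTHERS": 90.0, "TESTDB": 10.0}
# Hypothetical demand: OTHERS is blocked on I/O much of the time.
demand = {"OTHERS": 51.0, "TESTDB": 40.0}
print("cap off:", allocate(entitlement, demand))
print("cap on: ", allocate(entitlement, demand, cap=True))
```

The model deliberately leaves out the feedback effect described above: in the real test, capping TESTDB also reduced its I/O, so OTHERS waited less and its own CPU use went up.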

>> Issue is on our production systems - as I said before - we tend to have quite high IOWAIT times

In which case there is more idle time for the constrained group to exploit, allowing it to interfere with the main group.

>> Is there any workaround for this issue?

You found it: # prmconfig -M CPUCAPON

The real solution is to make sure that those controlled groups do NOT share anything except memory and CPU. They would have to be on distinct HBAs, switches, storage controllers, disk groups, and networks.
Of that list, distinct disk groups (physical disks) are the most important, but possibly the hardest to control.
The network is less important only if it has lots of room to spare. Of course 'it depends'. Without knowing anything about the resource usage, I would consider 100 Mb connections suspect, and gigabit connections likely to be OK.

Hope this helps,
Hein van den Heuvel
HvdH Performance Consulting
Niksa Franceschi
Occasional Advisor

Re: PRM and Oracle with higher IO

Hi,

yes, it does make sense, especially the part about IDLE/WIO, as that part was a bit confusing (also where it pertained to the load average on the server).