Operating System - HP-UX
1748011 Members
3633 Online
108757 Solutions
New Discussion юеВ

Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

 
SOLVED
Go to solution
Michele (Mike) Alberton
Regular Advisor

Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

Hi !

We have rp7400 (8 CPUs @750 MHz, 16 GB RAM) boxes and we were changing some kernel values to account for more processes running.

The parameters involved in the change were:

Beginning:
dbc_max_pct=50 (default)
nproc=2265
ncallout=2277 (forced)

After:
dbc_max_pct=25
nproc=8192
ncallout (dependent on nproc value)

We were running the same query on systems where the change was not applied and where the change took place.

Strangely we experienced a severe degradation in performances, apparently related to disk access bottlenecks (we have va7400 hooked up)

The systems where the change did not take place:

SELECT COUNT(*),SIM_STATE
FROM
S04_SIM_RTDB WHERE SCP_NAME =:"SYS_B_0" AND ACCOUNT_ID=:"SYS_B_1" GROUP BY
SIM_STATE


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.09 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 45.32 45.02 2513753 2524997 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 45.33 45.12 2513753 2524997 0 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 32

we don't have any bottleneck and all the time spent seems to be CPU time.

On the other machines we have more than twice the same performances, presumably related to disk I/O.
SELECT COUNT(*),SIM_STATE
FROM
S04_SIM_RTDB WHERE SCP_NAME =:"SYS_B_0" AND ACCOUNT_ID=:"SYS_B_1" GROUP BY
SIM_STATE


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 59.98 152.41 2460449 2485124 0 14
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 59.98 152.41 2460449 2485124 0 14

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 76


My guess is that on one system the query is almost entirely cached while on the other the access to disk causes the slow down.

I changed back the parameters and strangely I did not improve the scenario, but after about 1 day the situation got normalized. Maybe it took a day to re-fill the cache.

Do you think this make sense ?

Thanks in advance !

Mike

9 REPLIES 9
Steven E. Protter
Exalted Contributor

Re: Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

Shalome Mike,

Makes perfect sense.

Your degradation is the system switching the size of the buffer cache from one value to another during operations.

The real killer here is the difference between
dbc_max_pct
and
dbc_min_pct

If the difference is large, the system can try and change this value frequently and these changes are EXTREMELY EXPENSIVE in terms of CPU. In other words, it degrades performance.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

Total Doh in my part Mike,

The solution if not obvious is to set the
dbc_max_pct=
dbc_min_pct=

to a value equal or very close together.

With Oracle its better to set the dbc_max_pct and dbc_min_pct lower.

Oracle's SGA is more efficient than buffer cache at speeding up db performance than the OS buffer cache.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Michele (Mike) Alberton
Regular Advisor

Re: Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

Hi Steven,

thanks for your quick reply.

My concern is that the change we applied was to reduce the allocated percentage of buffer memory (50 > 25) so that we were expecting the improvement you mentioned.

Instead we experienced a degradation, apparently due to a cache misuse.

What do you think ?

Thanks !

Mike
TwoProc
Honored Contributor

Re: Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

Mike did you notice that your disk I/O (as far as Oracle was counting) was actually less, while your elapsed time was three times higher? Also, you did two fetches instead of one in the second example. Hmm, less I/O but more time, BUT two fetches instead of one. Seems to me that your execution plan changed between the two runs. But, it cleared up two days later... I'm thinking Oracle statistics here.

So, I think you were hit with two things:
1) You had less OS cache to cache your disks so while Oracle didn't notice more I/O (b/c the OS buffered it for you), it still took longer to get your data back.

2) You ran slower in the second test b/c something else kept your from getting all of your data in a single fetch... like a different execution plan... because your stats maybe weren't current.

First, it was a good thing to reduce your max_dbc_pct (should probably take it down further IMHO). But, the idea is to allocate the memory you took from the OS buffer cache and give it to the Oracle buffer cache. Oracle is better at handling the buffers for you anyways.

Second Problem.
The sql statements that you provided have no cost estimates or execution plan. Are you sure that the execution plans remained the same over that time period? It seems to me that if performance leveled out on the second day, then your stats weren't up to date for the first day, but possibly were recalculated that night, and for the second day, you were using more correct and more up to data statistics for your query.


I'll bet that your current trace has only one fetch and more closely resembles the first example of your posting. Which means that your stats weren't current.

Which means that changing your max_dbc_pct may not have been why you ran slower. It could be that your lack of stats gathering did you in. Also, if your stats gathering at night did run, but ran while the tables being computed were very busy, you could have ended up with half-baked stats, which are just as bad as (or possible worse than) old stats.

Anyway, keep an eye of the cost estimates for that query, and I think you'll learn more about what's going on.

We are the people our parents warned us about --Jimmy Buffett
Michele (Mike) Alberton
Regular Advisor

Re: Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

John,

thanks for the detailed analysis.

Regarding the stats I turned on the traces on the session I was using to execute the test query, so that I think it should be a good one, I re-ran them multiple times with same results.

I took a look at glance during the execution and I saw an abnormal Disk I/O which seems consistent with your comments.

The systems where the buffer was reduced to 25% have been running for days without improvement (measurements match the second example), but when moved back to 50% after about 1 day show the initial results (first example).

If I understand your comment about moving the buffered memory space to Oracle SGA, this could justify the degradation when moving to 25%, but do you know a possible reason (if this is confirmed, I'll check tomorrow) why when moving back to 50% the improvement takes so much time ?

Sorry for all these questions.

Thanks,

Mike
TwoProc
Honored Contributor
Solution

Re: Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

>>> Regarding the stats I turned on the traces on the session I was using to execute the test query, so that I think it should be a good one, I re-ran them multiple times with same results.

A) Yes, but you're not looking at cost estimates, you're look at the stats from the trace file, You need the "COST STATISTICS" to under stand the problem.


>>>The systems where the buffer was reduced to 25% have been running for days without improvement (measurements match the second example), but when moved back to 50% after about 1 day show the initial results (first example).

A) OK, then that confirms it. Move the amount of memory you removed from the OS buffer cache into the Oracle db_buffer_cache.


>>> If I understand your comment about moving the buffered memory space to Oracle SGA, this could justify the degradation when moving to 25%, but do you know a possible reason (if this is confirmed, I'll check tomorrow) why when moving back to 50% the improvement takes so much time ?

A) I think you had two problems, because of the two fetches I think your Oracle statistics on the tables that were involved in the query were not up to date.

AA) Or (or and/or) - your newly resized upward (back to 50%) OS cache wasn't populated and balanced right (with kept data on the highest hit disk areas) until the second day, the first day, the OS cache was basically having to load in data from disks (until it could be cache).

...

Likely story is a little/lot of all of the above, in varying degrees.
We are the people our parents warned us about --Jimmy Buffett
Steven E. Protter
Exalted Contributor

Re: Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

Mike summary point is:

Its not the value of dbc_max_pct its the difference beteween it and dbc_min_pct and the enormous performance cost of the system changing the size of the cache on the fly.

If you want my advice, make the SGA areas bigger and set dbc_min_pct to 5 and dbc_max_pct to 7

I think you'll see a substantial improvement.

There are many other things that may need tuning. Tuning is an art.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Michele (Mike) Alberton
Regular Advisor

Re: Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

Thanks a lot to both of you guys.

Really precious information.

We started analyzing those queries and we worked out an optimization adding an index which boosted performances.

I appreciated your comments and explanations.

Cheers,

Mike
Michele (Mike) Alberton
Regular Advisor

Re: Oracle 9i on HP-UX 11i - Discrepancy between CPU and Elapsed time

After investigating we realized that we could boost performances adding a new index.
Doing this we were less depending on the buffer cache setup.

Thanks to Steven and John for their quick support.

Mike