
sar shows zero idle

doug mielke
Respected Contributor

sar shows zero idle

This post is related to post:

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=226549

My Oracle Financials system, running on an N-Class server, sometimes takes a performance hit in the late morning or early afternoon. It happens when the concurrent manager has more than 20 requests queued (we currently have 35 in the queue).

Attached are the sar reports (-b, -M and -d) taken during the 'event'.

The disk I/O and cache hit rates remain relatively constant throughout the day.
The big clue is that sar shows waiting for I/O (%wio) at less than 1%.
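To pick out the intervals that match this pattern (no idle time, almost no I/O wait) from a saved sar report, a small filter like the one below can help. This is a minimal sketch that assumes the system-wide HP-UX `sar -u` line layout of time, %usr, %sys, %wio, %idle; per-CPU lines from `-M` have an extra column and are skipped. The function name and the 1% threshold are illustrative choices, not part of any HP tool.

```python
import re
from typing import List, Tuple

# Assumed layout of a system-wide "sar -u" data line: time %usr %sys %wio %idle.
SAR_U_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$")


def cpu_bound_intervals(sar_text: str, max_wio: int = 1) -> List[Tuple[str, int, int]]:
    """Return (time, %usr, %sys) for intervals with zero idle and %wio below max_wio."""
    hits = []
    for line in sar_text.splitlines():
        m = SAR_U_LINE.match(line.strip())
        if not m:
            # Headers, blank lines, per-CPU (-M) rows and "Average" rows are ignored.
            continue
        usr, sys_, wio, idle = (int(g) for g in m.groups()[1:])
        if idle == 0 and wio < max_wio:
            hits.append((m.group(1), usr, sys_))
    return hits
```

If most of the flagged intervals show %usr far above %sys, the CPUs are busy running application code rather than waiting on the disks.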

Oracle is pulling something into memory and processing it (inefficiently? thrashing? deadlocks?).
This looks like an index problem, but the DBA's explain plan shows indexes being used (I'm told), and when the concurrent manager queue drops below 20, the sar 'waiting for I/O' figure rises, as it should for any well-behaved database.

This is what started the finger-pointing at my EVA drives as the culprit, as described in the link above.

What tools can I use to find out what's going on here? I have Oracle Enterprise Manager, but I am below novice level in its use.




9 REPLIES
Jean-Luc Oudart
Honored Contributor

Re: sar shows zero idle

Could you run Statspack on the Oracle database for 30-60 minutes and then run the report?

This should give some information re resource utilisation.

Rgds,
Jean-Luc
fiat lux
doug mielke
Respected Contributor

Re: sar shows zero idle

I've had a suggestion to raise shmmax, but I'm not sure it's based on anything more than a guess that we need more shared memory for all of the 'non-SGA' related processes.

The system has 24 GB of RAM.

We run 32-bit Oracle, so our SGA is around 1.7 GB.
shmmax is currently 4.8 GB.

Is there a way to confirm this guess? Can I see shared memory utilization / shortages?
doug mielke
Respected Contributor

Re: sar shows zero idle

...and this attachment is from syslog. It doesn't look like an error, but why am I getting it now?
A. Clay Stephenson
Acclaimed Contributor

Re: sar shows zero idle

Now, it's time to use Glance to drill down into the busiest processes and see what system calls are being hammered. That will point you to where any possible problems lie in the SQL code.

It would also be good to know if you are seeing a high context switch rate (Glance or sar -w).

Also, please attach your kmtune output.
If it ain't broke, I can fix that.
Jean-Luc Oudart
Honored Contributor

Re: sar shows zero idle

That seems like a bit of a waste of RAM if you're using the 32-bit version, as you are limited to 1.75 GB for the SGA.
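The arithmetic behind this point can be made explicit with the figures from the original post (24 GB RAM, shmmax 4.8 GB, SGA about 1.7 GB). The sketch below assumes, as the replies in this thread state, that a 32-bit Oracle process cannot attach more than about 1.75 GB of shared memory; the function and its field names are hypothetical helpers for illustration only.

```python
def sga_headroom(ram_gb: float, shmmax_gb: float, sga_gb: float,
                 addr_limit_gb: float = 1.75) -> dict:
    """Compare the SGA with RAM, shmmax and an assumed 32-bit attach ceiling."""
    # The effective ceiling is whichever is lower: the kernel tunable or the
    # per-process 32-bit limit (assumed 1.75 GB, per the replies in this thread).
    ceiling = min(shmmax_gb, addr_limit_gb)
    return {
        "sga_fraction_of_ram": sga_gb / ram_gb,
        "shmmax_is_binding": shmmax_gb < addr_limit_gb,
        "max_sga_gb": ceiling,
        "headroom_gb": ceiling - sga_gb,
    }


if __name__ == "__main__":
    print(sga_headroom(ram_gb=24, shmmax_gb=4.8, sga_gb=1.7))
```

With these inputs the SGA uses only about 7% of physical RAM, shmmax is not the binding limit, and only about 0.05 GB of headroom remains under the 32-bit ceiling, which is why raising shmmax alone is unlikely to change anything.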

http://forums1.itrc.hp.com/service/forums/parseCurl.do?CURL=%2Fcm%2FQuestionAnswer%2F1%2C%2C0x8f2e8f960573d611abdb0090277a778c%2C00.html&admit=716493758+1064935262507+28353475

What is the database size ?
Any chance to run statspack and post the results ?

Rgds,
Jean-Luc
doug mielke
Respected Contributor

Re: sar shows zero idle

I would expect context switches to go up during this event, but how much is too much (thrashing?)?
Here is sar -w output from yesterday, followed by today's, then the kmtune output.
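There is no universal "too much" for context switches; a more useful test is how the event intervals compare with a normal baseline on the same box. The sketch below assumes the HP-UX `sar -w` data line layout of time, swpin/s, bswin/s, swpot/s, bswot/s, pswch/s, normalizes pswch/s by a CPU count you supply, and flags intervals above an arbitrary multiple of the baseline median. The function names and the 2x factor are illustrative assumptions.

```python
import re
from statistics import median
from typing import List, Tuple

# Assumed "sar -w" data line: time swpin/s bswin/s swpot/s bswot/s pswch/s
SAR_W_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2})((?:\s+[\d.]+){5})$")


def pswch_per_cpu(sar_text: str, ncpu: int) -> List[Tuple[str, float]]:
    """Return (time, process switches per second per CPU) for each data line."""
    rates = []
    for line in sar_text.splitlines():
        m = SAR_W_LINE.match(line.strip())
        if m:
            pswch = float(m.group(2).split()[4])  # fifth numeric column
            rates.append((m.group(1), pswch / ncpu))
    return rates


def spikes(event: List[Tuple[str, float]], baseline: List[Tuple[str, float]],
           factor: float = 2.0) -> List[Tuple[str, float]]:
    """Flag event intervals whose rate exceeds factor x the baseline median."""
    ref = median(rate for _, rate in baseline)
    return [(t, rate) for t, rate in event if rate > factor * ref]
```

Feeding yesterday's report in as the baseline and today's as the event shows whether the slowdown coincides with an unusual jump in switching, rather than relying on an absolute number.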

I'm off to Glance land...
Steven E. Protter
Exalted Contributor

Re: sar shows zero idle

I'm not sure shmmax will help much. Your system will ignore any setting greater than 25% of system memory.

Memory is defined as RAM plus swap.

Here is an Oracle tuning doc, for what it's worth.

http://www2.itrc.hp.com/service/cki/search.do?category=c0&docType=Security&docType=Patch&docType=EngineerNotes&docType=BugReports&docType=Hardware&docType=ReferenceMaterials&docType=ThirdParty&searchString=UPERFKBAN00000726&search.y=8&search.x=28&mode=id&admit=-1335382922+1064937766794+28353475&searchCrit=allwords

It was written by a guy who helped us with some Oracle performance problems.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
A. Clay Stephenson
Acclaimed Contributor

Re: sar shows zero idle

Well, now I think you have finally hit on your problem: 32-bit Oracle. There is really no point in increasing shmmax, because the absolute limit is 1.75 GB if SHMEM_MAGIC is enabled. If these are multiple instances, you can use memory windows to give each process (or group of related processes) its own 4 GB address space. The real answer to your problem is to move to 64-bit applications, which will allow huge (and happy) SGAs.

The 64-bit code doesn't execute faster (it's actually slower than 32-bit code), BUT because resource limits such as maximum addressable space are gone, overall performance can increase dramatically in applications that require large amounts of resources, e.g. Oracle.

If you are stuck in 32-bit land, there is one thing I can suggest that may help. Install all the latest LVM/JFS/SCSI performance patches for 11.00. Next, increase your buffer cache. I would suggest doing it with a non-zero bufpages rather than a dynamic buffer cache, because we want to eliminate as much kernel overhead as possible. The idea is to partially make up for the limited SGA buffer size by using the UNIX buffer cache as well. 11.11 systems actually perform better using cooked files, and well-patched 11.00 systems may also perform better under Oracle with cooked files.

Normally, I limit an 11.00 buffer cache to about 800 MB (yours is currently about 1200 MB, i.e. 5% of 24 GB), but in your case I would try 1600 MB or so (bufpages=409600) and look for any improvement. Surprisingly, you might find the reverse to be true and that the best performance is somewhere around 600 MB: if the system is spending a lot of time searching the buffer cache, it may be faster to get the data directly from the disk array, which is itself cached.
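The bufpages figures above follow from bufpages counting 4 KB (4096-byte) pages. The helper below is a hypothetical convenience, not an HP-UX tool; it converts a target cache size in MB to a bufpages value and reproduces the "5% of 24 GB" estimate, so you can try the 600, 800 and 1600 MB sizes mentioned here without hand arithmetic.

```python
PAGE_SIZE = 4096  # bufpages is expressed in 4 KB pages


def mb_to_bufpages(mb: int) -> int:
    """Convert a buffer cache size in MB to a bufpages value."""
    return mb * 1024 * 1024 // PAGE_SIZE


def pct_of_ram_mb(ram_gb: float, pct: float) -> float:
    """Cache size in MB when the buffer cache is a percentage of RAM."""
    return ram_gb * 1024 * pct / 100


if __name__ == "__main__":
    for size_mb in (600, 800, 1600):
        print(size_mb, "MB -> bufpages =", mb_to_bufpages(size_mb))
    print("5% of 24 GB =", pct_of_ram_mb(24, 5), "MB")
```

For example, 1600 MB gives bufpages=409600, matching the value suggested above, and 5% of 24 GB works out to roughly 1229 MB, which is the "1200" quoted.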


Now, having said all this, I think the real problem is in the SQL/application code itself, especially because the CPUs are spending so much time in user (rather than system) mode.
doug mielke
Respected Contributor

Re: sar shows zero idle

It took a while, but here is output from our limited implementation of Statspack.

Any hints in here as to the combination of events that triggers our slowdown?