
doug mielke
Respected Contributor

disk bottleneck?

I have an HP 9000 N-class running Oracle Financials with a couple hundred users.
Data files are on a Compaq/HP SAN, application files are on an EMC Clariion SCSI-over-fibre array, and the OS is on 2 internal non-mirrored drives.
The system has 8 processors and 24 GB of RAM, with the dynamic buffer cache set to range from 2% to 10%.

My end-of-fiscal-year performance is suffering.

I have some questions:
How can I determine how much buffer cache I'm using, and when does dynamic caching adjust up and down?
Is it possible that my new SAN HBAs respond faster than the internal SCSI (core I/O?), dominating the system bus and making my sar -d numbers misleading?
Is there a command to manually flush the cache buffers as a test (vhand?)

At times (events last 10 to 20 minutes), my queue lengths and wait times (in sar -d) grow on my internal drives when I'm accessing the SAN heavily (the SAN's numbers grow as well, but not nearly as dramatically), and users report performance degradation. Exercising the internal drives heavily seems to have no effect. The Oracle application drives aren't hit hard at all (Oracle buffering?).
Cache hit rates are lower than usual, but not bad (75% read, 60% write). Access times on the SAN are lightning fast (async writes to the SAN controller, I guess). Overall I/O is high, but not to the magnitude that would explain the suffering performance. sar -M never goes to zero idle, and waiting for I/O is even low (10-15%).
There are no disk errors in syslog or dmesg (so no retries, likely?).
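To line those observations up during one of the bad windows, something like this works (a minimal sketch, assuming stock HP-UX sar; file names and intervals are arbitrary):

  sar -d 10 60 > /tmp/sar_d.out &   # per-device %busy, avque, avwait, avserv
  sar -b 10 60 > /tmp/sar_b.out &   # buffer cache hit rates (%rcache, %wcache)
  sar -u 10 60 > /tmp/sar_u.out     # CPU split, including %wio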
Ian Dennison_1
Honored Contributor

Re: disk bottleneck?

First question - how's your memory? Too little memory means lots of swapping, specifically to local disk. vmstat should show lots of paging in/out, and a high system CPU would also point at I/O problems.
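A quick way to check both (a sketch; flags assume a stock 11.x box):

  vmstat 5 5       # sustained non-zero po/sr columns mean real memory pressure
  swapinfo -tam    # heavy device swap usage points at memory, not the disks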

Second question - What else is on local disk? Redo Logs? Log Files?

Third question - Got Glance? (even the eval version from the Apps CDs)

Share and Enjoy! Ian
Building a dumber user
A. Clay Stephenson
Acclaimed Contributor

Re: disk bottleneck?

The sync command can be used to flush the buffer cache. You really need Glance to display the current buffer cache settings, OR simply hard-set bufpages to a non-zero value and then you KNOW how much it is. Judging from your input, I would say that with 24GB your buffer cache is staying at the 10% value (2.4GB); in general, something around 400-600MB is about optimal for 11.0, and in most cases performance actually degrades above that.
For 11.11, 800-1200MB is about optimal. In many cases it takes longer to search the buffer cache than to retrieve from disk, especially when the disks are cached themselves. If nothing else, you can install the trial version of Glance; you will see in seconds what is wrong.
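Without Glance, the current settings can be read back with kmtune (a sketch, assuming kmtune(1M) on 11.0/11.11):

  kmtune -q dbc_min_pct   # lower bound of the dynamic cache
  kmtune -q dbc_max_pct   # upper bound (10% of 24GB = 2.4GB here)
  kmtune -q bufpages      # non-zero means a hard-set cache, counted in 4KB pages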

Because the disks don't seem to be hit too hard, I am guessing that you are doing tons of logical I/Os. I would reduce the buffer cache and see if you see any improvement.

I really suspect that the place you need to look is the SQL code itself. A few sqlexplains might be in order.
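For example (a hypothetical sketch - the login and sample statement are placeholders, and PLAN_TABLE must already exist and be empty, per utlxplan.sql):

  sqlplus apps/secret <<'EOF'
  EXPLAIN PLAN FOR SELECT COUNT(*) FROM dual;
  SELECT LPAD(' ', 2*(LEVEL-1)) || operation || ' ' || options || ' ' || object_name
    FROM plan_table
   START WITH id = 0
  CONNECT BY PRIOR id = parent_id;
  EOF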
If it ain't broke, I can fix that.
Steven E. Protter
Exalted Contributor

Re: disk bottleneck?

In your shoes, I'd use sar to collect some background performance data.

Attaching a script, which is adjustable and will let you know which disk or vdisk is overloaded. You can take appropriate action after that.
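A minimal stand-in for such a script, if the attachment is unavailable (path and interval are arbitrary):

  #!/usr/bin/sh
  # log per-disk sar samples once a minute for later review
  LOG=/var/adm/sar_disk.$(date +%Y%m%d)
  while :
  do
      sar -d 60 1 >> $LOG
  done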

On the SAN, we specify that Oracle data is RAID 10 - fastest, best.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Caesar_3
Esteemed Contributor

Re: disk bottleneck?

Hello!

Install Glance Plus from the application CD;
it's free for 60 days.
Otherwise you also have the sar command,
which shows the I/O activity.

Caesar
Sridhar Bhaskarla
Honored Contributor

Re: disk bottleneck?

Hi,

The "sync" process is the one that causes changes to be written to the disks.

If you have the latest /usr/contrib/Q4/bin/kmeminfo, it should give you the buffer usage in pages. On typically active systems, you should see the buffer cache at the dbc_max_pct value, so I will not be surprised if you are using all 2.4GB of your buffer cache. I'm not saying you are wasting memory by allotting this much buffer cache, as you have 24GB, which is plenty. Look at your "vmstat 5 2" output and multiply the value in 'free' by 4096 to get the free memory.

However, with this big a buffer cache, there is a chance that the I/O writes may *explode* periodically, causing your system to look frozen intermittently. You can reduce the buffer cache to around 500MB (by keeping dbc_max_pct at 3%). But in your case, the best bet is a 500MB static buffer cache (by setting the nbuf kernel parameter) so that there is less work for the kernel in adjusting the dynamic buffer cache. Particularly if your application's load is dynamic, this can be a significant overhead.
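For example (a one-liner sketch; the 'free' column position assumes standard 11.x vmstat output):

  vmstat 5 2 | tail -1 | awk '{printf "free: %d MB\n", $5 * 4096 / 1048576}'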

You will need to note that having a low buffer cache will not avoid *double buffering*, but it will avoid I/O bursts.

Get 'glance' on the system as soon as you can. It gives you a very good picture of what is happening on the box, and it has good online help too. If you already have it, type "m" to get to the memory screen, which shows the details of memory usage.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
doug mielke
Respected Contributor

Re: disk bottleneck?

It was probably a bad idea, but I was thinking that I'd remove everything from cache, then watch to see which drives were hit hard. Sync would force the dirty buffers to be written, but not really change or delete the contents of the buffers.
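That is indeed how sync behaves (a sketch; it schedules dirty pages for write-out but leaves them cached, so there is no supported way to empty the cache short of shrinking it or rebooting):

  sync   # flush dirty buffers to disk; clean pages stay in the cache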

I have Glance on now, and I am up to 2.4 GB of cache. I also have gigabytes of free memory.
If having too much cache is a performance hit, where should I allocate my free memory? Can I add more queues, lists, or other structures to allow a large cache? My memory is all dressed up with no place to go.

And Clay,
You are so correct in pointing at SQL, our legacy code is an SQL nightmare, and we are slowly cleaning it up.
A. Clay Stephenson
Acclaimed Contributor

Re: disk bottleneck?

With that much free memory, I would hard-set the buffer cache using bufpages. bufpages is a count of 4k pages, so bufpages=204800 would set it at 800MB. Again, 400-600MB for 11.0 and 800-1200MB for 11.11; I would tend toward the low side. Leave nbuf=0; that's almost always optimal. You can leave the dynamic values where they are, since they will be overridden by the non-zero bufpages value.
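Spelled out (a sketch; the kmtune -s syntax assumes 11.11 - on 11.0, set bufpages in /stand/system and rebuild):

  # 800MB / 4KB per page = 204800 pages
  kmtune -s bufpages=204800
  kmtune -s nbuf=0
  mk_kernel -o /stand/vmunix   # rebuild the kernel, then reboot to take effect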

Next, increase your SGA. If this is 64-bit Oracle (and it should be), then blow the SGA out to 5GB or so - bigger if you like and you have the free resources. After equilibrium, Oracle will barely know it's talking to disk devices.
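Something along these lines (a hypothetical init.ora fragment, assuming 64-bit Oracle 9i; the sizes are illustrative, and the HP-UX shmmax tunable must be at least as large as the SGA):

  sga_max_size     = 5G
  db_cache_size    = 4G
  shared_pool_size = 512M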

Because this is Oracle, and because some "experts" may have assisted you, check the value of timeslice. If it ain't 10, then make it so.
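A quick check (assuming kmtune is available):

  kmtune -q timeslice   # should report 10, the HP-UX default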

Finally, my rule of thumb is that if a 2x performance increase will do, then I might be able to tune and tweak to get that (rarely), but if I need much more than that to become "acceptable", then it's time to recode (or at least add indices). There have been many times when adding a single index did far more than all the hardware replacements, tuning, and I/O distribution combined.


If it ain't broke, I can fix that.
Tim D Fulford
Honored Contributor

Re: disk bottleneck?

Hi

Not answering the question here, just saying what I did in a similar situation. I'm a fan of MeasureWare; Glance is good, but you have to be around when things go bad, whereas MeasureWare gives you the history.

I'd plot out the individual disk utilisations and disk queues; that may answer some of your questions. If you run MeasureWare C.03.70 (or above), you get both throughput and bandwidth results.
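The raw numbers can be exported for plotting with the extract tool that ships with MeasureWare (a rough sketch; the flags vary by version, so verify against extract(1) first):

  /opt/perf/bin/extract -xp -d -f /tmp/disk_metrics.txt \
      -b "05/01 00:00" -e "05/02 00:00"   # export disk-class data for one day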

just my $0.02 worth

Tim
-
Tim D Fulford
Honored Contributor

Re: disk bottleneck?

Here is a link to a previous disk bottleneck thread. My answers there were a bit fuller on the subject!!

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x612063f96280d711abdc0090277a778c,00.html

Tim
-