Re: Improving asynchronous IO performance

Stephen Andreassend · ‎03-11-2003

Hi,

We have Oracle redo logs on raw partitions with async IO.

I want to know if I can improve async IO performance by changing the /dev/async device minor number.

Here are the options:
0x000000 default
0x000001 enable immediate reporting
0x000002 flush the CPU cache after reads
0x000004 allow disks to timeout
0x000005 is a combination of 1 and 4
0x000007 is a combination of 1, 2 and 4
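
The values above read as bit flags that are OR-ed together, which is why 0x000005 is 1 plus 4. A minimal Python sketch of that decoding follows; the option names are copied from the list above, and the function name is purely illustrative rather than anything taken from HP-UX.

```python
# Flag bits exactly as listed in the post; this table is an illustration,
# not a copy of any HP-UX header file.
FLAGS = {
    0x1: "enable immediate reporting",
    0x2: "flush the CPU cache after reads",
    0x4: "allow disks to timeout",
}


def decode_minor(minor):
    """Return the option names whose bits are set in the minor number."""
    if minor == 0:
        return ["default"]
    return [name for bit, name in FLAGS.items() if minor & bit]


if __name__ == "__main__":
    for minor in (0x0, 0x1, 0x5, 0x7):
        print(f"0x{minor:06x}: {', '.join(decode_minor(minor))}")
```

Read this way, 0x000003 and 0x000006 would also be possible combinations, though the list above does not mention them.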

Oracle Support always makes reference to 0x000000. However, 0x000001 sounds a lot faster - so my questions are, is 0x000001 the fastest way to get a commit, and is it safe to use?

Thx
Steve

John Palmer · ‎03-11-2003

I'd go with what Oracle say. It's their code and if they've written it for the default setting then changing it would be unwise in my opinion.

Just saying that 0x000001 'sounds faster' doesn't mean that it is!

Have you actually identified that your redo logs are a bottleneck?

Regards,
John

Stephen Andreassend · ‎03-11-2003

Thx for the answer!!

Yes, redo logs are a bottleneck as we do many real-time commits per sec and we desire almost a sub-millisecond response time. Oracle Support have said they don't really have a clue and that this is an HPUX issue, so I am keen to see what people here have to say.

Bill Hassell · ‎03-11-2003

Immediate reporting is a feature that tells the disk to report good status as soon as the write data has been received by the internal disk buffers, rather than waiting until the disk seeks and actually writes. On the surface that sounds faster, and indeed it would be as long as the access is in bursts. However, a continuous stream of data will eventually fill up the disk's buffers because the disk simply can't write fast enough. The downside to immediate reporting is that unless the disk provides a battery backup for the cache, a power failure on the disk will lose whatever data is still in the buffers. Modern disk arrays have batteries; JBODs and internal disks do not.

Sub-millisecond I/O rates are not possible with mechanical disks, which means that once the buffers on the disk are full, all additional I/O is queued and will wait for some disk writes to complete. True sub-millisecond I/O that is not affected by seek times is only possible with a solid state disk. There are a number of vendors that provide these products, albeit just a bit pricey.
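
To make the burst versus continuous-stream point concrete, here is a rough Python sketch of how long a write buffer lasts when data arrives faster than the disk can drain it. The function name and every number in it are hypothetical, chosen only for illustration; they are not measurements of any real disk.

```python
def seconds_until_full(cache_mb, incoming_mb_s, drain_mb_s):
    """Time until the write buffer fills, or None if the disk keeps up.

    The buffer gains (incoming - drain) MB every second, so it is full
    after cache / (incoming - drain) seconds.
    """
    if incoming_mb_s <= drain_mb_s:
        return None  # the disk drains at least as fast as data arrives
    return cache_mb / (incoming_mb_s - drain_mb_s)


if __name__ == "__main__":
    # Hypothetical figures: 8 MB buffer, disk physically writes 20 MB/s.
    print(seconds_until_full(8, 40, 20))  # sustained stream faster than the disk
    print(seconds_until_full(8, 10, 20))  # stream the disk can keep up with
```

Once the buffer is full, each new write has to wait for a physical write to finish, so immediate reporting only hides the seek time for as long as the buffer has room.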

Bill Hassell, sysadmin

Steven E. Protter · ‎03-11-2003

Oracle support might not support you if you don't follow their recommendations. You should see if they have any guidelines in this regard.

I run Oracle and vote for what Oracle actually tested on, which is probably the default.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Volker Borowski · ‎03-11-2003

Hi,

I would vote against this flag.
It might tell Oracle that a commit has in fact been executed, while the corresponding log-data is still somewhere between the server-bus and the disk.

I read your Metalink thread dated May last year.
You have a high-end-storage system in between, which should really be able to store the data very fast into a well protected cache area first.
I doubt if you will really get a benefit from this.

Check if you can set a few parts of your database to NOLOGGING (e.g. secondary indexes), but be aware that some special recovery steps might be needed in case of a real recovery (e.g. recreating the indexes).

Good luck
Volker

Stephen Andreassend · ‎03-12-2003

Thanks for your responses.

Yes, we do have mid-range VA7400 disk arrays with redundant write-back write caches in RAID 0+1 "Normal Performance" mode (i.e. AutoRAID mode is disabled), so we do get reasonable throughput.

Our real-time billing application does a great many updates per second, but commits in batches for performance reasons. We do not use multiple connections to Oracle or multi-threading, so we are blocked while waiting for a commit to complete, even with an asynchronous log writer.

We have simplexed our redo logs at an Oracle level and have measured the result by monitoring the 'log file sync' value for just our application session in V$SYSTEM_EVENT.AVERAGE_WAIT. This column is in hundredths of a second, and our old value was 0.5 and now it is 0.1. One might think that asynchronous IO would avoid an overhead from having duplexed log files, but apparently not.

So, with the column in hundredths of a second, this means we have reduced our commit time from 5 ms to 1 ms. However, there's always room for more performance.

If "0x000001 enable immediate reporting" is not the default, then would this imply that 0x000000 has immediate reporting disabled and is therefore not truly asynchronous?

Steve

Yogeeraj_1 · ‎03-12-2003

hi steve,

Consider tuning at the Oracle level.

Have a look at the following document:
http://technet.oracle.com/deploy/availability/pdf/oow2000_sane.pdf

I would recommend running Statspack (so that we have a history to compare to) and occasionally running tkprof on an application or two. Otherwise, we have nothing to compare to (other than our "memories") and can only sit around and speak hypothetically.

Also, having data files and other things on the same device as your online redo logs can definitely impact the log writer's ability to do its job as fast as possible.

Things you can do:

o look at your transactions. If you are "over committing" -- committing in the mistaken belief that you are saving resources and not on true transactional boundaries -- stop it. Commit only when your transaction is over and do not commit too frequently (if you have very small transactions, so be it -- you MUST commit, but if you are committing just for the sake of committing, don't).

o speed up LGWR. Make it more efficient. Ensure LGWR and ARCH are not contending with each other (you want at least 5 devices dedicated to logging -- you NEED to mirror redo members and you NEED to archive). So, if you have disks 1..5 you can:

have redo log groups 1, 3, 5, 7, .... on disks 1 and 3
redo log groups 2, 4, 6, 8, .... on disks 2 and 4
archive destination to disk 5

Now, when LGWR is writing to groups 1, 3, 5, 7, .... on disks 1/3, ARCH will be reading groups 2, 4, 6, 8, ... on disks 2/4 and writing to disk 5. When LGWR advances, so does ARCH and they'll switch disks.
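
The alternation described above can be sketched in a few lines of Python. The function names are made up for illustration, and the sketch assumes ARCH is always archiving the group immediately before the one LGWR is currently writing.

```python
ARCHIVE_DISK = 5  # archive destination from the layout above


def disks_for_group(group):
    """Odd-numbered groups sit on disks 1 and 3, even-numbered on 2 and 4."""
    return (1, 3) if group % 2 else (2, 4)


def activity(current_group):
    """Disks touched while LGWR writes current_group and ARCH copies the previous one."""
    lgwr = set(disks_for_group(current_group))
    # Assumption: ARCH reads the group just before the current one.
    arch = set(disks_for_group(current_group - 1)) | {ARCHIVE_DISK}
    return lgwr, arch


if __name__ == "__main__":
    for group in range(2, 6):
        lgwr, arch = activity(group)
        print(f"group {group}: LGWR on {sorted(lgwr)}, "
              f"ARCH on {sorted(arch)}, shared disks {sorted(lgwr & arch)}")
```

Under that assumption the two processes never share a disk. If ARCH falls more than one group behind, it can end up reading from the same pair of disks that LGWR is writing to.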

HTH
Yogeeraj

No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)

Stephen Andreassend · ‎03-14-2003

Thanks.

I don't think I need to use Statspack when I am already looking directly at the V$SYSTEM_EVENT view and I already know that I want to reduce my 'log file sync' wait event.

For reference we have our db sitting on 45 disks, and the VA arrays do not allow us to control which files go on which disk. Also, all our redo reads/writes are to/from the controller cache memory, so I am not waiting on moving parts when I perform a commit. As I noted before, having more than 1 log member per group significantly increases the amount of time I must wait for a commit to complete.

We also have the Unix kernel parameter default_disk_ir=1 to enable immediate reporting as writes are placed in the array cache, though I don't think this is actually necessary as our write-cache on the VA is configured as write-back.

There were 2 parts to this post: whether 0x000001 is safe and whether it will offer more performance. Obviously there is a potential for danger if there is an array crash and writes in transit between the async device and the array cache get lost while Oracle thinks they are safe. But we've already accepted this risk by using async IO in the first place.

To resolve the performance question, I think I will perform some benchmarks to compare 0x000000 and 0x000001 and see if there is any marginal gain.

Steve

Stephen Andreassend · ‎04-01-2003

I performed some load tests and got the following results (hours:mins:secs, with fractional seconds):

0x000000 default:
Test 1 = 00:01:01.45
Test 2 = 00:01:00.70
Test 3 = 00:01:01.56

0x000001 enable immediate reporting:
Test 1 = 00:01:01.49
Test 2 = 00:01:01.64
Test 3 = 00:01:01.20

0x000002 flush the CPU cache after reads:
Test 1 = 00:01:01.24
Test 2 = 00:01:00.85
Test 3 = 00:01:00.82

0x000004 allow disks to timeout:
Test 1 = 00:01:02.46
Test 2 = 00:01:02.33
Test 3 = 00:01:02.73

0x000005 is a combination of 1 and 4:
Test 1 = 00:01:08.44
Test 2 = 00:01:08.84
Test 3 = 00:01:08.62

0x000007 is a combination of 1, 2 and 4:
Test 1 = 00:01:04.08
Test 2 = 00:01:04.36
Test 3 = 00:01:04.50

What we can see is that the default option, which is normally specified by
Oracle, is sufficient. The option to "allow disks to time out" has a
small performance penalty. These test results suggest that the "default
option" is actually "immediate reporting" - by definition of async IO,
this is to be expected.
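
For anyone who wants to compare the options more directly, the following Python sketch averages the three runs per option. The timings are copied from the results above; the helper names are illustrative, and three runs per option is too few to say anything statistically firm.

```python
from statistics import mean

# Timings copied from the test results above (hours:mins:secs.fraction).
RESULTS = {
    "0x000000": ["00:01:01.45", "00:01:00.70", "00:01:01.56"],
    "0x000001": ["00:01:01.49", "00:01:01.64", "00:01:01.20"],
    "0x000002": ["00:01:01.24", "00:01:00.85", "00:01:00.82"],
    "0x000004": ["00:01:02.46", "00:01:02.33", "00:01:02.73"],
    "0x000005": ["00:01:08.44", "00:01:08.84", "00:01:08.62"],
    "0x000007": ["00:01:04.08", "00:01:04.36", "00:01:04.50"],
}


def to_seconds(stamp):
    """Convert 'hh:mm:ss.ff' into seconds as a float."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarize(results):
    """Mean elapsed seconds for each option."""
    return {opt: mean(to_seconds(t) for t in runs) for opt, runs in results.items()}


if __name__ == "__main__":
    means = summarize(RESULTS)
    baseline = means["0x000000"]
    for opt, avg in means.items():
        print(f"{opt}: mean {avg:.2f} s ({avg - baseline:+.2f} s vs default)")
```

On these figures the means for 0x000000, 0x000001 and 0x000002 differ by less than the run-to-run spread of the default option itself, so only the options that include 0x000004 stand out.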

Based upon these test results, I will stick with the default option.

Thx all,
Steve
