
I need the hardcore facts about aio_write

Alan Riggs
Honored Contributor

I have long followed the maxim that enabling asynchronous writes for databases increases performance at the cost of increased risk of data corruption. A search through the archives of this (and other) sites shows that to be a common understanding. However, Oracle recommends use of async IO and makes absolutely no reference to increased risk. I am looking at cooked filesystems, and here is a general outline of the two sides of the argument:

Sysadmin: asynchronous writes to cooked space do not update the intent logs on vxfs filesystems in a predictable manner. Use of the O_SYNC or O_DSYNC flags on file open requests will ensure that the aio driver in the kernel reliably reports to the database when writes have been committed to disk, but it does not ensure that the filesystem will be in an automatically recoverable state after a system crash (because metadata is not written to disk in the order that fsck expects). Because of this, manual intervention for filesystem cleaning is almost always required after a crash. Full recovery of filesystem state requires that the sysadmin be able to properly repair all inconsistencies. Any errors in this process represent the loss of data and create the risk that the database files will not be in a consistent state, thus requiring a rollback.

DBA: Oracle always opens files with O_DSYNC, and thus the aio subsystem will always reliably report to Oracle when a write has been committed to disk. Because of this, there is no risk of the database being in an inconsistent state after a crash; whether the database is using raw or cooked files is immaterial.

I need the hardcore skinny on this, folks. Any help will be greatly appreciated.
7 REPLIES
Alan Riggs
Honored Contributor

Re: I need the hardcore facts about aio_write

An update: it appears the DBA perspective was coming from an OS (AIX) which implements threaded asynchronous IO for cooked filesystems. HP-UX only supports kernel asynchronous IO (IO using the aio_write, etc. function calls) on raw volumes.

My question now gets translated slightly, since I have always heard the same concerns expressed about asynchronous IO on raw volumes. So, what's the straight dope? Is there truly a concern about data loss when running Oracle asynchronously against raw logical volumes?
Ruediger Noack
Valued Contributor

Re: I need the hardcore facts about aio_write

Hi Alan,

I'm no hardcore expert on these facts, but I have some experience with HP-UX and Oracle. It's also a little difficult for me to express what I think (I'm not a native English speaker).

With O_DSYNC, Oracle can guarantee logical consistency at the file level, meaning within and between its files (datafiles, redo logs, ...). With O_DSYNC, Oracle is told when the first write has been committed to disk, and it waits for that before it performs the next write in the logical chain. This does not depend on aio at the disk level. Its rollback mechanism is then able to recover the database to a consistent state.

After a crash, the recovery point depends on the availability of your redo log and archive log files. The sysadmin has to repair the filesystems and recover the Oracle files (from mirror or backup) if necessary, and then Oracle will go its (recovery) way.
If you use Oracle's mirroring strategy to mirror the redo logs to different disks, and also take regular backups of the archive log files, you will be able to restore these files, and I see no danger for the database when using filesystems.
On the other hand, I think backups of datafiles, archive logs, ... are much simpler if the files reside on filesystems.

Ruediger
Paula J Frazer-Campbell
Honored Contributor

Re: I need the hardcore facts about aio_write

Hi Alan

I do not run "Orible" but Universe, and I never turn fs_async on.

I have had major server crashes on the live databases and never had a problem.

With async on and a crash, do you really want to have to fix the database before you can go live again?

Messing about with redo logs etc. is not what you want to do when trying to bring the prime server back online.

Just my opinion.

Paula



If you can spell SysAdmin then you is one - anon
Mark van Hassel
Respected Contributor

Re: I need the hardcore facts about aio_write

Hi Alan,

Not a suggestion about the control of the metadata but about the user data:
When you have Advanced JFS installed, have a look at the mount options mincache (controls asynchronous writes) and convosync (controls synchronous writes, i.e. files opened with the O_SYNC flag).
-o mincache=direct or -o convosync=direct causes all user data to be written directly to disk, bypassing the system's buffer cache.
This way you have the advantages of file systems over raw logical volumes, plus the advantage of raw logical volumes that the database uses only its own caching mechanism.

HtH,

Mark
The surest sign that life exists elsewhere in the universe is that none of it has tried to contact us
Alan Riggs
Honored Contributor

Re: I need the hardcore facts about aio_write

I think I need to clarify my question. I am not asking (having learned that HP-UX does not implement threaded asynchronous IO) about the behavior of fs_async and cooked filesystems. I see no reason to turn fs_async on for a file-based Oracle implementation, since the database does not gain access to the aio subsystem for cooked files. I would not expect to see meaningful performance gains and would expect to see more difficult manual interventions required to recover after a system crash. I do use the Advanced JFS mount options to bypass the system buffer cache for Oracle filesystems.

My question is specifically about the reliability of using the aio_write() call for database writes with respect to recovery after system crash. I understand that the aio subsystem can guarantee that the database has a consistent view of data during normal operation. My question is: how reliably (and quickly) can a database running on raw volumes recover from a system crash?
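
For reference, the call sequence under discussion looks roughly like this in portable POSIX C. This is only a sketch under stated assumptions: `aio_write_and_wait` and `durable_aio_demo` are hypothetical helper names, the target is an ordinary file rather than an HP-UX raw logical volume, and nothing here reflects how Oracle itself issues its writes. It just shows a request being queued with aio_write() on a descriptor opened with O_DSYNC, and the caller waiting for the kernel to report completion before moving on.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue one write with aio_write() and block until the
 * kernel reports it finished. Returns bytes written, or -1 on error. */
ssize_t aio_write_and_wait(int fd, const void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    const struct aiocb *pending[1];
    int err;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (void *)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* we poll, no signal wanted */

    if (aio_write(&cb) != 0)
        return -1;                              /* request was never queued */

    /* aio_error() stays EINPROGRESS until the request has completed. */
    pending[0] = &cb;
    while ((err = aio_error(&cb)) == EINPROGRESS)
        (void)aio_suspend(pending, 1, NULL);    /* sleep until it changes */

    if (err != 0) {
        (void)aio_return(&cb);                  /* release the request */
        errno = err;
        return -1;
    }
    return aio_return(&cb);                     /* final byte count */
}

/* Hypothetical demo: open with O_DSYNC so that completion is intended to
 * mean "data has reached stable storage", then write one record. */
int durable_aio_demo(const char *path)
{
    static const char record[] = "block-0 payload\n";
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);
    if (fd < 0)
        return -1;

    n = aio_write_and_wait(fd, record, sizeof record - 1, 0);
    close(fd);
    return n == (ssize_t)(sizeof record - 1) ? 0 : -1;
}
```

A real database would keep many requests in flight and reap them as they complete; waiting on each one in turn, as here, only makes the completion handshake easy to see. On older glibc systems the program may need to be linked with -lrt.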
Ruediger Noack
Valued Contributor

Re: I need the hardcore facts about aio_write

Sorry Alan,

I misunderstood your question...

Do you know this Metalink document?
DocID: Note:139272.1
Subject: HP-UX: Asynchronous i/o

Quote:
********************

In summary, fs_async is ignored for datafiles (due to open() with O_DSYNC).
However, filesystem metadata may be lost, potentially causing datafile
corruption.

Oracle does not recommend setting fs_async to '1'.

Settings:
fs_async=0 Do not use async writes to file systems
fs_async=1 Do async writes to file systems

*********************

What do I think about raw devices? Oracle keeps its data consistent through O_DSYNC, and a raw device holds no other data (in particular, no filesystem metadata)...

Ruediger
Carlos Fernandez Riera
Honored Contributor

Re: I need the hardcore facts about aio_write

If I recall correctly, Oracle has to write the redo and rollback data before it writes the datafiles, and that is where the consistency comes from: a transaction will not modify any datafile before it has been recorded in the rollbacks and redos.
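
The ordering rule described here (log first, datafile second) can be sketched in C. This is a generic illustration of write-ahead ordering, not Oracle's actual implementation: `write_all` and `logged_update` are hypothetical names, and the "log" and "datafile" are plain files opened with O_DSYNC so that each write() is meant to return only after the data is on stable storage.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical sketch: record a change in the log, and only once that write
 * has returned successfully apply the same change to the datafile. */
int logged_update(const char *log_path, const char *data_path,
                  const char *change, off_t data_off)
{
    size_t len;
    int log_fd, data_fd, rc = -1;

    assert(log_path != NULL && data_path != NULL && change != NULL);
    len = strlen(change);

    log_fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0600);
    if (log_fd < 0)
        return -1;
    data_fd = open(data_path, O_WRONLY | O_CREAT | O_DSYNC, 0600);
    if (data_fd < 0) {
        close(log_fd);
        return -1;
    }

    /* Step 1: the log record must be written before the datafile is touched.
     * Step 2: only then seek to the target offset and apply the change.
     * The && chain stops at the first failure, so a failed log write means
     * the datafile is never modified. */
    if (write_all(log_fd, change, len) == 0 &&
        lseek(data_fd, data_off, SEEK_SET) != (off_t)-1 &&
        write_all(data_fd, change, len) == 0)
        rc = 0;

    close(data_fd);
    close(log_fd);
    return rc;
}
```

If the machine dies between the two steps, the log holds a change the datafile lacks, which is the situation a recovery pass is designed to replay. The reverse situation, a datafile change with no log record, is what the ordering is meant to rule out.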


When you use raw datafiles, you must still configure the filesystem without async so that archived redo logs are stored on disk synchronously.


It is difficult for me to explain this clearly.