Raid Write Cache corrupts database

 
Kay Behrmann_1
Occasional Contributor

Dear Forum,
we run a ProLiant DL380R03 with a Smart Array 5i Plus and battery-backed write cache. I had set this to 50% read and 50% write acceleration, and at first things looked fine. Now I have a database corruption on my MS SQL 2000 Server in the msdb database. The server reports error 823, which is described in MS Knowledge Base article 828339 and, according to MS, usually indicates "hardware problems". Our MS Exchange Server on this box also shows strange effects, which could likewise be explained by a corrupt write cache.
1) Has anyone experienced similar problems?
2) How can I test whether my write cache is causing this?

Please help,
Kay
6 REPLIES
Greg Carlson
Honored Contributor

Re: Raid Write Cache corrupts database

Kay,

Databases should normally be run without write caching; otherwise you can end up with database corruption. Even with battery backup, a database should be set for direct I/O, not cached.

Change your write policy to direct I/O and then monitor your system.

Ciao,
Greg
Let's Roll!
Greg Carlson
Honored Contributor

Re: Raid Write Cache corrupts database

Kay,

Here is a description of what is going on with caching and the database:

http://www.findarticles.com/cf_dls/m0BRZ/3_20/61620919/p1/article.jhtml

Write caching is based on another simple principle--it takes a few microseconds to store write data in a controller's cache versus a half dozen milliseconds to store it on disk. Writing to (or reading from) cache is over 1,000 times faster than writing to (or reading from) disk. There are two types of write caching: write-back and write-through. With write-back caching, a write is written to cache, the I/O is acknowledged as "complete" to the server that issued the write, and some time later, the cached write is written or flushed to disk. When the application receives the I/O complete acknowledgement, it assumes the data is permanently stored on disk. With write-through caching, sometimes referred to as conservative cache mode, writes are written to both the cache and the disk before the write is acknowledged as complete. Write-through caching improves I/O performance with applications that frequently read recently written data.

Caching is a cost-effective way to boost I/O performance. However, unless the RAID controllers doing the caching are configured in dual active pairs and designed with cache coherency and robust recovery mechanisms, caching can cause incorrect data to be delivered to applications and corrupt databases when elements in the I/O path fail.
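
The difference between the two modes can be illustrated with a small Python model. This is only a sketch under stated assumptions: Disk, CachedController and acknowledged_but_lost are hypothetical names made up for the illustration, and the model says nothing about how a Smart Array controller works internally. It only shows why a write that was acknowledged in write-back mode can be missing from disk if the cache contents are lost before the flush.

```python
class Disk:
    """Toy disk: a mapping from block address to data."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data


class CachedController:
    """Hypothetical controller model, for illustration only."""

    def __init__(self, disk, write_back):
        self.disk = disk
        self.write_back = write_back
        self.cache = {}     # block address -> cached data
        self.dirty = set()  # cached blocks that are not yet on disk

    def write(self, lba, data):
        self.cache[lba] = data
        if self.write_back:
            self.dirty.add(lba)         # acknowledged before reaching disk
        else:
            self.disk.write(lba, data)  # write-through: on disk before the ack
        return "complete"

    def flush(self):
        """Copy every dirty block to disk."""
        for lba in sorted(self.dirty):
            self.disk.write(lba, self.cache[lba])
        self.dirty.clear()

    def lose_cache(self):
        """Simulate the cache contents being lost before a flush."""
        self.cache.clear()
        self.dirty.clear()


def acknowledged_but_lost(write_back):
    """Return the blocks that were acknowledged but never reached disk."""
    disk = Disk()
    ctrl = CachedController(disk, write_back)
    acked = {}
    for lba in range(4):
        data = f"page-{lba}"
        if ctrl.write(lba, data) == "complete":
            acked[lba] = data
    ctrl.lose_cache()  # failure happens before any flush
    return [lba for lba, data in acked.items() if disk.blocks.get(lba) != data]
```

In this model, write-through never leaves an acknowledged block off the disk, while write-back depends entirely on the cache surviving until flush() runs. A battery-backed cache is meant to cover exactly that window.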

Ciao,
Greg
Let's Roll!
Kay Behrmann_1
Occasional Contributor

Re: Raid Write Cache corrupts database

Greg,
thank you for your quick answer and the link to the theoretical foundations of caching.
However, I am somewhat shocked that the battery-backed write cache is NOT intended for use in a database server. It was sold to me as an extra add-on to the regular (i.e. non-battery-backed) Smart Array Controller to support exactly this: safe write caching for database servers!
But back to practical life: does anyone know of any utilities that could be used to check that the write cache is operating properly?
Currently the write cache is switched off. But since inconsistencies appear very rarely anyway, this really does not show anything. And I can't set it back to write caching and risk data loss!
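
One generic way to approach such a check is a write-then-verify test: write blocks of random data, record a checksum for each, and later read them back and compare. The Python sketch below is only an illustration of that idea, not an HP or Microsoft utility. write_blocks, verify_blocks and the 8 KB block size are assumptions chosen for the example. A read-back shortly after the write may be served from the operating system's cache, so a clean result does not prove the controller cache is sound; a test like this belongs on a scratch volume, never on production data.

```python
import hashlib
import os

BLOCK_SIZE = 8192  # assumption: 8 KB blocks, chosen to match SQL Server's page size


def write_blocks(path, count, block_size=BLOCK_SIZE):
    """Write `count` random blocks to `path`; return their SHA-256 digests."""
    digests = []
    with open(path, "wb") as f:
        for _ in range(count):
            block = os.urandom(block_size)
            digests.append(hashlib.sha256(block).hexdigest())
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    return digests


def verify_blocks(path, digests, block_size=BLOCK_SIZE):
    """Re-read the file and return the indices of blocks whose digest differs."""
    bad = []
    with open(path, "rb") as f:
        for index, expected in enumerate(digests):
            block = f.read(block_size)
            if hashlib.sha256(block).hexdigest() != expected:
                bad.append(index)
    return bad
```

A run would call write_blocks() on a scratch file, keep the returned digests, and call verify_blocks() again later, ideally after a reboot so that the data really comes from the array. A non-empty result lists the blocks that changed after they were written.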
Erwin Zoer
Trusted Contributor

Re: Raid Write Cache corrupts database

Kay/Greg,


According to the article, write-back caching can cause this kind of problem 'when elements in the I/O path fail'.

Do we have any evidence of elements in the I/O path failing, for example a broken controller or disk? If not, I don't see how the article applies to Kay's problem.


Best regards,


Erwin Zoer
Greg Carlson
Honored Contributor

Re: Raid Write Cache corrupts database

Kay,

Where is your storage located? Is it all internal to the DL380, or do you have external storage directly connected to the 5i?

Greg
Let's Roll!
Kay Behrmann_1
Occasional Contributor

Re: Raid Write Cache corrupts database

Greg,
the disks are all internal. SQL is on a RAID 1 pair of 72 GB 10K rpm disks.
I have just found MS article 231619 and have run some stress tests on the box, but nothing has been found yet.
Any other ideas?
Kay