RAID Disk problems

 
Jon_144
New Member

Has anyone encountered a problem where a RAID set fails when it tries to rebuild onto a spare drive? I have heard from different vendors that companies that are running any RAID should run a consistency check. Is this a universal standard tool to use on any RAID disk?
7 REPLIES 7
Ian Miller.
Honored Contributor

Re: RAID Disk problems

I'm unaware of a universal tool. Are you using RAID software for VMS or RAID disks or ?
____________________
Purely Personal Opinion
Jon_144
New Member

Re: RAID Disk problems

Sorry to confuse you, but I was wondering if a consistency check is a standard tool that other companies are using to make sure the RAID does not fail when you do a repair. Hope this makes sense.
Uwe Zessin
Honored Contributor

Re: RAID Disk problems

A consistency checking feature is not implemented in all RAID controllers.

We have a customer whose air conditioning failed. That resulted in the rapid death of 3 disks in a RAID-5 set, which resulted in a complete failure.

No number of spare disks helps if too many disks die before the redundancy is restored. No amount of consistency checking would have helped, either.

Consistency checks are rather useful to test the reliability of the RAID implementation, or to detect whether some other event has happened, like data loss in a write-back cache.
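
To make it concrete what such a check verifies: in a RAID-5 stripe the parity block is the XOR of the data blocks, so a consistency check re-reads every stripe and compares. Below is a minimal Python sketch of that idea only. The function names and the in-memory "stripes" are made up for illustration; a real controller does this in firmware against the disks, and nothing here reflects any particular vendor's implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def find_inconsistent_stripes(stripes):
    """Return the indices of stripes whose stored parity does not match
    the XOR of their data blocks.

    `stripes` is a hypothetical in-memory layout: a list of
    (data_blocks, parity_block) tuples.
    """
    bad = []
    for index, (data_blocks, parity) in enumerate(stripes):
        if xor_blocks(data_blocks) != parity:
            bad.append(index)
    return bad

# Toy example: stripe 0 is consistent, stripe 1 has stale parity
# (for instance after a lost write-back cache flush).
good = ([b"\x01\x02", b"\x04\x08", b"\x10\x20"], b"\x15\x2a")
stale = ([b"\x01\x02", b"\x04\x08", b"\x10\x20"], b"\x00\x00")

print(find_inconsistent_stripes([good, stale]))  # [1]
```

Note that the check only says a stripe is inconsistent. It cannot tell which block is wrong, and it does nothing for a set that has already lost more disks than its redundancy covers.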

If a RAID rebuild somehow fails, then the controller must eject the bad spare disk.
.
Mike Naime
Honored Contributor

Re: RAID Disk problems

Jon:

Can you be a bit more specific? I bet that you are talking about a non HP/Compaq storage controller.

If your RAID array controller does the consistency check for you, why should I have to run an external tool? Maybe because their product is @#$&*@& and doesn't work with VMS like the stuff developed by the old DEC/Compaq/HP engineers.

In the past, I had the following experience:
We had an HSG80 raidset fail out a drive. The spareset drive automatically spared in and started re-building. At about 30% on the rebuild, the new drive also failed. We had to physically pull the new drive. Another disk failed into the failed position and this drive finished the rebuild process. NO DATA WAS LOST!!!

With 100TB of HSG80 SAN storage, we lose on average one disk per week from a raid/mirrorset. I have yet to restore from tape because we lost a raidset. This includes the times that we lost an entire channel in the Blue Bricks, and had to replace a shelf in the EMA's.

Mike
VMS SAN mechanic
Keith Parris
Trusted Contributor

Re: RAID Disk problems

One way to protect against such a RAID array failure is to use host-based Volume Shadowing to shadow to another disk [array].

I just came across an interesting white paper from The Uptime Institute. One of the interesting conclusions of the paper was that because of various inherent problems, even the best datacenters by themselves can provide at best 99.99% (4 nines) uptime in actual (measured) practice. Reaching higher availability (e.g. 5 nines) implies you must have more than one datacenter, in a redundant configuration. See "Industry Standard Tier Classifications Define Site Infrastructure Performance" at http://upsite.com/TUIpages/whitepapers/tuitiers.html
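
For a feel of what those nines mean in downtime, here is a small back-of-the-envelope Python sketch. The 99.99% figure is the one quoted above. The two-site calculation assumes the sites fail completely independently and that failover is instant and perfect, which is my simplifying assumption and not something the white paper claims, so treat that number as an upper bound rather than a prediction.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def downtime_minutes_per_year(availability):
    """Expected downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR

def combined_availability(site_availability, sites):
    """Availability of `sites` redundant sites, ASSUMING independent
    failures and instant failover (an idealisation)."""
    return 1.0 - (1.0 - site_availability) ** sites

for label, availability in [("4 nines", 0.9999), ("5 nines", 0.99999)]:
    minutes = downtime_minutes_per_year(availability)
    print(f"{label}: about {minutes:.1f} minutes of downtime per year")

two_sites = combined_availability(0.9999, 2)
print(f"two independent 4-nines sites: {two_sites:.8f}")
```

So 4 nines allows roughly 52.6 minutes of downtime a year and 5 nines roughly 5.3 minutes. Under the idealised independence assumption, two 4-nines sites together clear the 5-nines bar, which is consistent with the paper's point that you need more than one datacenter.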
Uwe Zessin
Honored Contributor

Re: RAID Disk problems

It might be interesting to note that HBVS has sort-of consistency checking in the latest version:
$ set shadow /demand_merge

I was able to abuse that feature in a customer demonstration to generate some load on the SAN.
.
Keith Parris
Trusted Contributor

Re: RAID Disk problems

Host-Based Volume Shadowing now has the $ANALYZE/DISK/SHADOW command to check the consistency of data on the shadowset members. (For earlier versions, there was the set of 2 programs CHECK_2MBR_SHADOW_SET_ALPHA.EXE and CHECK_3MBR_SHADOW_SET_ALPHA.EXE available from the support center.)

And Host-Based RAID Software has the $RAID ANALYZE command.