
XP Array Users: Have you ever had catastrophic 7+1P Parity Group Failures?

 
Alzhy
Honored Contributor

In any parity (RAID5) configuration, losing one member disk leaves the RAID group up, but with degraded performance. And as the number of disks in a parity group increases, so does the likelihood of more than one disk failing at the same time, which would render the RAID group dead.

We're trying to decide between a 3+1P RAID5 config (75% efficiency) and a 7+1P RAID5 config (87.5% efficiency, but with a higher risk of more than one disk failing at once). So my question is: since the advent of 7+1P implementations on the XP, and despite the availability of global spares, has anyone experienced more than one disk failing in a single 7+1P RAID5 array group on an XP model?
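To put rough numbers on the trade-off described above, here is a minimal Python sketch. It computes the usable capacity fraction of each layout and the chance that a second disk in the same group fails while the first failure is still being handled. The 2% annual failure rate and the 24-hour exposure window are purely illustrative assumptions, not XP or Hitachi figures, and the model assumes independent failures at a constant rate (which a hot room, for example, would violate).

```python
import math

HOURS_PER_YEAR = 8760.0


def usable_fraction(data_disks: int, parity_disks: int = 1) -> float:
    """Fraction of raw capacity left for data in an nD+1P group."""
    return data_disks / (data_disks + parity_disks)


def p_second_failure(group_size: int, afr: float, window_hours: float) -> float:
    """Probability that at least one surviving disk fails within the window.

    Assumes independent disks with a constant failure rate derived from
    the annual failure rate `afr` (an illustrative input, not vendor data).
    """
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    survivors = group_size - 1  # one disk has already failed
    return 1.0 - math.exp(-rate_per_hour * survivors * window_hours)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to compare the two layouts.
    afr, window = 0.02, 24.0
    for data in (3, 7):
        size = data + 1
        print(f"{data}+1P: efficiency {usable_fraction(data):.1%}, "
              f"second-failure risk {p_second_failure(size, afr, window):.4%}")
```

Under these assumptions the second-failure risk for 7+1P is roughly 7/3 times that of 3+1P, simply because seven survivors are exposed instead of three. Both figures stay small as long as the exposure window is short.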

Thanks!
Hakuna Matata.
3 REPLIES
Sanjay Kumar Suri
Honored Contributor

Re: XP Array Users: Have you ever had catastrophic 7+1P Parity Group Failures?

We are only using the 3+1P configuration. I think 7+1P has these restrictions:

- Only supported on XP 1024 and XP 128 with 2 ACP pairs.

- They must be in the same DKU, same slot number but different ACP pairs.

sks
A rigid mind is very sure, but often wrong. A flexible mind is generally unsure, but often right.
Vincent Fleming
Honored Contributor

Re: XP Array Users: Have you ever had catastrophic 7+1P Parity Group Failures?

Nelson,

I have a number of arrays at customers that are running 7D+1P, and they haven't seen any double failures.

Double failures are pretty rare, though they do happen from time to time. I have actually seen it happen once, in a 3D+1P array group, but the customer had been running the XP over temperature for some time (the room was *very* under-cooled). The failure rate of drives increases exponentially at high temperatures.

Also, your exposure is not as bad as you think. The XP proactively spares drives that are failing. The XP checks the error rates on the drives regularly, and if any drive exceeds the set thresholds, it will spare it out, and then automatically fail the drive.

The spare is _mirrored_ to the failing drive in this operation - it is not rebuilt from parity. This significantly reduces the time it takes to spare out the drive, and prevents the array group from entering Degraded Mode.

This is one reason why double failures are rare - the drives would have to fail completely, and unexpectedly - a rare event in a single array group.

You can actually have more than one drive proactively spare out in the same array group at the same time, provided that you have enough spares.
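The proactive sparing described above can be sketched roughly as follows. This is a toy Python model, not the XP firmware: the `Drive` and `ArrayGroup` classes, the `error_threshold` value and the `proactive_spare` function are all hypothetical names invented for illustration. It only shows the idea that a drive exceeding its error threshold is copied to a spare while it is still readable, so the group never has to rebuild from parity.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    name: str
    error_count: int = 0
    state: str = "online"  # "online" or "failed"


@dataclass
class ArrayGroup:
    drives: list
    spares: list = field(default_factory=list)
    error_threshold: int = 10  # hypothetical threshold, not an XP setting


def proactive_spare(group: ArrayGroup) -> list:
    """Swap out drives whose error count exceeds the threshold.

    Returns a list of (failing_drive_name, spare_name) pairs.
    Stops early if the spare pool runs dry.
    """
    swapped = []
    for slot, drive in enumerate(group.drives):
        if drive.error_count <= group.error_threshold:
            continue
        if not group.spares:
            break  # no spare left; drive stays in place for now
        spare = group.spares.pop(0)
        # Data is copied straight from the still-readable failing drive
        # (a mirror copy), so no parity rebuild is needed.
        group.drives[slot] = spare
        drive.state = "failed"  # only now is the old drive failed out
        swapped.append((drive.name, spare.name))
    return swapped
```

With two marginal drives in one group and two spares available, both get swapped in the same pass, which matches the point above about several drives sparing out at once when there are enough spares.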

I think you would be safe with 7D+1P.

Good luck,

Vince


No matter where you go, there you are.
Alzhy
Honored Contributor

Re: XP Array Users: Have you ever had catastrophic 7+1P Parity Group Failures?

I think I totally agree with you, Vincent. Our 9960 has never missed a heartbeat since 1999. A Hitachi engineer would just show up and change a failed drive; we never noticed anything, and we have never had any dual drive failures in a single parity/array group.

I actually got hold of a technical paper describing HDS's disk protection mechanisms, which are pretty much similar to the SMART mechanism found on other IDE, ATA and SCSI disks. Thresholds for practically every degradable part of the disk are constantly tested, and when the measured values fall to a certain minimum, the array firmware spares the drive out.

Thanks. I think I can accept a 7+1P configuration now.
Hakuna Matata.