Simultaneous drive failure

SOLVED
Joshua Small_2
Valued Contributor

Hi,

Curly one for you. I have an ML110 G3 with two drives in a mirror.

The drives are 75GB SATA drives.

A drive in the array failed on Wednesday. I figured no worries, we'll get a new drive there Monday.

Well, HP's "next business day Care Pack" allowed us to get a new drive on the following Monday. The HP tech installed the new drive and left.

Only the array never rebuilt. It didn't rebuild because, immediately after the HP tech left, the "good" disk encountered its first read error, and a RAID mirror won't rebuild from a source disk that returns even a single read error.

I confirmed the issue was not the card/cabling/motherboard by inserting each failed disk into a workstation, directly attached to a SATA port, and testing with the manufacturer's tools. Both disks were definitely dead.

So, long story short, I have to get TWO new drives (another few days for a paid Care Pack), rebuild the array and restore the lot from backup.

At this point, I'm naturally questioning:

- Has anyone ever had such bad luck (assuming it was luck)?
- Could anything occur, such as a fault on the board perhaps, which caused both attached drives to actually fail? We've been running on the two new drives for almost a month now with no issue.
- The HP Storage Management "periodic rescan" was, according to the logs, running regularly and never finding an issue. Is there anything further you can do to prevent these issues?

Please don't bring up firmware or drivers. HP offers very little in the way of updates for these low-end servers, and this one was definitely up to date. The latest HP Storage Manager was uninstalled then reinstalled twice before HP would consider sending the parts.

The real issue is, I'm in a position where I'm going to be running larger arrays and SAN boxes, and I'm curious what plans people have, if any, to combat such scenarios.

I have noted servers such as the HP DL380 offer "RAID6" with two parity drives, but I actually can't see that option on the larger HP SAN and NAS devices.
7 REPLIES
Rob Leadbeater
Honored Contributor
Solution

Re: Simultaneous drive failure

Hi Joshua,

Having two drives fail simultaneously or within a short space of time is quite common, especially if both of those drives were bought at the same time.

If you can, check the serial numbers of the failed drives. I would guess they are close together, and therefore probably part of the same manufacturing batch, which would increase the chances of them failing at around the same time.

You can try and get around it by making sure the drives in your mirrorsets or raidsets are from different batches and/or different manufacturers.
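
As a rough way to do the serial number check, here is a minimal Python sketch. It assumes that serials which differ only in their last few characters came off the line close together, which is a guess and not something any vendor documents; serial formats vary by manufacturer. The function name `flag_close_serials`, the `tail` parameter and the sample serials are all made up for illustration.

```python
from itertools import combinations
from os.path import commonprefix


def flag_close_serials(serials, tail=3):
    """Return pairs of drive names whose serials differ only in the last `tail` characters.

    `serials` maps a drive name (e.g. a bay label) to its serial number string.
    This is a heuristic only: a shared prefix suggests, but does not prove, a shared batch.
    """
    flagged = []
    for (name_a, sn_a), (name_b, sn_b) in combinations(sorted(serials.items()), 2):
        if len(sn_a) != len(sn_b):
            continue  # different formats, probably different models or vendors
        shared = len(commonprefix([sn_a, sn_b]))
        if shared >= len(sn_a) - tail:
            flagged.append((name_a, name_b))
    return flagged


# Made-up serial numbers, for illustration only.
drives = {"bay1": "XK4A01237", "bay2": "XK4A01251", "bay3": "ZQ9T55310"}
print(flag_close_serials(drives))
```

Any pair it flags would be a candidate for splitting across different mirrorsets or raidsets.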

Cheers,

Rob
Joshua Small_2
Valued Contributor

Re: Simultaneous drive failure

Hi Rob,

Makes a lot of sense. I'm not sure how I can source drives from different manufacturers if you're ultimately only quoting an HP part number and asking an HP supplier for that drive.

But this experience does concern me somewhat as I look to running much larger arrays.
Rob Leadbeater
Honored Contributor

Re: Simultaneous drive failure

Indeed... I've been unfortunate enough to have two drives fail within a few hours of each other on an EVA5000. Fortunately I didn't lose any critical data, however that was more down to luck than anything else...

Occasionally manufacturers do have glitches in their processes. I remember a few years ago there was a large batch of Fujitsu 146GB 15K rpm disks that were all faulty. We managed to get quite a few of these installed in our HSG80. Luckily because we were using three member mirrorsets, we never lost any data.

Cheers,

Rob
Alain Charland
Occasional Visitor

Re: Simultaneous drive failure

Please note that the EVA4400 has initial Cross Vraid (4+2) support, with better versions of it to come.
Patrick Terlisten
Honored Contributor

Re: Simultaneous drive failure

Hello,

AFAIK, Cross Vraid is an option that allows you to use a different Vraid level for a clone or snapshot than for the original Vdisk; for example, the original is Vraid 1 and the snapshot is Vraid 5. Furthermore, the EVA uses 5+1 for Vraid 5. Correct me if I'm wrong. :)

But back on topic:

Fortunately I haven't faced a non-recoverable error while rebuilding a RAID. Maybe Joshua just had bad luck. :-|

Best regards,
Patrick
Joshua Small_2
Valued Contributor

Re: Simultaneous drive failure

Yeah luck doesn't seem to go my way sometimes.

I'll be looking into RAID ADG on MSA boxes, though it's odd that the bigger EVAs don't offer it.
Patrick Terlisten
Honored Contributor

Re: Simultaneous drive failure

Hello Joshua,

The MTBF of a SATA / FATA drive is much lower than the MTBF of a SCSI / FC drive. I don't think that the missing RAID ADG / 5DP support is a big disadvantage of the EVA. The EVA uses RSS (Redundant Storage Sets) for protection. Each RSS is a single RAID protection domain. Vraid 5 data is stored in a 5+1 layout. Try searching the forum for "RSS" and you will find more information about it. :)
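
To put rough numbers on the exposure while a set is degraded, here is a small Python sketch. It is only a back-of-the-envelope model: it assumes drives fail independently at a constant rate, which the same-batch failures discussed earlier in this thread suggest is optimistic. The annual failure rate is an assumed figure, not a measured one, and the five-day window is taken from the Wednesday-to-Monday wait in the original post. The function names are illustrative.

```python
HOURS_PER_YEAR = 8760.0


def p_fail_within(afr, hours):
    """Chance that one drive fails within `hours`, given annual failure rate `afr`."""
    return 1.0 - (1.0 - afr) ** (hours / HOURS_PER_YEAR)


def p_any_fail(afr, hours, remaining):
    """Chance that at least one of `remaining` drives fails within the window."""
    p = p_fail_within(afr, hours)
    return 1.0 - (1.0 - p) ** remaining


AFR = 0.03         # assumed annual failure rate per drive, for illustration only
WINDOW_H = 5 * 24  # five days between the first failure and the replacement

# After one failure, a two-drive mirror has 1 drive left that must survive;
# a 5+1 set has 5 drives left, any one of which failing loses the set.
mirror = p_any_fail(AFR, WINDOW_H, remaining=1)
vraid5 = p_any_fail(AFR, WINDOW_H, remaining=5)
print(f"mirror: {mirror:.4%}  5+1: {vraid5:.4%}")
```

Under these assumptions the 5+1 set is roughly five times as exposed as the mirror over the same window, which is why keeping that window short matters more as sets get wider.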

Best regards,
Patrick