Disk Enclosures

Problem w/ Raid4SI & DS2100

Greg Philmon_1
Occasional Contributor

HP9000 L2000
A5856A RAID 4SI controller
Six DS2100 disk cabinets
Twenty 36GB drives

This equipment is in a managed colocation center with highly redundant environmental controls (power, cooling, etc.).

The storage is carved into two logical drives: one is RAID 0+1, the other RAID 0+5. There are three hotspares. All filesystems, including boot, are on this external storage.

Yesterday morning both logical drives went offline.

I booted from the install disk and ran the RAID configuration tool. The system was reporting SEVEN failed drives! Three of these were the hotspares, all of which were now assigned to a RAID array. The three drives they had taken over for were all marked Ready, but they must also have been marked Failed earlier, since the hotspares kicked in.

So call it 10 failed drives. Or whatever... suffice it to say that a bunch were marked "Failed" in a very short timeframe, a few minutes max.

These failed drives were spread over at least three of the four SCSI channels on the RAID 4SI controller.

With few options, I just started forcing the "failed" drives to an "online" state, ignoring the warnings about data integrity.

I rebooted and the system seems fine. irconcheck is running now and, at 80% complete, hasn't yet reported any problems.

1. What happened?
2. What can I do to ensure it never happens again?