
Problem w/ Raid4SI & DS2100

Greg Philmon_1
Occasional Contributor


Setup:
HP9000 L2000
A5856A RAID 4SI controller
Six DS2100 disk cabinets
Twenty 36GB drives

This equipment is in a managed colocation center with highly redundant environmental controls (power, cooling, etc.).

The storage is carved as two logical drives. One is 0+1, the other 0+5. Three hotspares. All filesystems, including boot, are on this external storage.
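To put rough numbers on that layout: a quick sketch of the usable capacity, assuming a hypothetical split of the 17 data drives (8 for the 0+1 array, 9 for the 0+5 array as three 3-drive RAID 5 sets); the post doesn't say how the drives actually divide, so treat the figures as illustrative only.

```python
DRIVE_GB = 36
TOTAL_DRIVES = 20
HOTSPARES = 3

# Hypothetical split -- the post doesn't state how the data drives divide.
raid01_drives = 8   # mirrored stripe: half the drives hold mirror copies
raid05_drives = 9   # e.g. three 3-drive RAID 5 sets striped together

raid01_usable = raid01_drives // 2 * DRIVE_GB   # 144 GB usable

# Each RAID 5 set loses one drive's worth of space to parity.
raid5_sets, drives_per_set = 3, 3
raid05_usable = raid5_sets * (drives_per_set - 1) * DRIVE_GB   # 216 GB usable

# Sanity check: the split plus hotspares accounts for all 20 drives.
assert raid01_drives + raid05_drives + HOTSPARES == TOTAL_DRIVES
print(raid01_usable, raid05_usable)  # 144 216
```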

Yesterday morning both logical drives went offline.

I booted from the install disk and ran the RAID configuration tool. The system was reporting SEVEN failed drives! Three of these were the hotspares, which had all been assigned into a RAID array. The three drives they had taken over for were all marked Ready, but they must have earlier been marked Failed as well, since the hotspares kicked in.

So call it 10 failed drives. Or whatever... suffice it to say that a bunch of drives were marked "Failed" within a very short timeframe, a few minutes at most.

These failed drives were spread over at least three of the four SCSI channels on the RAID 4SI controller.
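For what it's worth, that many drives failing independently at the same moment is essentially impossible, which points at a common cause (bus, controller, firmware, power event) rather than the drives themselves. A back-of-the-envelope estimate, using an assumed 3% annual per-drive failure rate (a made-up but plausible figure, not from the post):

```python
import math

# Assumed 3% annual failure rate per drive (illustrative, not measured).
afr = 0.03
window_minutes = 5
# Per-drive probability of failing within one 5-minute window.
p = afr * window_minutes / (365 * 24 * 60)

n, k = 20, 7
# Probability that at least 7 of 20 independent drives fail in the same window.
p_at_least_7 = sum(
    math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1)
)
print(p_at_least_7)  # vanishingly small, so independent failures are implausible
```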

With few options, I just started forcing the "failed" drives to an "online" state, ignoring the warnings about data integrity.

Rebooted, and the system seems fine. irconcheck is running now and, at 80% complete, hasn't reported any problems yet.

Questions:
1. What happened?
2. What can I do to ensure it never happens again?

TIA.