
HP NetServer LT6000r NetRaid Raid5 array rebuild error

 
Steven R. Johnsen
New Member

I have a NetServer LT6000r with an internal NetRaid controller which gives an error on rebuild of a failed member drive in a Raid5 array.

The Raid5 array consists of four 18.2 GB physical drives, which gives a logical drive size of 52 GB. All drives were used as data space and there is no hot spare.

The physical drive # 0 channel 0 failed. I attempted a rebuild after swapping out the failed drive with a spare. I formatted the spare, then started the rebuild. The rebuild fails consistently at 56% completion. I have tried three different drives with the same results. One thing I noticed when checking properties of all the member drives is that drive #3 has (2) media errors.

Is this the reason the array will not re-build?

My question is: if I pull drive #3 out (the one with the 2 media errors) and replace it, will I lose any data? The Raid5 configuration is in degraded mode and it continues to run on the three remaining drives. Will the logical drive run on only two physical drives?

Is the correct recovery procedure to first replace the #3 drive with the media errors, do a rebuild then replace the failed #0 drive and do a second rebuild?

Will the above work, or am I already at the point of not being able to rebuild this array?

One additional fact: the above array is partitioned as the "system" volume and the "ora-home" volume, containing the OS and Oracle home respectively. My only other option, if nothing else works, is to image the volumes and re-create the array, so I would prefer an easier solution.

Thanks

Steve
kris rombauts
Honored Contributor

Steve,

You cannot pull out another disk: the RAID array is already in degraded mode, and if you do so you will lose all data. A RAID5 array built with 4 disks needs a minimum of three disks to survive.

The best option is to back up the data and reinstall the system, or image it.


The reason the rebuild fails is, as you figured out, a problem in another stripe of the RAID5 data, such that the controller can no longer reconstruct the full 4-disk-wide stripe. There is a chance that your data itself is not affected and that it is "only" a bad spot in an area that currently holds no user data. The controller cannot know that, since the RAID5 layer and the user data are abstracted from each other, so to speak. This is an industry-wide 'problem' and not specific to the Netraid controller.
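To make the failure mode concrete, here is a minimal Python sketch of a stripe-by-stripe rebuild. It is only an illustration of the general RAID5 idea, not the Netraid firmware's actual logic: the function names, the 100-stripe array, the fixed parity position and the choice of stripe 56 for the bad block are all assumptions made for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild_member(stripes, target):
    """Rebuild member `target` one stripe at a time.

    A block of None stands for an unreadable block (media error).
    Returns (rebuilt_blocks, index_of_failing_stripe_or_None).
    """
    rebuilt = []
    for n, stripe in enumerate(stripes):
        survivors = [blk for i, blk in enumerate(stripe) if i != target]
        if any(blk is None for blk in survivors):
            # Two blocks of this stripe are now unknown: XOR cannot solve it.
            return rebuilt, n
        rebuilt.append(xor_blocks(survivors))
    return rebuilt, None

# Hypothetical 4-member array with 100 stripes: three data blocks plus parity.
# Parity is kept on member 3 for simplicity; real RAID5 rotates it.
stripes = []
for n in range(100):
    data = [bytes([n, i]) for i in range(3)]
    stripes.append(data + [xor_blocks(data)])

stripes[56][3] = None  # assumed media error on a surviving member

rebuilt, failed_at = rebuild_member(stripes, target=0)
print(f"rebuilt {len(rebuilt)} stripes, stopped at stripe {failed_at}")
```

The rebuild walks the whole disk, not the file system, so it stops at the first stripe where a surviving member cannot be read, whether or not any file lives there.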

If you run a full backup, which reads all the data on the disks, and it is successful, you can be almost certain the bad spot holds no user data (if it does, the backup should fail on a certain folder or file). If the backup fails on a certain file, try excluding that file from the backup and see if that gets you through the full backup. If it does, then great: you were lucky to have spotted the issue at the file system level.
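If you want to locate the affected file before running the real backup, the same idea can be sketched as a small Python script that reads every file in full and records the ones that fail. This is a rough sketch, not an HP or Netraid tool; `find_unreadable_files` and its parameters are names made up for this example.

```python
import os

def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Read every file under `root` in full; return (path, error) for failures."""
    failed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    # Read in chunks so large files do not exhaust memory.
                    while fh.read(chunk_size):
                        pass
            except OSError as exc:
                failed.append((path, str(exc)))
    return failed
```

Call it with the root of each volume and review the returned list. Note that files held open or locked by the OS or by Oracle may also show up as failures for reasons that have nothing to do with media errors, and that free space is never read this way.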

As a good practice, and to minimize the issue you ran into (the rebuild fails after a disk failure due to another issue on the stripe), make sure the Netraid consistency check is scheduled to run and check the RAID parity stripes on a weekly (default schedule) basis. This corrects eventual RAID stripe inconsistencies in areas that are almost never accessed i.e.
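For illustration, the general idea behind a parity consistency check can be sketched in a few lines of Python. How the Netraid utility implements it internally is not something I can state, so treat the function name and the sample blocks below as assumptions for the example only.

```python
def find_inconsistent_stripes(stripes):
    """Return the indexes of stripes whose blocks do not XOR to all zeros."""
    bad = []
    for n, stripe in enumerate(stripes):
        acc = bytes(len(stripe[0]))  # all-zero accumulator
        for blk in stripe:
            acc = bytes(a ^ b for a, b in zip(acc, blk))
        if any(acc):
            bad.append(n)
    return bad

# Hypothetical stripes: two data blocks followed by their parity block.
good = [b"\x01\x02", b"\x04\x08", b"\x05\x0a"]   # parity matches the data
stale = [b"\x01\x02", b"\x04\x08", b"\x00\x00"]  # parity was never updated

print(find_inconsistent_stripes([good, stale, good]))
```

Because such a check has to read every block on every member, it also exercises the rarely accessed areas, which is where an unnoticed bad spot can otherwise sit until the next rebuild.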

The Netraid monitor/consistency check is downloadable from here:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=296140&prodTypeId=329290&prodSeriesId=51930&swLang=13&taskId=135&swEnvOID=1005


HTH

Kris