
Re: LH4r RAID failure

 
Peter Zinckgraf_1
Occasional Contributor

LH4r RAID failure

Hi there,

I've got a very nasty problem here: on one of our LH4r servers we have a RAID 5 array consisting of five 36 GB drives connected to the onboard controller. The drive in slot 5 (ID08) failed, leaving the array degraded. I inserted a brand-new additional disk into slot 6 (ID09) and defined it as a hot spare. The rebuild began, and 20 minutes later the hot spare failed as well. I then replaced the originally failed drive in slot 5 with another new disk. The rebuild began, and at 13% this drive switched to failed. I inserted my last new disk into slot 5, and at 13% this one switched to failed, too.

Any hints on what to do next are HIGHLY appreciated. Our data is hanging by a thread.

Thanks,
Peter
2 REPLIES
Scott Nadeau
New Member

Re: LH4r RAID failure

We had a Dell server give us false failures a few months ago. Do you happen to have an external device (tape drive) running off the same controller? If so, try removing it and then rebuilding with one of the drives that failed.
Alicia White
Esteemed Contributor

Re: LH4r RAID failure

Rebuilds fail for a variety of reasons: out-of-date firmware, problems with one of the active drives (such as media or "other" errors), or even problems with the parity data on the array.

I probably don't need to tell you this, but make sure you have a VERY GOOD BACKUP of your data before doing any of this troubleshooting.

The fact that the rebuild seems to fail at the same percentage repeatedly would lead me to think that there is a problem with one of the drives that is still part of the array. If the controller has problems reading the data from that drive, any rebuild will be doomed to failure.
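To illustrate why a bad block on a surviving drive would stop every rebuild at the same point, here is a minimal Python sketch of a RAID 5 style rebuild. It is a toy model, not how the NetRAID firmware works: the names `make_array`, `rebuild` and `bad_blocks` are made up for this example, parity is kept on one drive instead of being rotated, and the controller is assumed to abort at the first unreadable block.

```python
import random
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_array(n_drives=5, stripes=100, block_size=16, seed=0):
    """Build a toy array: n_drives - 1 data drives plus one parity drive."""
    rng = random.Random(seed)
    data = [
        [bytes(rng.randrange(256) for _ in range(block_size)) for _ in range(stripes)]
        for _ in range(n_drives - 1)
    ]
    parity = [xor_blocks([drive[s] for drive in data]) for s in range(stripes)]
    return data + [parity]


def rebuild(surviving, bad_blocks, stripes):
    """Rebuild the missing drive stripe by stripe.

    surviving:  dict of drive id -> list of blocks, one per stripe
    bad_blocks: set of (drive id, stripe) pairs that cannot be read
    Returns (rebuilt blocks, percent completed when the rebuild stopped).
    """
    rebuilt = []
    for s in range(stripes):
        # Every surviving drive must be readable at this stripe.
        if any((d, s) in bad_blocks for d in surviving):
            return rebuilt, 100 * s // stripes
        rebuilt.append(xor_blocks([surviving[d][s] for d in surviving]))
    return rebuilt, 100


if __name__ == "__main__":
    drives = make_array()
    surviving = {i: drives[i] for i in range(4)}  # drive 4 has failed
    # Hypothetical bad block on surviving drive 1 at stripe 13 of 100.
    for attempt in range(3):  # three different replacement disks
        _, percent = rebuild(surviving, {(1, 13)}, 100)
        print(f"attempt {attempt + 1}: rebuild stopped at {percent}%")
```

In this model the replacement disk never matters: the rebuild stops at the stripe where a surviving drive cannot be read, so swapping in new disks gives the same result each time.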

To see if this is the problem, you need to check the properties of all the drives in the array to see if there are any errors. You can do this in the NOS by using NetRAID Assistant (or Novell MegaManager). If you don't have this utility installed, you can check the properties of the drives in NetRAID Express Tools by pressing Ctrl+M during POST.

Are there any errors on the drives? If there are errors, then it might be possible to get around the bad blocks by connecting the drives to a regular SCSI controller and then running a verify media operation on the drives to remap the bad blocks. However, this COULD make the situation worse by corrupting data, since good and/or corrupt data is copied from bad blocks to good blocks.

If there are no HW errors, then you might have a problem with the hard drive firmware. It is possible that a communication error caused by out-of-date firmware is causing the rebuild to fail. If this is the case, then you can update the firmware and try the rebuild again.

You can check HDD firmware under the drive properties. You can download the CD image for the FW update utility with all possible NetServer HDD firmware files at:
http://h20004.www2.hp.com/soar_rnotes/bsdmatrix/matrix65146en_US.html

Run this utility, which will tell you whether any of the drives need to have their firmware updated. The beauty of downloading the huge CD image file is that there is no guesswork about which of the dozens of FW files you need to download: the utility will find the files on the CD and update accordingly.


If the parity information is bad, there is basically nothing you can do: the parity info the controller uses to rebuild the failed drive is bad, so a rebuild will NEVER work. In this case, you have no option other than to back up the data, re-format the drives in the array, reinstall the operating system, and then restore from backup.
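For anyone wondering why bad parity cannot be worked around, here is a small Python sketch of the XOR arithmetic behind RAID 5. It is only an illustration: the block contents and the `reconstruct` helper are invented for the example, and what a real controller does when it meets inconsistent parity (abort, or write wrong data) is not something this sketch can tell you.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(missing_index, data, parity):
    """Recompute one missing data block from the other blocks plus parity."""
    survivors = [blk for i, blk in enumerate(data) if i != missing_index]
    return xor_blocks(survivors + [parity])


# One stripe across four data drives (made-up contents) plus its parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# With correct parity, the block from the failed drive comes back intact.
good = reconstruct(2, data, parity)

# Flip a single bit in the parity block to simulate bad parity.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
bad = reconstruct(2, data, bad_parity)

print(good)  # b'CCCC'
print(bad)   # b'BCCC'
```

The parity block is the only record of what was on the failed drive, so any error in it carries straight through into the reconstructed data, and nothing on the surviving drives can correct it.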

I hope this helps.

Alicia