Proliant ML350 G4 RAID5 failed disk

Edward Luna_1
Occasional Advisor

Hello all

We have a ProLiant ML350 G4 server running Windows SBS 2003. We are using RAID 5 on a Smart Array 641 controller (firmware version 2.80) with six 72 GB drives configured as a single logical drive.

Last week disk 5 failed (solid amber) and we received the message to replace the drive. We ordered 2 exact replacement drives (one to keep as a spare).

With the server running we removed the failed drive and replaced (hot swapped) it with one of the new drives. After a few blinks of the activity and online indicators, the drive fault indicator went solid amber (failed). We removed the drive and installed the second new drive but got the same result.

We powered down the server with the second new drive in place and then re-applied power but a few minutes into the boot the fault indicator on the drive went amber. We powered down again and put the first new drive back in and applied power but received the same results indicating a failed drive. That’s 2 new drives and both show drive failure as indicated by the solid amber fault light. These are exact replacement HP drives of the same size as the other 5 drives in the array.

We powered down the server again and, simply because we had nothing left to try, put the original drive back in and applied power. During boot we received a message that valid data was found and that drive 5 needed to be rebuilt, with the options F1 to rebuild or F2 to continue without rebuilding. We selected F1 and all the right indicators came on showing that the drive was being rebuilt: the activity light was blinking, the online light was on solid, and the fault light was off.

After approximately 2 hours the rebuild stops (well... all lights on the drive go out) and nothing else happens. All the while during the “automated rebuild” the Windows Server 2003 splash page was displayed but the server never finishes the boot sequence and all disk activity eventually stops with the splash page remaining. We have repeated the process several times and it always does the same thing except now it says that the automated rebuild is continuing rather than giving us the F1/F2 option.

Sorry for being so long-winded, but this is our Exchange Server and I’m taking some heat… well, okay… I’m taking a lot of heat. Groan.
8 REPLIES
Edward Luna_1
Occasional Advisor

Re: Proliant ML350 G4 RAID5 failed disk

I think it may be that slot 6 itself is bad. I took all 6 drives out of the array and put the 2 new drives in slot 1 and 2 and then configured a new array RAID 1+0. The drives worked fine. These are the same 2 drives that both show solid amber fail when I use them to replace the failed drive originally in slot 6.

So while I had the 2 drives in I installed a new OS and then downloaded the latest firmware for the array. I upgraded from v 2.80 to v 2.84. I then put all the old drives back in and rebooted. As expected I got the message that drive 6 needed to be rebuilt. The rebuild appears to work for a few hours but then fails with no messages.

So is there a way for me to reconfigure the array for only 5 drives instead of 6 without losing my data?

Thanks
cnb
Honored Contributor

Re: Proliant ML350 G4 RAID5 failed disk

Hi Edward,

Welcome.

There are many threads on this issue. The common theme is that the system, controller and disks must have the latest firmware, drivers AND ACU versions to avoid this issue.

Check the disk drive firmware levels and make sure they are at the latest levels.

Sometimes you get lucky and can shake it loose with the upgrades; other times you'll need to restore from a backup. But this is RAID 5, and with one suspect disk/slot the volume should be degraded, not failed. Since it won't boot, it sounds like other issues are present. If you have lost 2 disks in the volume, then a restore is your only option.
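
To illustrate why a RAID 5 volume survives one missing disk but not two, here is a minimal Python sketch of XOR parity on a single stripe. It is a simplified model only, not how the Smart Array firmware actually lays out or rebuilds data; the block values and the helper names `xor_blocks` and `rebuild` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a six-drive RAID 5 set: five data blocks plus one parity block.
# The byte values are arbitrary illustration data.
data = [bytes([value] * 4) for value in (0x11, 0x22, 0x33, 0x44, 0x55)]
parity = xor_blocks(data)
stripe = data + [parity]


def rebuild(stripe, missing):
    """Recompute the single missing block from the surviving blocks.

    `missing` is a set of block indexes that cannot be read.
    """
    if len(missing) != 1:
        # With two or more blocks gone, XOR parity has too little information.
        raise ValueError("RAID 5 can only reconstruct one missing block per stripe")
    survivors = [block for i, block in enumerate(stripe) if i not in missing]
    return xor_blocks(survivors)


# One drive lost: the block is recovered from the other five.
print(rebuild(stripe, {4}) == stripe[4])

# Two drives lost: nothing to rebuild from, so a restore is the only way back.
try:
    rebuild(stripe, {4, 5})
except ValueError as err:
    print(err)
```

In this model a rebuild has to read every surviving block successfully, so read errors on a second disk during the rebuild are one possible way for a rebuild to stall.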

Have you used ADU (the Array Diagnostics Utility) to see what is going on with the subsystem? Use the latest SmartStart CD:
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=397642&prodNameId=3279707&swEnvOID=1005&swLang=8&mode=2&taskId=135&swItem=MTX-ff528c1bc8434a01b314485325



Hope this helps,

Edward Luna_1
Occasional Advisor

Re: Proliant ML350 G4 RAID5 failed disk

Thanks... I'll try your suggestions and post the results tomorrow.
Edward Luna_1
Occasional Advisor

Re: Proliant ML350 G4 RAID5 failed disk

I'm making progress... hehehe

I ran the array diagnostics as you suggested and it showed that the drive in question has been rendered too small because of numerous errors on the drive.

Also... although I ordered an exact replacement HP drive, the existing drives are apparently brand X and not an exact match for the HP drives as I had originally thought. The specs are all identical but the array must not like something about the match. My only option at this point is to order the brand X drive and see what happens.

Even though the system is running in degraded mode, I managed to get the OS back by doing an in-place repair. I'm considering doing a full system backup (Small Business Server 2003) and then reconfiguring the array for 4 drives + spare instead of 6, because I don't really need all that storage anyway.
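
As a rough check on how much space each layout would give: RAID 5 usable capacity is (number of array members - 1) x drive size, and a hot spare adds nothing until it is called on. A small Python sketch, assuming nominal 72 GB drives and ignoring controller and formatting overhead. The function name is hypothetical, and "4 drives + spare" is read here as four array members plus one hot spare.

```python
def raid5_usable_gb(members, drive_gb=72):
    """Nominal usable capacity of a RAID 5 set: one drive's worth goes to parity."""
    if members < 3:
        raise ValueError("RAID 5 needs at least three member drives")
    return (members - 1) * drive_gb


# Number of drives that are actual array members in each layout.
# A hot spare is not counted because it holds no data until a member fails.
layouts = {
    "6-drive RAID 5 (current)": 6,
    "5-drive RAID 5": 5,
    "4-drive RAID 5 + 1 hot spare": 4,
}

for name, members in layouts.items():
    print(f"{name}: {raid5_usable_gb(members)} GB usable")
```

The data currently on the server has to fit in the smaller figure before either reduced layout is worth considering.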

I'll post my results.

Thank you so much for all your help so far.


cnb
Honored Contributor

Re: Proliant ML350 G4 RAID5 failed disk

Good news indeed!

Yes, you'll need the same drive model and firmware if you want these to work as per HP specifications.


Pls assign points!

;-)

Rgds,
Edward Luna_1
Occasional Advisor

Re: Proliant ML350 G4 RAID5 failed disk

Problem solved...

I stumbled upon a solution through my own sheer stupidity. lol

I tried to fix the drive that needed rebuilding, but in the process I lost my array configuration. So I wound up doing what I was trying to avoid... I reconfigured the array as a 5-drive RAID 5 and restored the system from backup. Everything is back to normal.

Thanks to everyone for your patience and kind assistance.

Ed
Edward Luna_1
Occasional Advisor

Re: Proliant ML350 G4 RAID5 failed disk

Sorry... I forgot to close the thread.