
Proliant ML150 smart drive failure

 
jason91
Occasional Advisor


 

Hi, we had a power outage in one of our branches in China and the server can't boot.

Here is the message on the boot screen:

"Smart checked failed" on two of our three RAID drives (RAID1, I think).

I don't know whether I should try to recover the data from the first HDD with a data recovery program, or try to rebuild the array.

This probably happened because of multiple power outages: the power went out three times, and the UPS got fried during the first outage.

Can you please share some advice for this case? The backups are from last week, but it would be very encouraging if there was something I could do to recover more recent data.

10 REPLIES
Johan Guldmyr
Honored Contributor

Re: Proliant ML150 smart drive failure

Hi,

If it was a RAID1, it should be possible to get it back up and running with only one HDD. This might depend on whether it's the first or second HDD, as you are running software RAID. I'd check the user guide for the RAID controller; it may have valuable information about what to do in a scenario like this. But first, check whether you have RAID1 or something else.
jason91
Occasional Advisor

Re: Proliant ML150 smart drive failure

Thank you very much for your reply. I'm using the HP embedded SATA RAID controller. Any ideas on how to rebuild the RAID1 array, or is this not advised?

 

The server can boot into safe mode, but when I tried to back up the data I got an error in NTBackup. It seems there are metadata problems, or maybe file integrity problems?

 

I wish the RAID controller had a battery to save stuff when the power goes down, like in my case... :(

Johan Guldmyr
Honored Contributor

Re: Proliant ML150 smart drive failure

How is the array set up if you have 3 drives in a RAID1?

Some HP Smart Array controllers have a battery (BBWC).

For a rebuild, I believe you need to assign the disk as a hot spare.
jason91
Occasional Advisor

Re: Proliant ML150 smart drive failure

Thank you for the reply. I believe it was a RAID10 array, and the server was using the lowest common drive size (250GB) for the array, so it was only showing 500GB of usable space in the O/S.

 

One of the local employees in that office said that a "degraded array state" message had been appearing for the past few months, and they kept pressing Enter to dismiss the warning. So apparently the array was already in a degraded state.

 

The PDF for the embedded SATA controller is here for your reference:

 

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00771065/c00771065.pdf

 

Any thoughts would be highly appreciated, thank you!

jason91
Occasional Advisor

Re: Proliant ML150 smart drive failure

Sorry, I didn't mention: there is no BBWC/battery in this controller (it's the HP embedded Intel ICH9R controller).
Johan Guldmyr
Honored Contributor

Re: Proliant ML150 smart drive failure

Both RAID10 and RAID1 need an even number of drives.

"RAID 10 is a stripe of mirrors." from Wikipedia.

If you had 500GB of usable space, then the RAID10 must have been built from the 500GB drives. Was the 250GB drive not used at all? Or maybe the OS was only on that one and the data was on a RAID1 of the 500GB drives? You should still find out exactly what type of RAID setup has been used.
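
As a rough check on the capacity numbers, here is a minimal Python sketch of the usual rule that a mirrored set is limited by its smallest member. The helper name `usable_gb` and the drive sizes in the example are assumptions for illustration only; they are not read from this server.

```python
def usable_gb(level, drive_sizes_gb):
    """Approximate usable capacity in GB (hypothetical helper, RAID1/RAID10 only)."""
    n = len(drive_sizes_gb)
    smallest = min(drive_sizes_gb)  # each member is truncated to the smallest drive
    if level == "RAID1":
        if n < 2:
            raise ValueError("RAID1 needs at least 2 drives")
        return smallest  # every member holds the same copy
    if level == "RAID10":
        if n < 4 or n % 2:
            raise ValueError("RAID10 needs an even number of drives, at least 4")
        return smallest * n // 2  # stripe across n/2 mirrored pairs
    raise ValueError(f"unsupported level: {level}")


# Example sizes are made up for illustration.
print(usable_gb("RAID1", [250, 500]))             # 250
print(usable_gb("RAID10", [250, 500, 500, 500]))  # 500
```

So 500GB usable would fit a four-drive RAID10 limited to 250GB per member, while a two-drive mirror containing the 250GB disk could only show 250GB.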
jason91
Occasional Advisor

Re: Proliant ML150 smart drive failure

The only info I could get was that the O/S could only see 250GB, and it turns out there were only two physical drives, not three. They just had three logical drives configured in the O/S.
Johan Guldmyr
Honored Contributor

Re: Proliant ML150 smart drive failure

Running scandisk or a filesystem check on the disk might be a good idea; it may clear up some of the problems on the disk.
jason91
Occasional Advisor

Re: Proliant ML150 smart drive failure

Thank you, I will try these. If there are any other ideas, please let me know.

I was also thinking of trying SpinRite.

Matti_Kurkela
Honored Contributor

Re: Proliant ML150 smart drive failure

OK, this is a Proliant ML150, but is it ML150 G3, ML150 G5 or ML150 G6?

 

According to the image you posted, you have 2 disks, not 3. (The boot screen mentions disk #02 twice, first in a standard informational listing, then inside the WARNING: message.)

 

The disk in SATA port #02 (with serial number GB0500EAFJH) indicates a SMART failure, i.e. its internal diagnostics indicate it will fail soon, but it may not have actually failed yet. Even so, if the OS still runs, you should immediately make a full backup and start the process of getting a new disk for this system.

 

The disk in SATA port #00 is OK.

 

The image you posted does not indicate the presence of any RAID sets. Since the disks are not the same size, it is possible they are not configured for RAID at all and are used as stand-alone disks. From the image alone, it is impossible to know whether the RAID features are in use.

 

You should verify your RAID configuration. Your OS should still be bootable by pressing Enter when you are at the boot screen.

 

Here's the Embedded SATA RAID User Guide.

Pages 9-13 describe the HRCONF command-line utility: please use it to display the RAID configuration and paste the output to this thread.

hrconf getconfig 0 AL

hrconf getconfig 0 AL >raidconfig.txt

The first command line should display the RAID configuration in the command prompt window; the second should produce a raidconfig.txt file that contains the same information.
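
If a script is more convenient than typing the two lines by hand, the same capture can be sketched in Python. This assumes Python is available on the server and that `hrconf` is on the PATH; the function name `save_raid_config` is a hypothetical helper, and only the `hrconf getconfig 0 AL` arguments come from the commands above.

```python
import subprocess

# Arguments as quoted in this thread; nothing else about hrconf is assumed.
HRCONF_CMD = ["hrconf", "getconfig", "0", "AL"]


def save_raid_config(out_path="raidconfig.txt", command=HRCONF_CMD):
    """Run the command, write its stdout to out_path, and return the text.

    Raises FileNotFoundError if the tool is not installed, and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
    return result.stdout
```

Calling `print(save_raid_config())` would then show the configuration and leave a copy in raidconfig.txt, much like running the two command lines one after the other.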

 

If you don't have this tool installed, see http://www.hp.com/go/sataraid

 

If it turns out you have two configured arrays with 1 disk each, that is not a fault-tolerant RAID1 configuration: however, it might still be possible to add another disk to the system, add it as a hotspare for the disk in port #02, and then begin a rebuild (= copy all data from the disk in port #02 to the new disk). The instructions for that are in the User Guide linked above: see pages 7 and 8.

 

If both disks belong to the same RAID set, the situation is not quite so critical: in this case, the other disk contains a 100% healthy copy of your data, so you'll only need to replace the failing disk and rebuild the array.
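
The two cases above can be written down as a small decision sketch. The input format (a list of arrays with a level and member ports, typed in by hand from the HRCONF output) and the function name `assess` are assumptions for illustration; the sketch does not parse real hrconf output.

```python
MIRRORED_LEVELS = {"RAID1", "RAID10"}


def assess(arrays, failing_port):
    """Classify the situation for the disk on failing_port (hypothetical helper).

    arrays: list of dicts such as {"level": "RAID1", "ports": [0, 2]}.
    """
    for array in arrays:
        if failing_port in array["ports"]:
            mirrored = array["level"] in MIRRORED_LEVELS and len(array["ports"]) >= 2
            if mirrored:
                # The other member still holds a copy of the data.
                return "replace_and_rebuild"
            # Single-disk array: no second copy exists yet.
            return "backup_then_add_spare"
    # The failing disk is not a member of any configured array.
    return "standalone_backup_now"


# Both disks in one mirror (ports 0 and 2, as on the boot screen):
print(assess([{"level": "RAID1", "ports": [0, 2]}], failing_port=2))
# Two arrays with one disk each:
print(assess([{"level": "RAID1", "ports": [0]},
              {"level": "RAID1", "ports": [2]}], failing_port=2))
```

Whichever case applies, making a full backup first is still the safe order of operations.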

MK