
Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

 
Butterfly1
Collector

Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello,

 

I have a HP ProLiant ML350 G5 with 4 hard drives running RAID 5 on the SMART Array E200i Controller.

 

One of my hard drives is reporting this:

 

Physical Hard Drive 2, Serial Number: D2A2P9500J2N0921, Controller Serial Number: QT7BMU0064

Failed

Tue Sep 13 16:28:36 2011 : Controller has reported a critical performance threshold error on this drive

This drive has experienced/recorded error conditions reported by diagnosis and requires replacement. 

Perform the following steps before replacing a drive:

  • Have a known good data backup.
  • Replace the failed drive with one that is the same as, or functionally equivalent to, the original drive.
  • Do not replace the drive if there is another drive that is failed, offline, or in the process of being rebuilt.
  • On non-Hot Plug drives make sure the replacement drive ID matches the original drive ID.
  • For further replacement procedures or help with identifying drive status, see the HP Smart Array Controller User Guide for your controller. The User Guide is available from hp.com.

To identify the failed drive, click on the 'Identify Drive' button below to flash the LED on this drive.

  • Clicking on Identify Drive button will blink the amber LED (if SCSI drive) or turn ON the blue LED (if SAS or SATA drive) for a few seconds.

I have been looking all over for a manual that gives a basic overview of the procedure to follow for my model. I do not have a lot of experience with these newer RAID sets and hot swaps, and my experiences with RAID controllers have never been simple, so I am a bit nervous and uncertain.

 

Do I just identify the drive, remove it and slap my new replacement in to let it rebuild? Do I boot into the array controller program? I've been searching all over HP's site for manuals and reading through PDFs, and I can't find any references other than how to physically remove the disk. There are no steps on initiating a rebuild or on how to watch its progress.

 

Any help, advice, direction is deeply appreciated.

13 REPLIES
Johan Guldmyr
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello,

With Smart Array controllers you can just take out the broken disk, wait a bit and put in the replacement. You should not have to boot into the array controller program (on Windows/Linux you can install HP ACU to get a graphical/CLI tool for managing the array controller).
The rebuild should start automagically.

You should have an E200 (unless you've gotten something special). If you want manuals, see if there is one for the E200. The ACU User Guide is pretty good as well. There are quite a few of them.
PZel
Valued Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

When the hard disk's LED is continuously lit amber (the drive is FAILED), you can pull the disk out without a problem and put a new one in (with the same Spare Part Number). After approx. 30 sec the new disk will blink green and the drive rebuilds automatically, provided it's configured as mirrored or RAID 5. After about 1 hour (depending on size and configuration) it's solid green again and working OK.

 

PZ
gregersenj
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Yes, as stated.

 

Remove failed drive.

Wait at least 30 sec.

Insert new drive.

 

It will rebuild automatically.

 

If you want to learn about the Smart Array controller, search the web for: HP smart array technology brief.

 

BR

/jag

Butterfly1
Collector

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello everyone,

 

Thanks for the feedback and resources. I pulled the drive and put the replacement in. The array auto rebuilt without an issue. All logs looked perfect during and after.

 

Lately, I've had another issue...

 

I pulled that bad drive reporting the bad block and replaced it on Monday, September 13th. (The drive was Hard Drive 2 in Bay 3.)

 

At 8am on Monday, September 19th, I noticed that HpCISSs2 was throwing an error in the Windows Event Logs again. So I started up the HP Array Diagnostic and saw that the array was detecting a problem with the drive. This is the NEW drive, but again in Bay 3.

 

By 4:16pm the array took the drive offline:

 

Event Type:Error

Event Source:Cissesrv

Event Category:None

Event ID:24596

Date:9/19/2011

Time:4:16:22 PM

User:N/A

Computer:MAIL

 

Description:A drive failure notification has been received for the SAS physical drive located in bay 3.  This drive can be found in box 1 which is connected to port 1I of the array controller [Embedded].  The failure reason received from the HP Smart Array firmware is: MARK_BAD_FAILED.

 

Once again, I purchased a new drive, had it FedEx'd to me Priority Overnight, took the faulty drive out and put the newly arrived drive in. The array rebuilt, no errors and everything looks great.

 

So now that I've stated all that, here is my actual question:

 

Was it all just bad luck getting the first replacement drive? Or is this an indicator that something else is starting to fail?

 

Just to give you a little more info: all of the Windows System Logs, the HP Diagnostics and the Array Controller look perfect. No errors, no warnings, nothing.

 

I would like to think it was just bad luck with the 1st replacement drive and that my Array Controller or backplane isn't acting up. But I want to be proactive to ensure I don't have some impending critical failure in my future.

 

Needless to say, I am watching it like a hawk every day.

 

~Rachael

PZel
Valued Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

When the ProLiant Support Pack is installed on a Windows system, you can also get a global view of your system in Internet Explorer at https://<servername>:2301. Sign in with a Windows account in the form <domain>\<account> and that account's password. You then arrive at the System Management Homepage (SMH).

 

On that page, click the icon representing the Storage Controller (the E200), then click one of the Physical Drives. You can directly see Sectors Read and Sectors Write, as well as Hard Write and Hard Read Errors. On a good physical drive these are all zero (except Sectors Read/Write, of course). A drive that shows Hard Read or Hard Write Errors is suspect. Sometimes the hard disk is still GREEN (thus not FAILED) but has errors.

You can also read the firmware versions of the array controller and the disks.
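As a rough illustration of that rule of thumb, here is a minimal Python sketch. It assumes you copy the counters from the SMH page by hand; the `DriveCounters` class, its field names and the sample numbers are hypothetical and do not come from any HP tool.

```python
from dataclasses import dataclass

@dataclass
class DriveCounters:
    # Values typed in from the SMH physical drive page (hypothetical structure).
    bay: int
    hard_read_errors: int
    hard_write_errors: int

def suspect_drives(counters):
    """Return the bays whose hard read or hard write error counter is non-zero."""
    return [c.bay for c in counters
            if c.hard_read_errors > 0 or c.hard_write_errors > 0]

# Made-up sample readings for a 4-drive array.
readings = [
    DriveCounters(bay=1, hard_read_errors=0, hard_write_errors=0),
    DriveCounters(bay=2, hard_read_errors=0, hard_write_errors=0),
    DriveCounters(bay=3, hard_read_errors=4, hard_write_errors=1),
    DriveCounters(bay=4, hard_read_errors=0, hard_write_errors=0),
]
print(suspect_drives(readings))
```

A drive flagged this way can still show a green LED, which is the point of checking the counters.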

 

Take precautions when changing a drive that is not amber (but GREEN or blinking GREEN), for instance when it shows a lot of Hard Write/Read Errors.

The best way to do it is then:

1) Locate the drive that is giving problems (<Start> on the SMH page)

2) Power down the system

3) Pull out the faulty drive

4) Power on the server

5) Press <F2> when the system asks you to choose between F1 (DisableAllDisks) and F2 (FailOnly that drive)

6) Wait until the system is up and running (it's RAID 5, so it has to work on 3 drives instead of 4: Interim Recovery Mode)

7) Push the new drive into the same slot as the faulty drive and it will rebuild itself
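Why a 4-drive RAID 5 can keep running on 3 drives, and later rebuild the 4th, comes down to XOR parity. The Python sketch below shows the general idea for a single stripe. It is a simplified illustration of RAID 5 parity in general, not the Smart Array firmware's actual on-disk layout, and the block contents are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across 4 drives: 3 data blocks plus 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The drive holding data[1] is pulled. Its block is recomputed from the
# 3 surviving blocks, which is what happens on reads in interim recovery
# mode and again when the replacement drive is rebuilt.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same arithmetic shows why a second failed or unreadable drive during this period is fatal: with two blocks of a stripe missing, there is nothing left to XOR against.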

 

With the Firmware CD you can upgrade the firmware of your hard disks and array controller online (but a reboot is required).

The Firmware CD (for the ML350 G5) is at:

http://h20000.www2.hp.com/bizsupport/TechSupport/DriverDownload.jsp?lang=en&cc=us&prodNameId=3279711&taskId=135&prodTypeId=15351&prodSeriesId=1121586&lang=en&cc=us

The latest firmware levels for the HDDs are at:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00305257

 

 

PZ
theGate
Visitor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello

 

I followed exactly this procedure on a system with 3 HDs in RAID 5. I stopped the server, replaced the 3rd HD and rebooted (now with 2 disks online). I waited for the system to come up, but after 30 min there is still nothing. I still have a black display.

 

Is this normal?

The system has 3 HDs of 250GB and I replaced the faulty one with a 500GB drive.

 

thanks

Gate

gregersenj
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

No, that is not normal, unless you hit the wrong key when prompted by the Smart Array.

 

Why did you power down the server?

I really don't understand why anyone would want to take down a ProLiant for a hot-swap disk replacement.

Hot means: do it while the server is running.

 

My guess is that you hit F1 when prompted by the SA during POST.

If you power on the server with a bad or missing disk, the Smart Array will prompt you to choose whether to start in interim recovery mode or to disable the logical drive.

F1 = Disable logical drive.

F2 = Interim recovery mode.

Do nothing = Interim recovery mode.

 

So if you only have one logical drive, holding the OS, and you hit F1, it's never going to boot.
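To make the three outcomes explicit, here is a small Python sketch of the prompt as described in this post. The names `PostResponse`, `logical_drive_after_prompt` and `boot_blocked` are made up for the illustration and are not part of any HP utility.

```python
from enum import Enum

class PostResponse(Enum):
    F1 = "F1"
    F2 = "F2"
    NO_KEY = "no key pressed"

def logical_drive_after_prompt(response):
    """Map the POST prompt response to the resulting logical drive state."""
    if response is PostResponse.F1:
        return "disabled"
    # F2 and doing nothing both lead to interim recovery mode.
    return "interim recovery"

def boot_blocked(response, os_on_logical_drive=True):
    """True when the response alone prevents booting the OS from this logical drive."""
    return os_on_logical_drive and logical_drive_after_prompt(response) == "disabled"

for r in PostResponse:
    print(r.value, "->", logical_drive_after_prompt(r),
          "| boot blocked:", boot_blocked(r))
```

Note that `boot_blocked` being false only means the prompt did not disable the logical drive. The OS can still fail to boot for other reasons.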

To re-enable the drive, power down the server again and choose to re-enable the logical drive when prompted.

If you don't get prompted, boot from a SmartStart CD, choose Maintenance > Array Configuration Utility, and enable the logical drive from there.

 

BR

/jag

theGate
Visitor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello

 

The boss here powered down the server before talking to me, and he may have clicked on something! I don't know exactly what he did, and like everybody, when I ask what he did he answers 'I did nothing'!!!

 

Now when I boot from the SmartStart CD and click on Array Configuration, it just keeps spinning and I can't access the config page.

All the HD LEDs (green) blink in a kind of pattern. For now I don't want to turn it off, and CTRL-ALT-DEL is disabled.

 

 

At start (DOS) the logical drive is indicated as 128M and it's supposed to be 478GB. Is it supposed to show a message about the RAID array, like healthy or corrupted?

Before I changed HD 1 I had a message about an HD 1 failure.

 

I don't really know this server. These people are a new customer. I started providing services to them 1 month ago.

 

Any idea ?

 

thanks a lot for answering me

 

theGate

 

 

gregersenj
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Ok, so you're new to ProLiants.

You need to learn some new stuff.

 

First, here are some good links.

Smart Array Configuration:

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00729544/c00729544.pdf

 

Smart Array Technology brief (The HW stuff):

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00687518/c00687518.pdf

 

Lots of great ProLiant stuff:

http://h18000.www1.hp.com/products/servers/technology/whitepapers/proliant-servers.html

 

The flashing LEDs are normal. They are used for drive identification. All 3 LEDs flash.

If you select the controller, all drives connected to the controller will flash.

If you select an array, all disks in the array will flash.

If you select a logical drive, all disks holding the LD will flash.

If you select a single physical drive, that disk will flash.

 

Beware: the drive count is zero-based.

Failed drives will have a solid amber LED on.

Degraded drives will have a flashing amber LED. A degraded drive is still operational.

A drive with no LEDs on is in standby (unconfigured or hot spare).
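For quick reference, the LED states mentioned so far in this thread can be put into a small lookup table. This Python sketch only restates what posters here have written; it is not HP's official LED table, and the `describe_led` helper is made up for the illustration.

```python
# (color, pattern) -> meaning, as described by posters in this thread.
LED_MEANINGS = {
    ("amber", "solid"): "failed",
    ("amber", "flashing"): "degraded, still operational",
    ("green", "flashing"): "rebuilding",
    ("green", "solid"): "OK",
    ("none", "off"): "standby (unconfigured or hot spare)",
}

def describe_led(color, pattern):
    """Look up the meaning of an LED state; unknown combinations are not guessed."""
    return LED_MEANINGS.get((color, pattern),
                            "unknown, check the controller user guide")

print(describe_led("amber", "solid"))
print(describe_led("amber", "flashing"))
```

Keep in mind that while something is selected in ACU, flashing LEDs mean drive identification rather than a drive state.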

 

All information regarding the RAID setup, drive position and health status is stored as metadata in the RIS area of each disk.

There is a copy of the metadata on every disk.

 

When the server is powered on, booted or rebooted, the Smart Array scans the disks and picks up the configuration from the metadata.

 

The SA will inform you about the status.

How many logical drives it has found.

It will inform you about problems, and ask you to respond.

If you don't respond it will pick the best option for you. It will even auto-configure on boot if you only have unconfigured drives.

 

Some questions for you:

How many drives do you have?

How are they expected to be configured?

Which physical drive did you replace?

Which drive was reported bad?

 

I'm not sure what you mean by the DOS part, so I can't answer that.

 

If possible, take some screenshots from ACU.

Or generate an ADU report.

 

Most mistakes can be fixed, as long as you haven't deleted the metadata (erased the configuration) and you still have the drive that you removed.

 

BR

/jag

theGate
Visitor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello

 

DOS = POST or BIOS boot messages ... sorry about the abuse of language :)

 

Before going further, is there a way to boot with some live CD tools (from HP or other) to access the logical drive?

If I can at least get the data they want, I could rebuild the computer.

 

For now, when I boot with the HP SmartStart CD and diagnose the array, everything seems good.

The size of the array is right, all disks are good and the array is enabled.

Btw, I can't save the report to a USB drive and I don't know why. The USB key is in the computer at boot time.

 

If I boot from the HD, after all the BIOS messages and the 2 beeps I briefly get the F9, F10 and F12 options, and nothing happens after that. I get a black screen and even the keyboard doesn't work!!

 

I still have the faulty drive and I never erased the array config.

 

I took some photos just in case, and a couple of videos.

 

thanks a lot

 

Gate

gregersenj
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

OK, so DOS is the POST (Power On Self Test).

 

The 128MB is the cache memory size on the Smart Array controller.

The POST output will also tell you how many logical drives were found.

 

It is important that you understand what is meant by Array and Logical Drive on the Smart Array controller.

An Array is a group of physical disks.

The Logical Drive(s) are the drive(s) presented to the server (host).

You can create multiple LDs within an array.

You can also create multiple arrays on a Smart Array controller.

The RAID level is defined when creating the logical drive(s), so you can have different RAID levels in the same array.

 

Most often, the controller is configured with 1 Array and 1 Logical Drive.
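The relationship between controller, arrays and logical drives can be sketched as a small data model. This is only an illustration in Python; the class names, the array name "A" and the bay numbers are made up and do not reflect how ACU stores its configuration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicalDrive:
    number: int
    raid_level: str  # the RAID level belongs to the logical drive, not the array

@dataclass
class Array:
    name: str
    physical_bays: List[int]  # the physical disks grouped into this array
    logical_drives: List[LogicalDrive] = field(default_factory=list)

@dataclass
class Controller:
    model: str
    arrays: List[Array] = field(default_factory=list)

# The common layout: 1 array with 1 logical drive.
typical = Controller("E200i", [Array("A", [1, 2, 3, 4], [LogicalDrive(1, "RAID 5")])])

# Also valid: two logical drives with different RAID levels in the same array.
mixed = Array("A", [1, 2, 3, 4],
              [LogicalDrive(1, "RAID 1+0"), LogicalDrive(2, "RAID 5")])

def raid_levels(array):
    """List the distinct RAID levels used by the logical drives of one array."""
    return sorted({ld.raid_level for ld in array.logical_drives})

print(raid_levels(mixed))
```

Only the logical drives are visible to the operating system. The array and the physical disks behind it are visible only to the controller and its tools.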

 

Do recheck using ACU / ADU and Insight Diagnostics. They are all on the SmartStart CD, and if you run SmartStart from a USB memory stick, you can save the logs. HP has a great tool for creating SmartStart on a USB memory stick; you will find it in the Tools/Utility section, along with the driver and FW downloads.

 

Here's what you need to check.

Logical drives: enabled and OK!

"Boot order": check that the Smart Array is selected as the boot controller.

 

 

Again!

What is the number of physical disks?

What is the RAID level? (RAID 1+0 using 2 disks is RAID 1; HP uses the term RAID 1+0.)

Which disk was reported bad?

Which physical disk did you replace?

Was the spare disk brand new and unused?

 

 

Here's a theoretical scenario:

2 physical disks

1 Array, using both disks

1 Logical Drive 1+0, using all available space in the Array.

Disk 1 is reported bad.

Customer shuts down the server.

You replace the first disk (that would be disk 0!)

You power on the server.

Disk 1 reappears as good (this does happen from time to time).

 

Another theoretical scenario (this is far out, but possible):

2 physical disks

1 Array, using both disks

1 Logical Drive 1+0, using all available space in the Array.

Disk 1 is reported bad.

Customer shuts down the server.

You replace disk 1 (the bad one).

The spare disk has been configured before, but with no O/S, and it has a newer time stamp, so it becomes the source disk and the controller copies from the spare to the "old" disk.

 

This is why I don't understand why anybody chooses to do a "cold" replacement of hot-swap drives in a ProLiant server with a Smart Array controller.

 

If you have a RAID 1 and you could have replaced the wrong disk:

Power down the server and remove both disks (take good note of which bay each disk came from, and which is the spare).

Insert the disk you replaced, putting it in the bay it came from. If it starts up, leave it in interim recovery mode, and if the OS boots, then put in the spare disk. Do be careful, it could be your last chance.

 

This is why I would like to see some screenshots or an ADU log.

 

BR

/jag

 

 

 

The array now rebuilds from the "failed" disk, but the OS may be corrupt due to the earlier failure. And it doesn't boot!

 

Keep the disk that you pulled out safe!!!

theGate
Visitor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello

Here is the scenario:

 

Original system

1 Logical Drive, RAID 5, with 3 HDs of 250GB for a total of 478GB

 

After discussing with the boss there:

 

1 - The system slowed down (really slow)

2 - The boss decided to cold reboot

3 - He called me after that

 

Here are my actions:

 

4 - I realized that the HD in bay 1 was faulty

5 - I bought a brand new 500GB HD instead of a 250GB one (those are hard to find)

6 - I removed the bay #1 HD and replaced it while the server was down

7 - I rebooted

8 - I got messages about logical drive 1 not being healthy

9 - I had the choice between F1 and F2

10 - I chose F2 and could go further in the boot process

11 - The system booted from the SmartStart CD

12 - SmartStart detected that drive 1 had to be rebuilt

13 - After 3 days I still had the same message, and I clicked on refresh

14 - The message saying that drive 1 was rebuilding disappeared

15 - I exited SmartStart

16 - I rebooted from the HD and got a black screen without keyboard

 

That's what has been done.

 

I will bring that server in this afternoon. It will be easier and faster.

I will take more photos and try to save the report.

 

Gate

 

 

gregersenj
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Ok.

 

I suppose the initial cold boot was unsuccessful.

 

Using a larger drive is OK.
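The reason a larger drive is fine follows from the usual RAID 5 capacity rule: each member contributes only as much space as the smallest drive, and one drive's worth goes to parity. The Python sketch below works with nominal sizes only. It is general RAID 5 arithmetic, not output from any HP tool, and it does not try to reproduce the 478GB figure quoted earlier in the thread.

```python
def raid5_usable_gb(drive_sizes_gb):
    """Nominal usable capacity of a RAID 5 set: (n - 1) * smallest member."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)

original = raid5_usable_gb([250, 250, 250])
with_replacement = raid5_usable_gb([250, 250, 500])
print(original, with_replacement)
```

Both calls give the same result, so the extra 250GB on the replacement drive simply goes unused in this array.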

 

It is possible that one of the two other drives got some hard read errors. If so, then the RAID is gone.

Let's see the ADU report.

On the other hand, if the status of the logical drive is good in the Array Configuration Utility and you don't find any HW errors on the drives, then it is the O/S that has failed.

 

But without logs, it's all a guess.

 

BR

/jag