

 
Butterfly1
Collector

Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello,

 

I have an HP ProLiant ML350 G5 with 4 hard drives running RAID 5 on the Smart Array E200i controller.

 

One of my hard drives is reporting this:

 

Physical Hard Drive 2, Serial Number: D2A2P9500J2N0921, Controller Serial Number: QT7BMU0064

Failed

Tue Sep 13 16:28:36 2011 : Controller has reported a critical performance threshold error on this drive

This drive has experienced/recorded error conditions reported by diagnosis and requires replacement. 

Perform the following steps before replacing a drive:

  • Have a known good data backup.
  • Replace the failed drive with one that is the same as, or functionally equivalent to, the original drive.
  • Do not replace the drive if there is another drive that is failed, offline, or in the process of being rebuilt.
  • On non-Hot Plug drives make sure the replacement drive ID matches the original drive ID.
  • For further replacement procedures or help with identifying drive status, see the HP Smart Array Controller User Guide for your controller. The User Guide is available from hp.com.

To identify the failed drive, click on the 'Identify Drive' button below to flash the LED on this drive.

  • Clicking on Identify Drive button will blink the amber LED (if SCSI drive) or turn ON the blue LED (if SAS or SATA drive) for a few seconds.
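
The key rule in the checklist above (do not swap a drive while another one is failed, offline or rebuilding) can be written as a small check. This is only a sketch: the status strings and the `safe_to_replace` helper are made up for illustration and do not come from any HP tool.

```python
# Hypothetical status strings; real tools report their own wording.
BLOCKING_STATES = {"failed", "offline", "rebuilding"}

def safe_to_replace(drive_states, target):
    """True if no drive other than `target` is failed, offline or rebuilding."""
    return all(
        state.lower() not in BLOCKING_STATES
        for drive, state in drive_states.items()
        if drive != target
    )

# Four-drive RAID 5 where only drive 2 has failed.
states = {1: "OK", 2: "Failed", 3: "OK", 4: "OK"}
print(safe_to_replace(states, target=2))
```

If a second drive were rebuilding at the same time, the check would return False and the swap should wait.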

I am looking all over for a manual that just gives a basic overview of what procedures to follow for my model. I do not have a lot of experience working with these newer RAID sets and hot swaps. And my experiences with RAID controllers have never been simple, so I am just a bit nervous and uncertain.

 

Do I just identify the drive, remove it and slap my new replacement in to let it rebuild? Do I boot into the array controller program? I've been searching all over HP's site for manuals and reading through PDFs, and I can't find any references other than how to physically remove the disk. There are no other steps on initiating a process to rebuild the data or how to watch its progress, etc.

 

Any help, advice, direction is deeply appreciated.

Johan Guldmyr
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello,

With Smart Array controllers you can just take out the broken disk, wait a bit and put in the replacement. You should not have to boot into the array controller program (on Windows/Linux you can install HP ACU and get a graphical/CLI tool to manage the array controller).
The rebuild should start automagically.

You should have an E200 (unless you've gotten something special). If you want manuals, see if there is one for the E200. The ACU User Guide is pretty good as well. There are quite a few of them.
PZel
Trusted Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Whenever the hard disk LED is continuously solid amber (the drive is FAILED), you can pull the disk out without a problem and put a new one in (with the same Spare Part Number). After approximately 30 seconds the new disk will blink green and the drive rebuilds automatically, provided it is configured as a mirror or RAID 5. After about 1 hour (depending on size and configuration) it is solid green again and working OK.

 

PZ
gregersenj
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Yes, as stated.

 

Remove failed drive.

Wait at least 30 seconds.

Insert new drive.

 

It will rebuild automatically.

 

If you want to learn about the Smart Array controller, search the web for: HP smart array technology brief.

 

BR

/jag


Butterfly1
Collector

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello everyone,

 

Thanks for the feedback and resources. I pulled the drive and put the replacement in. The array auto rebuilt without an issue. All logs looked perfect during and after.

 

Lately, I've had another issue...

 

I pulled that bad drive reporting the bad block and replaced it on Monday, September 13th. (The drive was Hard Drive 2 in Bay 3.)

 

At 8am on Monday, September 19th, I noticed that HpCISSs2 was throwing an error in the Windows Event Logs again. So I started up the HP Array Diagnostic and saw that the array was detecting a problem with the drive. This is the NEW drive, but again in Bay 3.

 

By 4:16pm the array took the drive offline:

 

Event Type: Error
Event Source: Cissesrv
Event Category: None
Event ID: 24596
Date: 9/19/2011
Time: 4:16:22 PM
User: N/A
Computer: MAIL

Description: A drive failure notification has been received for the SAS physical drive located in bay 3. This drive can be found in box 1 which is connected to port 1I of the array controller [Embedded]. The failure reason received from the HP Smart Array firmware is: MARK_BAD_FAILED.
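
For anyone who wants to track these events over time, the bay, box, port and failure reason can be pulled out of the description text with a few regular expressions. This is a rough sketch based only on the wording of the one event quoted above; other events may be phrased differently, and `parse_drive_failure` is just an illustrative helper name.

```python
import re

# Patterns follow the wording of the single event quoted in this post.
PATTERNS = {
    "bay": r"located in bay (\d+)",
    "box": r"found in box (\d+)",
    "port": r"connected to port (\w+)",
    "reason": r"firmware is: (\w+)",
}

def parse_drive_failure(description):
    """Extract bay, box, port and failure reason; missing fields become None."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, description)
        result[key] = match.group(1) if match else None
    return result

SAMPLE = (
    "A drive failure notification has been received for the SAS physical "
    "drive located in bay 3. This drive can be found in box 1 which is "
    "connected to port 1I of the array controller [Embedded]. The failure "
    "reason received from the HP Smart Array firmware is: MARK_BAD_FAILED."
)
print(parse_drive_failure(SAMPLE))
```

Keeping a list of these per bay makes it easier to see whether failures follow the drive or stay with the slot.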

 

Once again, I purchased a new drive, had it FedEx'd to me Priority Overnight, took the faulty drive out and put the newly arrived drive in. The array rebuilt, no errors and everything looks great.

 

So now that I've stated all that, here is my actual question:

 

Was it all just bad luck getting the first replacement drive? Or is this an indicator that something else is starting to fail?

 

Just to give you a little more info, all of the Windows System Logs, the HP Diagnostics and the Array Controller look perfect. No errors, no warnings, nothing.

 

I would like to think it was just bad luck with the 1st replacement drive and that my Array Controller or backplane isn't acting up. But I want to be proactive to ensure I don't have some impending critical failure in my future.

 

Needless to say, I am watching it like a hawk every day.

 

~Rachael

PZel
Trusted Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Whenever the ProLiant Support Pack is installed on a Windows system, you can also get a global view of your system in Internet Explorer at https://<servername>:2381 (plain http://<servername>:2301 normally redirects there). Sign in with a Windows account in the form <domain>\<account> and that account's password. You then arrive at the System Management Homepage (SMH).

 

There, go to the icon representing the storage controller (the E200). Click on one of the physical drives and you can directly see Sectors Read and Sectors Written. You can also see Hard Read Errors and Hard Write Errors. On a good physical drive these are all zero (except Sectors Read/Written, of course). Whenever a drive shows Hard Read or Hard Write Errors, it is suspect. Sometimes the hard disk is still GREEN (thus: not FAILED) but has errors.

You can also see the firmware versions of the array controller and the disks.
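
If you note down the counters from SMH for each drive, the "all zero on a good drive" rule is easy to check in a few lines. This is a sketch only: the dictionary keys are made-up names for the Hard Read/Write Error counters, and the numbers are example values, not data from this server.

```python
def suspect_drives(drives):
    """Return the bays whose hard read or hard write error counter is non-zero."""
    return [
        d["bay"]
        for d in drives
        if d["hard_read_errors"] or d["hard_write_errors"]
    ]

# Example values typed in by hand from the SMH drive pages (hypothetical).
drives = [
    {"bay": 1, "hard_read_errors": 0, "hard_write_errors": 0},
    {"bay": 2, "hard_read_errors": 0, "hard_write_errors": 0},
    {"bay": 3, "hard_read_errors": 12, "hard_write_errors": 0},
]
print(suspect_drives(drives))
```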

 

You must take precautions when changing a drive which is not amber (but GREEN or blinking GREEN), for instance when it shows a lot of Hard Read/Write Errors.

In that case the best way is:

1) Locate the drive which is giving problems (<Start> on the SMH page)

2) Power down the system

3) Pull out the faulty drive

4) Power on the server

5) Press <F2> when the system asks you to choose between F1 (disable all disks) and F2 (fail only that drive)

6) Wait until the system is up and running (it's RAID 5, so it will be running on 3 drives instead of 4: Interim Recovery Mode)

7) Insert the new drive into the same slot as the faulty drive and it will rebuild itself
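
As background on why a RAID 5 set keeps running with one drive missing and can rebuild onto the new one: each stripe carries an XOR parity block, so any single missing block can be recomputed from the others. The toy example below shows only that arithmetic for one stripe. It is not how the controller firmware is implemented, and the block contents are arbitrary.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a 4-drive RAID 5: three data blocks plus one parity block.
data = [b"disk", b"one!", b"two?"]
parity = xor_blocks(data)

# The drive holding the second block is gone; rebuild it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same arithmetic is why a second failure during the rebuild is fatal: with two blocks of a stripe missing there is nothing left to recompute from.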

 

With the Firmware CD you can upgrade the firmware of your hard disks and array controller online (a reboot is still required).

The Firmware CD (for the ML350 G5) is at:

https://support.hpe.com/hpesc/public/home

The latest firmware levels for the hard disks are at:

https://support.hpe.com/hpesc/public/docDisplay?docId=mmr_kc-0128606

 

[Note: broken link updated/removed by Mod]

PZ
theGate
Visitor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello

 

I followed the procedure exactly on a system with 3 hard disks in RAID 5. I stopped the server, replaced the 3rd hard disk, rebooted (now with 2 disks online) and waited for the system to come up. After 30 minutes there is still nothing; I still have a black display.

 

Is it normal?

The system has 3 x 250GB hard disks and I replaced the faulty one with a 500GB disk.

 

thanks

Gate

gregersenj
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

No, that is not normal, unless you hit the wrong key when prompted by the Smart Array.

 

Why did you power down the server?

I really don't understand why anyone would want to take down a ProLiant for a hot-swap disk replacement.

Hot means: do it while the server is running.

 

My guess is that you hit F1 when prompted by the SA during POST.

If you power on the server with a bad or missing disk, the Smart Array will prompt you to choose whether you want to start in interim recovery mode or disable the logical drive.

F1 = Disable logical drive.

F2 = Interim recovery mode.

Do nothing = Interim recovery mode.

 

So if you only have one logical drive, with the OS on it, and you hit F1, it's never going to boot.

To re-enable the drive, power down the server again, and choose to re-enable the logical drive when prompted.

If you don't get prompted, boot from a SmartStart CD, choose Maintenance > Array Configuration Utility, and enable the logical drive from there.
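
To make the POST prompt behaviour described above easy to remember, here it is as a tiny lookup. This only restates the explanation in this post. The function names are made up, and it assumes a single logical drive that holds the OS.

```python
INTERIM = "interim recovery mode"
DISABLED = "logical drive disabled"

def post_choice(key=None):
    """Outcome of the Smart Array POST prompt; key=None means no key pressed."""
    if key is None:
        return INTERIM
    return {"F1": DISABLED, "F2": INTERIM}[key]

def os_can_boot(key=None):
    """With the OS on the only logical drive, booting needs that drive enabled."""
    return post_choice(key) != DISABLED

for key in ("F1", "F2", None):
    print(key, "->", post_choice(key), "| OS boots:", os_can_boot(key))
```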

 

BR

/jag


theGate
Visitor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Hello

 

The boss here powered down the server before talking to me, and he may have clicked on something! I don't know what exactly he did and, like everybody, when I ask what he did he answers 'I did nothing'!!!

 

Now when I boot from the SmartStart CD and click on Array Configuration, it just keeps spinning and I can't access the config page.

All the hard disk LEDs blink in a kind of pattern (green LED). I don't want to turn it off now, and CTRL-ALT-DEL is disabled.

 

 

At startup (DOS) the logical drive is shown as 128M and it's supposed to be 478GB. Is it supposed to show a message about the RAID array, like healthy or corrupted?

Before I changed HD 1 I had a message about an HD 1 failure.

 

I don't really know this server. These people are a new customer; I started providing them services 1 month ago.

 

Any idea ?

 

thanks a lot for answering me

 

theGate

 

 

gregersenj
Honored Contributor

Re: Procedure for hot swapping a drive on a HP ProLiant ML350 G5 with RAID 5

Ok, so you're new to ProLiants.

You need to learn some new stuff.

 

First, here are some good links.

Smart Array Configuration:

https://support.hpe.com/hpesc/public/home

 

Smart Array Technology brief (The HW stuff):

https://support.hpe.com/hpesc/public/home

 

Lots of great ProLiant stuff:

https://support.hpe.com/hpesc/public/home

 

The flashing LEDs are normal. They are used for drive identification. All 3 LEDs flash.

If you select the controller, all drives connected to the controller will flash.

If you select an array, all disks in the array will flash.

If you select a logical drive, all disks holding the LD will flash.

If you select a single physical drive, that disk will flash.
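
The selection rules above follow the controller > array > logical drive > physical drive hierarchy. A small sketch of that, using a made-up layout (one controller with four bays, one array on three of them, one logical drive on that array); the names are placeholders, not output from ACU.

```python
# Hypothetical layout for illustration only.
LAYOUT = {
    "controller": {"Embedded": ["bay0", "bay1", "bay2", "bay3"]},
    "array": {"A": ["bay0", "bay1", "bay2"]},
    "logical_drive": {"1": ["bay0", "bay1", "bay2"]},
}

def drives_that_flash(kind, name):
    """Return the physical drives whose LEDs flash for a given selection."""
    if kind == "physical_drive":
        return [name]
    return list(LAYOUT[kind][name])

print(drives_that_flash("controller", "Embedded"))
print(drives_that_flash("array", "A"))
print(drives_that_flash("physical_drive", "bay2"))
```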

 

Beware: the drive count is zero-based.

Failed drives will have a solid amber LED on.

Degraded drives will have a flashing amber LED. A degraded drive is still operational.

A drive with no LEDs on is in standby (unconfigured or hot spare).
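
As a quick reference, the three LED states just listed can be kept in a lookup table. This covers only the states mentioned in this post; the key strings are my own wording, and anything else falls through to "unknown" instead of guessing.

```python
# Only the LED states described in this post; other patterns are not covered.
LED_MEANING = {
    "solid amber": "failed",
    "flashing amber": "degraded, still operational",
    "off": "standby (unconfigured or hot spare)",
}

def describe_led(led):
    """Map an observed LED state to the drive status described in this post."""
    return LED_MEANING.get(led.strip().lower(), "unknown")

print(describe_led("Solid amber"))
print(describe_led("flashing amber"))
```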

 

All information regarding the RAID setup, drive position and health status is stored as metadata in the RIS area of each disk.

There is a copy of the metadata on every disk.

 

When the server is powered on, booted or rebooted, the Smart Array will scan the disks and pick up the configuration from the metadata.

 

The SA will inform you about the status.

How many logical drives it has found.

It will inform you about problems, and ask you to respond.

If you don't respond, it will pick the best option for you. It will even auto-configure on boot if you only have unconfigured drives.

 

Some questions for you:

How many drives do you have?

How are they expected to be configured?

Which physical drive did you replace?

Which drive was reported bad?

 

I'm not sure what you mean by the DOS part, so I can't answer that.

 

If possible, take some screenshots from ACU.

Or generate an ADU report.

 

Most mistakes can be fixed, as long as you haven't deleted the metadata (erased the configuration) and you still have the drive that you removed.

 

BR

/jag

 

[Note: broken link updated/removed by Mod]
