ProLiant Servers (ML,DL,SL)

ML350 G4 Smart Array 641 imminent HDD failure.

 
SOLVED
jason blake
Advisor


Hi Guys.

One of our ML350 servers here, with a Smart Array 641 RAID controller, is reporting an imminent HDD failure. The red LED is flashing on the HDD too.

I'm not too bothered, as it is a RAID 5 with a hot spare configured, but instead of waiting for the disk to fail completely I would like to replace it now. If I replace the failing HDD with a new one, will the controller automatically start rebuilding the data onto the replacement HDD in the RAID set?

Thanks for any responses.
12 REPLIES
gregersenj
Honored Contributor
Solution

Re: ML350 G4 Smart Array 641 imminent HDD failure.

Yes.

It will start an automatic rebuild as soon as you insert the new disk.
After a few seconds, the online LED (the middle one) will start flashing, which indicates that it is rebuilding.

For safety, you must always ensure that you have a good backup, even though it's a Smart Array.
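
For anyone wondering what the rebuild actually does: in RAID 5, each stripe holds data blocks plus one parity block, and the parity is the XOR of the data blocks. A missing block can therefore be recomputed from the surviving ones. The following is only a minimal sketch of that general principle, not the Smart Array firmware's implementation; the 4-disk layout, the block contents and the function names are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing_block(surviving_blocks):
    """Recompute the one missing block of a stripe from all the others."""
    return xor_blocks(surviving_blocks)


# One stripe on a hypothetical 4-disk RAID 5 set: three data blocks plus parity.
data = [b"\x10\x20\x30", b"\x0a\x0b\x0c", b"\xff\x00\x55"]
parity = xor_blocks(data)
stripe = data + [parity]

# Pretend the disk holding block 1 was pulled and replaced with a blank one.
failed = 1
survivors = [blk for i, blk in enumerate(stripe) if i != failed]
rebuilt = rebuild_missing_block(survivors)

print(rebuilt == stripe[failed])  # True
```

This also shows why a second disk failing during the rebuild is fatal for RAID 5: with two blocks missing from a stripe, the XOR of the rest no longer identifies either one.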

br
/jag
MT19
Valued Contributor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

Another safety measure is to shut the server down first and then replace the failing hard drive. Since the hard drive has not fully failed, you don't want to risk yanking it out while data might still be written to it.
gregersenj
Honored Contributor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

In this case I disagree with Mark.

Since you have a RAID 5, the disk that you remove will be out of sync as soon as the RAID controller detects the disk change, and it will need a rebuild if you put it back.
The Smart Array controller uses metadata to store the RAID configuration and status.
The metadata is stored on all the disks ("same copy on all disks").

If it were a RAID 1, then it would be an extra safety measure.
In that case you would have valid metadata on both disks.

Also, let's make it perfectly clear that the Smart Array controller is designed for hot swap, hot add, online expansion, etc.

But of course, things can fail. So I always recommend ensuring that you have a good backup before messing with disks or disk systems.

cheers
/jag
jason blake
Advisor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

I am planning on doing it this way:
Shut the server down.
Replace the failing disk with a new one.
Reboot the server.

Hopefully the RAID controller will detect the HDD change and allow a rebuild to the new disk.

Is the above OK?

gregersenj
Honored Contributor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

No.

If you're perfectly sure about the RAID level, and the other disks are running OK:

Just yank out the disk with the server running, and put in the new disk.

With a RAID 5, I see 2 risks during the replacement operation.

1. Another disk could fail during the rebuild operation. Then you would have to restore from backup.

2. The new disk could be defective. Worst case, it could short-circuit the SCSI bus. If so, remove it again.

A reason for not powering off the server is that there is a bigger risk that another disk might fail.
We see this: servers have been running for years, then at a power cycle, disks and power supplies fail.

Just a note on your hot spare.
The controller will begin rebuilding onto the hot spare a few seconds after you have removed the defective disk.
When you put in the new disk, it will abort the hot spare rebuild and start a new rebuild onto the new disk.

br
/jag
jason blake
Advisor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

The disk I'm going to use as the replacement has been used before in a test server.

What is the best way of blanking this disk to make it OK to use as the new disk in the server?



gregersenj
Honored Contributor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

Just put it in.
gregersenj
Honored Contributor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

Just some additional info.

The metadata is time-stamped, and the most recent metadata is considered to be the valid one.
So if you power off a server and put in 2 disks from 2 servers, both configured for RAID 1, it will rebuild from the newest to the oldest, meaning the oldest will be overwritten.

If you have a running server and you put in a used disk as a replacement, it will be overwritten.
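
To illustrate the "newest metadata wins" rule, here is a small sketch. It is only a model of the behaviour described above, not the controller's real on-disk format: the `DiskMetadata` class, its `config_timestamp` field and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiskMetadata:
    bay: int
    config_timestamp: int  # hypothetical stamp; a larger value means more recent


def pick_rebuild_direction(disks):
    """Return (source, targets): the disk with the newest metadata is the source."""
    source = max(disks, key=lambda d: d.config_timestamp)
    targets = [d for d in disks if d is not source]
    return source, targets


# Two RAID 1 disks taken from two different servers and inserted while powered off.
mirror = [
    DiskMetadata(bay=0, config_timestamp=1700),
    DiskMetadata(bay=1, config_timestamp=2400),
]
source, targets = pick_rebuild_direction(mirror)

print(f"rebuild from bay {source.bay} onto bay(s) {[t.bay for t in targets]}")
```

In this model the disk in bay 0 carries the older stamp, so it is the one that gets overwritten.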

But if you want to erase a disk, put it in a server and delete the array config on the disk. For safety, do it in an unused server, with only the disk you want to erase.
This only erases the metadata, and any Smart Array will then treat it as an empty disk.
Of course, companies that make a living from recovering lost data from defective disks are able to recover the real data.
So if you really need to delete the data, you must overwrite it using the common tools.
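
As a rough idea of what "overwrite" means here, the sketch below writes zeros over every byte of an existing file. It is demonstrated on a temporary file only. It is an assumption-laden illustration, not a disk-wiping tool: the function name is made up, a raw disk device would need its size determined differently, and a single zero pass may not satisfy your organisation's sanitisation policy.

```python
import os
import tempfile


def overwrite_with_zeros(path, chunk_size=1024 * 1024):
    """Overwrite every byte of an existing regular file with zeros, in place."""
    size = os.path.getsize(path)
    zeros = bytes(chunk_size)
    written = 0
    with open(path, "r+b") as f:
        while written < size:
            n = min(chunk_size, size - written)
            f.write(zeros[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the zeros to the storage device
    return written


# Demonstration on a throwaway file standing in for the old disk contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"old test-server data")
    demo_path = tmp.name

bytes_written = overwrite_with_zeros(demo_path)
with open(demo_path, "rb") as f:
    contents = f.read()
os.remove(demo_path)

print(bytes_written, contents == bytes(bytes_written))
```

Anything like this is destructive, so double-check the target before running it against anything other than a test file.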
juan quesada
Respected Contributor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

Hi,

Recommendation: if you have a RAID 5 with a hot spare and you need to replace the failed HDD, do this with the server off. Why? Because if you remove the failed HDD (the one in predictive failure) hot (with the NOS up and running), the online spare will kick in, and the controller may lock up because the online spare was in rebuilding mode. Remember that the best HP practice is to replace the HDD with the server turned off; it is more reliable.

So I stand with Mark Francisco and his recommendation of shutting down the server.

Regards,
jason blake
Advisor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

Thanks for all your responses guys.

I decided to shut the server down. I put the new disk in, pressed F1 to start the data recovery process, and the data was recovered to the new HDD.

All OK now.

thanks.

juan quesada
Respected Contributor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

Great, Jason.

Please remember to assign points to the ones that helped you out.
gregersenj
Honored Contributor

Re: ML350 G4 Smart Array 641 imminent HDD failure.

I really don't like to argue, but I like to find out if I'm right or wrong.

Read appendix E, in the Smart Array User guide:
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c01127202/c01127202.pdf