ProLiant Servers (ML,DL,SL)

How to handle predictive failure

Ed Dolle
Occasional Advisor

I have a ProLiant 8000 with a drive that is reporting a "predictive failure".

The array is RAID 5, and I know that if I replace the drive 'hot' I will lose data.

My question is - can I power off the server and replace the drive without losing any data?
Trusted Contributor

Re: How to handle predictive failure

Hi, Ed.

You won't lose any data if you replace the drive with a completely clear one and if there's no other failing drive in the array.

Replace it online.
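
For anyone wondering why a single member of a RAID 5 set can be swapped without losing data, here is a minimal sketch of the XOR parity idea. It is a toy model only: the block contents, the four-drive layout, and the helper name `xor_blocks` are assumptions made for illustration, and a real array controller also rotates parity across the drives and works on much larger stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive RAID 5 set: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing the drive that held block 1, then rebuild that block
# from the three surviving members of the stripe.
lost_index = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
rebuilt = xor_blocks(survivors)

# A consistent stripe XORs to all zero bytes when parity is included.
consistent = xor_blocks(stripe) == bytes(len(parity))
```

Here `rebuilt` equals the lost block, which is why one missing drive is survivable. With two drives missing from the same stripe there is not enough information left to rebuild either block, hence the condition that no other drive in the array is failing.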
Honored Contributor

Re: How to handle predictive failure

As said, you will not lose data if you replace one of the members of a RAID 5 array, whether the server is online or off. That is the nature of the beast: Hot Plug.
But before you move forward, note that a predictive failure can be triggered by a lot of things.
Verify the consistency of the data before you do anything. This will make sure the RAID data is consistent among the drives and will correct any errors.
Now, depending on the age of the drives, you can try to re-verify the flagged drive by removing it, reinserting it, and allowing the RAID to rebuild itself. But if the drives are over 5 years old, or you notice the drive is really noisy when you remove it, you may want to consider replacing it, and then all of the other members of the RAID over time, with new drives. They are faster, cooler and more reliable than the old ones.
Hot Swap Hard Drives
Sean Marshall_1
Frequent Advisor

Re: How to handle predictive failure

Technically, the drives are hot-plug, NOT hot-swap. If you pull a drive that has not completely failed (one in the predictive-failure state), you run the risk of data corruption. If the drive has failed (i.e., the array controller has identified it as failed and has stopped writing data to it), then it is safe to remove while the server is running.

My advice is the following.

Power off the server, then remove the predictive-failure drive. Power up the server with the slot empty. When prompted, identify the slot as failed (the prompt is "Press F1 to continue with array disabled or F2 to fail the drive"; choose F2 at that point).
Once the O/S comes back up, insert the new drive.

This will ensure there is 0 chance of data corruption.
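
As a rough summary of the advice in this post, the rule can be sketched as a small lookup. This is only an illustration: the `DriveState` names and the `removal_advice` helper are hypothetical and do not correspond to any controller utility or API, and the rule reflects this post's recommendation rather than the earlier replies, which suggest replacing the drive online.

```python
from enum import Enum

class DriveState(Enum):
    """Hypothetical drive states, named after the terms used in this thread."""
    OK = "ok"
    PREDICTIVE_FAILURE = "predictive failure"
    FAILED = "failed"

def removal_advice(state):
    """Return the removal procedure suggested in this post for a given state."""
    if state is DriveState.FAILED:
        # The controller has already stopped writing to the drive.
        return "Safe to remove while the server is running."
    if state is DriveState.PREDICTIVE_FAILURE:
        # The drive is still in use, so pulling it hot risks corruption.
        return ("Power off, remove the drive, boot with the slot empty, "
                "press F2 to fail the drive, then insert the new drive "
                "once the O/S is up.")
    return "No replacement needed."

print(removal_advice(DriveState.PREDICTIVE_FAILURE))
```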