ProLiant Servers (ML,DL,SL)

Predictive Failure - Actions to take?

 
Miriam Haber
Occasional Advisor

Predictive Failure - Actions to take?

We have a database serving our website that has a RAID 5 array. One of the drives is showing "Predictive Failure" status in Compaq Insight Manager.

Since we cannot take this server down, it seems the only thing to do is wait for the drive to fail and then hot-swap it.

Or is there a way to remove that drive from the RAID array before it fails? Thanks!
Mark Cloutier
Respected Contributor

Re: Predictive Failure - Actions to take?

Dear Miriam,
You are right. Even though the drive has been marked as Pre-Failure, you still have to shut down the server to replace it. There is no way around this until the drive has failed.
To protect your data, you might want to consider adding an on-line spare to the RAID array.

Mark
We are here for a good time, not a long time!
Jim Colley_1
Occasional Contributor

Re: Predictive Failure - Actions to take?

If you have a warranty on the system, you should call for a replacement drive now. The predictive failure message is all you need to order one, and that way the new drive is already in your hand when the old one fails.
Doug de Werd
HPE Pro

Re: Predictive Failure - Actions to take?

If the drive is connected to a Smart Array Controller, you should not have to shut down the system to make the change (such are the virtues of hot-plug drives!). Once you have the spare drive, simply pull the failing drive out (which in effect will "fail" it) and plug in the replacement. The RAID set will automatically rebuild in the background.

However, there are a couple of things to remember. First, ALWAYS have a backup of your data before you do this. Second, during the rebuild you do not have RAID protection until the rebuild is complete. For this reason, you may want to schedule the rebuild during off hours (also, the speed of the rebuild depends on how much regular disk I/O is occurring, so if there is little or no disk I/O, the rebuild will complete faster).
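The loss of protection during a rebuild follows from how RAID 5 parity works. Here is a minimal Python sketch, assuming the textbook model in which each stripe holds several data blocks plus one XOR parity block; real Smart Array controllers rotate parity across drives and manage stripe sizes themselves, which this ignores. The function names (`xor_blocks`, `make_stripe`, `rebuild`) are hypothetical. Any single missing block can be recomputed from the survivors, but a second loss in the same stripe leaves nothing to recompute it from.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Hypothetical RAID 5 stripe: the data blocks followed by their XOR parity."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Recompute the member marked None from the surviving members.

    Parity supplies one equation per stripe, so only one unknown can be solved.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("more than one member lost: stripe cannot be rebuilt")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    stripe = make_stripe([b"\x01\x02", b"\x10\x20", b"\x0f\x0f"])

    # Pulling one drive (here member 1): the rebuild recovers it exactly.
    degraded = list(stripe)
    degraded[1] = None
    assert rebuild(degraded) == stripe

    # A second drive lost before the rebuild finishes: the data is gone.
    degraded[0] = None
    try:
        rebuild(degraded)
    except ValueError as err:
        print("unrecoverable:", err)
```

The same arithmetic is why pulling a healthy-but-suspect drive is safe only while every other member of the set is good.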

This applies to Smart Array Controllers - if you have another type of controller, then it might not work the same way.

Thanks,
Doug
I am an HPE employee
Michael Grech
Advisor

Re: Predictive Failure - Actions to take?

Could Mark describe the circumstances that make his answer correct? Like Doug mentions, I have changed drives on Smart Array Controllers using RAID 5 that were still working. I have not had to shut down the system, wait for the drive to fail, or even reboot the server when swapping the drive. But if there are times I need to shut it down or wait for the failure I want to know when.
Miriam Haber
Occasional Advisor

Re: Predictive Failure - Actions to take?

Thank you for the feedback. Since this is a production server for our company's website, I have taken the cautious route and let it continue running with the "failing" drive in place. Oddly, though the drive is in "predictive failure" mode, it has not yet failed.

I called Compaq tech support and they said I may be getting a false alert. Usually, the drive would fail within 24 hours of the "Predictive Failure" status message. There is a patch I need to add to prevent false alarms. However, since the patch also requires a reboot, it will have to wait until the next scheduled maintenance.
Mark Cloutier
Respected Contributor

Re: Predictive Failure - Actions to take?

For clarification:
If Insight Manager shows a drive to be in Pre-failure, the server has to be shut down before the drive can be removed/replaced. The Array Controller still sees the drive as a good working drive and will continue to access it. If you remove the drive while the Array Controller is striping data to it, you might encounter corrupt data. Therefore, if the light on the drive has NOT changed to show it as bad, shut down the server before removing it.
Many, many people just remove it when Insight Manager reports it as Pre-failure. This could corrupt data or even the RAID set.
Does this explanation help?

Mark
Michael Grech
Advisor

Re: Predictive Failure - Actions to take?

Mark's clarification helps. Thanks. If the RAID is set up with an on-line spare, will the spare become active when one drive shows "Predictive Failure", or will it wait until the drive fails?
Mark Cloutier
Respected Contributor

Re: Predictive Failure - Actions to take?

Unfortunately no.
The on-line spare will wait for the drive to fail before it becomes active.

Mark
Thomas Hoberg
Advisor

Re: Predictive Failure - Actions to take?

No reason why you shouldn't be able to unplug the pre-failure drive and replace it.

As a matter of fact, we've done this on quite a few servers to replace 18GB 7.2K drives with 36GB 10K drives. After all the drives had been replaced, we wound up with lots of extra space on the array. We then either created extra partitions, spanned partitions, or resized partitions with Partition Magic (all of which require reboots, unfortunately).
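The extra space only shows up once every drive has been swapped because RAID 5 usable capacity is commonly computed as (number of drives − 1) × the size of the smallest member, with one drive's worth of space holding parity. A minimal sketch of that arithmetic, assuming a hypothetical four-drive array (the post does not say how many drives each server had) and ignoring controller metadata and formatting overhead; `raid5_usable_gb` is a hypothetical helper:

```python
def raid5_usable_gb(drive_sizes_gb):
    """Usable RAID 5 capacity: one drive's worth of space holds parity,
    and every member is treated as the size of the smallest drive."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


# Hypothetical four-drive array, before, during, and after the 18GB -> 36GB swap.
before = raid5_usable_gb([18] * 4)
mid_swap = raid5_usable_gb([36, 36, 36, 18])
after = raid5_usable_gb([36] * 4)
print(before, mid_swap, after)  # 54 54 108
```

While even one 18GB drive remains, the set is still limited to 18GB per member; only after the last swap does the usable size double.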