Predictive Failure - Actions to take?
12-02-2002 07:21 AM
Since we cannot take this server down, it seems the only thing to do is wait for the drive to fail and then hot-swap it.
Or is there a way to remove that drive from the RAID array before it fails? Thanks!
12-02-2002 11:00 AM
Solution: You are right. Even though the drive has been marked as Pre-Failure, you still have to shut down the server to replace it. There is no way around this until the drive has failed.
To protect your data you might want to look at having an on-line spare for the RAID.
Mark
12-02-2002 12:25 PM
Re: Predictive Failure - Actions to take?
12-03-2002 07:19 AM
Re: Predictive Failure - Actions to take?
However, a couple of things to remember. First, ALWAYS have a backup of your data before you do this. Second, remember that you do not have RAID protection until the rebuild is complete. For this reason, you may want to schedule the rebuild during off hours. (The speed of the rebuild also depends on how much regular disk I/O is occurring, so with little or no disk I/O the rebuild will complete faster.)
This applies to Smart Array Controllers - if you have another type of controller, then it might not work the same way.
Thanks,
Doug
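
To illustrate the warning above about having no RAID protection during a rebuild, here is a minimal Python sketch. It assumes a single-parity (RAID 5 style) layout, which the post does not specify, and the block contents and the `xor_blocks` and `rebuild` helpers are made up for the example. It is a toy model of the idea, not how a Smart Array controller is actually implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]


def rebuild(stripe, missing):
    """Recompute the block at index `missing` from all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)


# One drive pulled: its block is fully recoverable from the other three.
recovered = rebuild(stripe, missing=1)

# A second drive lost before the rebuild finishes: XOR of the two remaining
# blocks only gives the XOR of the two missing blocks, not either one.
two_left = [blk for i, blk in enumerate(stripe) if i not in (1, 2)]
not_recoverable = xor_blocks(two_left)
```

With one block missing, `recovered` matches the original data. With two missing, the surviving blocks are not enough, which is why a current backup matters before pulling a drive.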

12-04-2002 06:44 AM
Re: Predictive Failure - Actions to take?
12-04-2002 06:48 AM
Re: Predictive Failure - Actions to take?
I called Compaq tech support and they said I may be getting a false alert. Usually, the drive would fail within 24 hours of the "Predictive Failure" status message. There is a patch I need to apply to prevent false alarms. However, since the patch also requires a reboot, it will have to wait until the next scheduled maintenance.
12-04-2002 07:02 AM
Re: Predictive Failure - Actions to take?
If Insight Manager shows a drive to be in Pre-failure, the server has to be shut down before the drive can be removed or replaced. The Array Controller still sees the drive as a good working drive and will continue to access it. If you remove the drive while the Array Controller is striping data to it, you might end up with corrupt data. Therefore, if the light on the drive has NOT changed to show it as bad, shut down the server to remove it.
Many people do just remove the drive when Insight Manager reports it as Pre-failure. This could corrupt data or even the RAID set.
Does this explanation help?
Mark
12-04-2002 07:14 AM
Re: Predictive Failure - Actions to take?
12-04-2002 07:41 AM
Re: Predictive Failure - Actions to take?
The on-line spare will wait for the drive to fail before it becomes active.
Mark
01-15-2003 02:13 PM
Re: Predictive Failure - Actions to take?
As a matter of fact, we've done this on quite a few servers to replace 18GB 7.2K drives with 36GB 10K drives. After all drives had been replaced we wound up with lots of extra space on the array. We then either created extra partitions, spanned partitions, or resized partitions with Partition Magic (all require reboots, unfortunately).
01-15-2003 09:11 PM
Re: Predictive Failure - Actions to take?
Provided you have hot swap drives and an array controller, you can most definitely remove the drive which is showing "Predictive Failure" while the server is running. That is what you paid extra for.
Removing the drive will put the array into a failed (degraded) state, which shows up as yellow, and putting a new drive in will trigger a rebuild event. You should see the RAID controller start to rebuild the RAID set.
If you get the wrong drive, you're still OK because predictive failure is exactly that: Insight thinks the drive will fail because of some measurements. But for now it is still working.
If you pull a drive and then put it back (in the same slot), hot, the array will still be OK. There will be no net effect.
It is possible that the predictive failure is caused by an old drive BIOS level. There is a customer advisory out about this.
Things to watch out for:
Make sure the drive was previously erased. Not too much of a problem when doing hot swaps, but likely to cause much angst if you shut down first.
Make sure that the drive BIOS revisions are up to date.
If the server shuts down when the array is failed, be careful when starting the system. The two prompts will be "Fail the array" and "Fail the drive and continue with interim recovery". Be sure to choose "Continue with interim recovery".
How do I know? Experience.
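
The hot-swap sequence described in this post can be sketched as a small state machine. This is only an illustrative Python model: the event names, the `ArrayState` enum, and the `step` and `run` helpers are invented for the example, and the state labels borrow the "interim recovery" and "Degraded (rebuilding)" wording used elsewhere in this thread. It does not talk to a real controller and does not settle the disagreement above about whether pulling a pre-failure drive hot is safe.

```python
from enum import Enum


class ArrayState(Enum):
    OK = "ok"
    PREDICTIVE_FAILURE = "predictive failure"
    DEGRADED = "degraded (interim recovery)"
    REBUILDING = "degraded (rebuilding)"


# Hypothetical event names; only the transitions described in the post.
TRANSITIONS = {
    (ArrayState.OK, "predictive_alert"): ArrayState.PREDICTIVE_FAILURE,
    (ArrayState.PREDICTIVE_FAILURE, "pull_drive"): ArrayState.DEGRADED,
    (ArrayState.PREDICTIVE_FAILURE, "drive_fails"): ArrayState.DEGRADED,
    (ArrayState.DEGRADED, "insert_new_drive"): ArrayState.REBUILDING,
    (ArrayState.REBUILDING, "rebuild_complete"): ArrayState.OK,
}


def step(state, event):
    """Return the next state, or raise if the event is not modelled here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not modelled in state {state.name}") from None


def run(events, state=ArrayState.OK):
    """Apply events in order and return every state visited."""
    history = [state]
    for event in events:
        state = step(state, event)
        history.append(state)
    return history


history = run(["predictive_alert", "pull_drive", "insert_new_drive", "rebuild_complete"])
```

The array only returns to `OK` after the rebuild completes, so the whole window between pulling the drive and the end of the rebuild runs without redundancy.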
10-15-2024 10:42 AM - edited 10-15-2024 10:44 AM
Re: Predictive Failure - Actions to take?
...adding my 2 cents... You are absolutely correct. You can remove a hot-swap hard drive that is showing "predictive failure" in an array (check with iLO) without powering off the server. In my case, a hard drive was "failing" ("predictive failure") in a RAID 5 array, and the physical drive light on the hot-swap drive cage blinked amber to green repeatedly. I pulled out the hard drive with the server power still on and the OS still running. iLO showed the hard drive as failed when I removed it (makes sense). I inserted a new hot-swap hard drive, and the storage status screen in iLO changed to "Degraded (rebuilding)". It took almost 20 hours to rebuild. Yes... we should have configured the RAID 5 array with an online spare. We did the equivalent of a human-intervention online spare. Sigh...