ProLiant Servers (ML,DL,SL)

Hotswap on predictive failure = ok?

Alan Sosebee
Occasional Visitor

We have had some debate around the office about whether it is safe to replace a hard drive that reports a predictive failure while the server is still online. A co-worker told me that an HP tech support specialist instructed him not to. HP said that the drive was still being written to and that pulling it could result in data loss. Most of our servers are set up as RAID 1+0. I can see the point, but if the drives are mirrored, wouldn't the same data be written to the other drive, so that it would not matter if one of the drives was replaced?

Thanks in advance.
2 REPLIES
Roy Main
Valued Contributor

Re: Hotswap on predictive failure = ok?

Yes, it's ok. You need to make sure there are no other failed drives in the array.

You want to replace any failed drives first. Then replace prefailed (predictive failure) drives.

You never want to cause or force a situation where two drives in the array have failed at once.

Make sure the array has completed any rebuilding of other failed drives also. Then, go ahead and replace.
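
For anyone who wants to script this check, the ordering above could be sketched roughly as follows. This is only an illustration: the state names, the `next_action` function, and the bay-to-state dictionary are all hypothetical and do not correspond to any HP tool or agent output. You would have to fill in the states yourself from whatever your array utility reports.

```python
# Hypothetical drive states for illustration only; not names used by any HP utility.
OK = "ok"
PREFAIL = "predictive_failure"
FAILED = "failed"
REBUILDING = "rebuilding"


def next_action(drives):
    """Decide what to do next for one array.

    `drives` maps a bay number to one of the hypothetical states above.
    Returns a tuple (action, bay), where action is "replace", "wait" or "none".
    """
    bays = sorted(drives)

    # 1. Failed drives are replaced first.
    for bay in bays:
        if drives[bay] == FAILED:
            return ("replace", bay)

    # 2. Never pull a prefailed drive while a rebuild is still running.
    if any(drives[bay] == REBUILDING for bay in bays):
        return ("wait", None)

    # 3. Only now replace a predictive-failure drive, one at a time.
    for bay in bays:
        if drives[bay] == PREFAIL:
            return ("replace", bay)

    return ("none", None)
```

For example, `next_action({1: "failed", 2: "predictive_failure"})` picks bay 1, so the prefailed drive in bay 2 stays in place until the failed drive has been replaced and its rebuild has finished.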
Oleg Koroz
Honored Contributor

Re: Hotswap on predictive failure = ok?

Pulling a drive with a predictive failure while online, and the consequences

It's ok if you are positive that there are no other problems in the array
It's ok if you do it quickly enough; add the replacement only if recovery mode is enabled
It's ok if the server is not under intense write load
It's safer for servers that use BBWC (battery-backed write cache)
It's ok if you have the most recent SCSI firmware and drivers
It's ok if you have a good, valid backup

This is what I have seen in practice, but it is not definitive, so treat these only as thoughts.
Problems that might happen if:
- The server is suffering degraded performance and is under intense write load; once you pull the drive you trigger services to swap to another process, and a small portion of data can be misplaced, leading to corruption.
- The nature of the failure allows it to pass on to the second HDD and cause corruption.
- The replacement is done so quickly that the agents have not detected recovery mode; the replacement drive is detected as new, and the one you pulled is later reported as failed.
- The replacement drive you used is not clean and causes RIS corruption or a rebuild problem followed by corruption.
- Other

If you have any doubt that it will work for you, bring the system down, pull the drive out, power up the server, accept Recovery mode, and plug in the replacement drive afterwards.
All the incidents I have seen are extremely rare and unique, like one small fish in a big ocean.

See attachment for LED status and meaning if that helps.