07-11-2012 07:03 AM
P4500 Hard Drive Failure
In the past two to three months I've replaced about 14 "bad" hard drives. Only about 2 of the 14 were reported as faulty in the CMC. The rest came to light after we received "cluster is overloaded" / unprotected LUN email alerts and opened a support ticket with HP. Support then looks at the logs, notices that drive X is failing (not badly enough to register in the CMC yet, I guess), and sends us a replacement. Is anyone else seeing multiple failures like this? Is this normal? Should a drive that isn't bad enough to register in the CMC be bad enough to overload the whole SAN and bring down a manager?
07-11-2012 01:01 PM
Re: P4500 Hard Drive Failure
We have a total of 5 P4500 G2 servers, and I have opened 5 separate cases for what appears to be an issue with hard drive model MB2000FAMYV (the replacements I have been receiving are model MB2000FBZPN, firmware HPD1). What I have found is that these units do not appear to have any issue when they are rebooted. Once they are powered off, however, I have had anywhere from 1 to all 12 drives in a system fail. I have had to replace almost 30 drives in the last 2 months.
The first case was actually opened for a different issue: a system board replacement for a bad integrated NIC. HP sent a tech on site to replace the system board, and when I powered the system back up I had 6 drives offline. We spent several days on the phone and were sent a spare RAID controller, cable, cache module, battery backup, and backplane. In the end they still had to send us 6 replacement drives. All of the drives were running firmware HPD4, and HP recommended that we update to HPD5.
About a month ago I started to do this. I powered off one of the other P4500 G2 servers, went to the server to put in the HP Firmware Update Disc, and when I powered it on I had 7 failed drives (I didn't even get to the point of updating the firmware). I was sent 7 drives and started to question whether powering the systems off was causing the hard drives to fail at a very rapid rate.
To test this, after I repaired that system I rebooted the next one instead of powering it off and updated the hard drive firmware to HPD5. The system rebooted without any issue. I then wanted to see what would happen if I powered the system off. I powered it off, and when I powered it back on all 12 drives had failed. At this point I believe the techs at HP/LeftHand started to take seriously that a problem exists with these drives. I continued the process and have had to reload 3 of our 5 systems.
When I last spoke with an HP/LeftHand tech, they said it may be a BIOS / hard drive firmware issue that has not been fixed yet. Three different sets of drives have been sent back to their lab for evaluation, but I have not received any further information. I have a similar post on this same forum; I believe its title is similar to yours.
I hope the problem gets corrected, and if I find out anything I will let you know. In the meantime I would be very careful about taking down more than one system at a time, because this appears to be a serious problem.
07-11-2012 01:03 PM
Re: P4500 Hard Drive Failure
07-12-2012 01:17 AM
Re: P4500 Hard Drive Failure
We have 21 P4x00 nodes with ~240 drives in total (both 15k rpm and 7.2k rpm) in production. Our failure rate is much lower, about 1 disk per month, so you probably have bad luck with a buggy disk model. In our experience about half of disk failures show 'cluster overloaded' as the first symptom; in the remaining cases either SMART first reports the disk as 'faulty' or the RAID controller removes the disk from the RAID group.
Basically, I think the problem is that a 'near faulty' disk starts to slow down the whole RAID controller and the storage system's I/O (and, as a consequence, the whole storage cluster's I/O), but its error rate is not high enough for SMART or the CMC to mark it as bad. HP support suggested that we install HP SIM for P4x00 node monitoring (in theory SIM should monitor disk errors, so you should notice a failing disk earlier). I haven't done that yet.
Gediminas
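To put the numbers quoted in this thread side by side, here is a rough back-of-the-envelope sketch in Python. It simply extrapolates each reported failure count linearly to a year. The function name is made up for illustration, and the figure of 12 drives per P4500 G2 node is an assumption inferred from the "all 12 drives" remark above, not something stated outright.

```python
def annualized_failure_rate(failed_drives, total_drives, months):
    """Failures per installed drive per year, extrapolated linearly."""
    return failed_drives / total_drives * (12 / months)

# ~240 drives, about 1 failure per month (figures from the last reply)
healthy = annualized_failure_rate(failed_drives=1, total_drives=240, months=1)

# Almost 30 replacements in 2 months across 5 P4500 G2 nodes.
# ASSUMPTION: 12 drives per node, so 60 drives in total.
affected = annualized_failure_rate(failed_drives=30, total_drives=5 * 12, months=2)

print(f"{healthy:.1%}")   # 5.0%
print(f"{affected:.1%}")  # 300.0%
```

A value above 100% just means that, at that pace, each drive slot would be replaced more than once a year on average. Under these assumptions the affected site is failing drives roughly 60 times faster than the 21-node installation, which fits the suspicion of a bad drive model or firmware rather than ordinary wear.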