StoreVirtual Storage

 
manadrain
Occasional Advisor

P4500 G2 Harddrive Failure after poweroff

 

We have a total of 5 P4500 G2 servers, and I have been working with HP on an issue we have been having with them. I would like to see if anyone else has experienced similar problems. Recently we had to have a system board replaced in one of our P4500 G2 servers to address a NIC issue. HP sent a tech on site to replace the system board. We powered down the system in the CMC and the tech replaced the system board. When we put the system back in the rack and powered it on, 7 of the hard drives appeared to have failed or gone offline. After several days of troubleshooting and swapping the RAID controller and backplane, we ended up replacing all of the hard drives and rebuilding the system. It was recommended that we update the firmware on the hard drives in the system. The old hard drives were model MB2000FAMYV, and we had them running firmware version HPD4. We were asked by HP/Lefthand tech support to update the firmware to version HPD5. The replacement hard drives we received are model MB2000FBPN with firmware version HPD1.

 

We figured that we would update the firmware on the hard drives of our other 4 P4500 G2 servers when we had some downtime. Last week I started this process and powered off another one of the P4500 G2 servers (using the CMC). I then went to where the server was located, put the firmware DVD in, and powered it back on to start the firmware update process. During POST we had 6 drives offline. I got in contact with HP/Lefthand about this and explained that we had a similar issue with another one of our P4500 G2 servers, where a large number of hard drives failed after it was powered off. I got 6 spare hard drives and was able to get this P4500 back online and the data replicated.

 

Now I started to think that the P4500 G2 servers have some problem when they are powered off. We have had these for about a year and have never taken them offline. They may have rebooted, but they were never powered off. We then tested a third P4500 G2, and this time, instead of powering it off in the CMC, I rebooted it while I was near the storage unit so I could put the firmware DVD in. The firmware updated, and at this point I had no failed drives. The server rebooted and I was able to discover it in the CMC. I waited for the volumes to sync the data, and then we wanted to test powering the unit off to see if any drives would fail. HP/Lefthand tech support had recommended updating the firmware on the hard drives to HPD5 because it addressed a timing issue that may be causing the failure. I powered off this system, waited a few minutes, and powered it back on. This time I had all 12 drives offline. I had to contact HP/Lefthand tech support, and we ran some diagnostics on the RAID controller from the SmartStart CD. They believe the problem may be with the RAID controller; however, I do not believe replacing it will fix the problem. They wanted to send an HP tech on site to replace the RAID controller. I did end up getting the same tech who was here before and witnessed this issue. The tech does not believe the RAID controller is the problem either, and he is getting 12 spare drives as well. He believes this problem needs to be escalated higher to find out what is causing the drives to fail.

 

Has anyone else experienced a similar problem on their P4500 G2 storage servers?

13 REPLIES
manadrain
Occasional Advisor

Re: P4500 G2 Harddrive Failure after poweroff

Some other information I forgot to mention:

 

All of the P4500 G2 NSMs are running SAN/iQ 9.5 with all current patches as of June 2012.

Paul Hutchings
Super Advisor

Re: P4500 G2 Harddrive Failure after poweroff

Yeah we had a similar thing, which was nice.

 

You need to make a note of the HDD models in each P4500, get the latest firmware, and update both the drive firmware and the RAID controller firmware from a SmartStart CD - then it should all be good.
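
If it helps with the note-taking step, here is a rough sketch of how the inventory could be tracked in Python. It is only an illustration: the node names, bay numbers, and tuple format are made up, and the only model-to-firmware pairing in it (MB2000FAMYV to HPD5) is the one mentioned in this thread. It does not talk to the CMC or the controller; you would fill in the list by hand or from whatever export you have.

```python
from collections import defaultdict

# Target firmware per drive model. The MB2000FAMYV -> HPD5 entry is the one
# quoted in this thread; add other models only as HP support advises.
TARGET_FIRMWARE = {"MB2000FAMYV": "HPD5"}


def drives_needing_update(inventory, targets=TARGET_FIRMWARE):
    """Group drives whose firmware differs from the target for their model.

    inventory: iterable of (node, bay, model, firmware) tuples. This layout
    is a hypothetical, hand-maintained format, not a CMC export.
    Returns {node: [(bay, model, current_fw, target_fw), ...]}.
    """
    pending = defaultdict(list)
    for node, bay, model, firmware in inventory:
        target = targets.get(model)
        # Models with no known target are skipped rather than guessed at.
        if target is not None and firmware != target:
            pending[node].append((bay, model, firmware, target))
    return dict(pending)


if __name__ == "__main__":
    # Hypothetical node names and bay numbers, for illustration only.
    inventory = [
        ("nsm-01", 1, "MB2000FAMYV", "HPD4"),
        ("nsm-01", 2, "MB2000FAMYV", "HPD5"),
        ("nsm-02", 1, "MB2000FBPN", "HPD1"),
    ]
    for node, drives in drives_needing_update(inventory).items():
        for bay, model, current, target in drives:
            print(f"{node} bay {bay}: {model} {current} -> {target}")
```

Drive models without an entry in the mapping are left alone on purpose, so nothing gets flagged without a recommendation behind it.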

manadrain
Occasional Advisor

Re: P4500 G2 Harddrive Failure after poweroff

The current P4500 G2 that had all 12 drives fail after it was powered off had the same drive model, running HPD4 firmware. The RAID controller was also running the recommended firmware version. I booted this server from the HP Firmware DVD 10.1 and updated the hard drive firmware to version HPD5. The server updated without issue and came online. After the volumes synced I wanted to see if we would still have the issue, so I powered down the unit. All 12 drives failed, even though they were running the current firmware version.

 

Today the HP tech arrived on site. He is the same tech who saw this happen on one of our other P4500 G2 servers when he replaced its system board. We put 12 new drives in the server and the RAID controller could see them without any issue. We ran system restore and the system is back up; we just need to put it back in the management group and cluster and then sync the data. All of the new drives are the different model HP has been sending us as replacements.

 

I am thinking about powering off the server before putting it back in the management group, just to test whether everything comes back online with the new hard drives after a power-off. The tech is getting this issue escalated to higher-level tech support and needs to get the manufacturer of the hard drives involved. Looking at the serial numbers on them, it appears that HP is using Seagate drives.

 

So I have had around 25 2TB hard drive replacements in 3 weeks, and I have about 35 more of the drive model that keeps failing in the other systems. There are 2 P4500 G2 servers that I have not tried to update the firmware on or power off yet. This appears to me to be a defect in this hard drive model. I have been on many calls with several different HP/Lefthand tech support people over the last few weeks. I felt that they did not believe me when I kept telling them about the problem I noticed after powering off these NSMs. I have 8 of the older 4150 NSMs from before Lefthand was acquired by HP. These run on Dell PE1950 servers with an MD1000 attached. They have been running for around 4 years and have never had any issues like this (even though the hard drives in them are Seagate as well). Today when I spoke with another Lefthand/HP tech on the phone, they asked me what happened with the server. My response was that I powered it off. Again, I don't think they believe a serious problem exists with these hard drives. I am glad the onsite tech from HP confirmed with the HP/Lefthand tech that this problem is real and is nothing we are doing. You power off the server and it just seems to start dropping hard drives.

oikjn
Honored Contributor

Re: P4500 G2 Harddrive Failure after poweroff

Please keep us posted.

 

I hope everything works out ok... keep those fingers crossed that you don't experience a major power outage. The loss of a node or maybe two (depending on your setup) might not cause production disruption, but turning them all off at once sounds like a nightmare.

 

On a side note, this gives me a sense of déjà vu... not HP related and I can't remember what, but there was some firmware on some device that, after xxxx power-on hours, would effectively destroy itself on the next power cycle. If you didn't update the firmware before the xxxx hours were reached, the device had to go back to the manufacturer to have the ROM chip physically replaced.

manadrain
Occasional Advisor

Re: P4500 G2 Harddrive Failure after poweroff

I ran a test on another one of our P4500 G2 servers this morning. Great news...only one failed drive!! Same as before, all of the drives in this unit are model MB2000FAMYV. The firmware that was on these drives was HPD4, and I had upgraded it to HPD5. Waiting on a call back from HP/Lefthand support to get another 2TB drive.

 

We ended up setting up 3-way replication for some of our critical volumes on these units, at least until we feel confident that the hard drives in the servers will survive if the power drops. I powered this box off, powered it on, and then synced the data at least two times to make sure that no additional drives would fail.

oikjn
Honored Contributor

Re: P4500 G2 Harddrive Failure after poweroff

Not that any failures are ever a good thing, but I gotta say that I love the VSA network RAID, which lets such a major failure in an array happen without stopping production (minus the performance loss of course).

manadrain
Occasional Advisor

Re: P4500 G2 Harddrive Failure after poweroff

 

I ran a similar test this morning on our last P4500 G2 and this time got 3 failed drives (drives 3, 6, and 9). However, this time I ran diagnostics and exported all the SMART logs and array controller logs prior to rebooting to upgrade the firmware (everything passed). Then I rebooted the system from the CMC and upgraded the hard drive firmware on all 12 drives from HPD4 to HPD5, and the system came up without any issue. I then ran the diagnostics again and exported the logs (everything passed) so I would have a copy from after the hard drive firmware update and prior to powering the system off. I then powered the system off from the CMC. After I powered it back on, drives 3, 6, and 9 had failed. These are the same model, MB2000FAMYV 2TB drives.

 

I now have a fifth case open with HP on this same issue. I am still waiting for HP to respond on these cases about when the issue was escalated to a higher level and whether they have heard back from the hard drive manufacturer. Once these three are replaced, they will have a total of 30 2TB hard drives with the same issue.

 

On another note I tested our 4 P4300 G2 Servers last week and did not have any issues. These are for the most part all running drive model HP MB1000BAWJP 1TB Drives.

Manfri
Frequent Advisor

Re: P4500 G2 Harddrive Failure after poweroff

One of my customers reported a similar horror story, mainly with 450GB drives. The good thing is that a general firmware update using the HP Service Pack for ProLiant (SPP) DVD took care of it.

 

The problem has been fixed without hardware replacement.

If I remember correctly, they left the controller firmware at the latest Lefthand-approved version.

 

The problem was obviously related to the controller, the drives, or both, because it surfaced before SAN/iQ started.

 

The customer tested the shutdown/startup procedure at least 10 times before gaining faith in the solution 8-)))

 

I will follow up with the controller and hard disk models and firmware releases.

David_Tocker
Regular Advisor

Re: P4500 G2 Harddrive Failure after poweroff

Reading this later - but I feel for you mate. What a cluster-f**k.

Did you ever find a solution/cause?
Regards.

David Tocker