
Dl380G7 won't power back on after power fail

 
dbdata
Occasional Advisor

I've seen this problem before but no certain resolution.

I have a rack of 5 DL380G7 servers (dual X5680, dual 1k PSU). All 5 servers have one PSU plugged into wall power and one PSU plugged into a 5 kVA UPS. The reasoning is two-fold: on UPS failure, the wall power will endure; on wall failure, the UPS will supply power to the servers after the network and clients fail ... long enough to let the drives write their caches and the O/S (CentOS) sync the drives.

Last week we had a systemic power failure in the night that lasted two hours, so all 5 servers shut down. The power came on at 4am and 4 of the 5 rebooted. The 5th had an AMBER power light with the entire health panel and all other indicators dark. Pushing the power button did nothing, pushing and holding it for 30 seconds did nothing, nor did pushing the button via ILO3. ILO3 showed nothing peculiar except that, at the time of the power failure, the power supplies were not redundant.

1) Pulled the power cords, waited 3 minutes, and re-plugged: no change.

2) Swapped each PSU, one at a time, with one from another running server: no change.

3) So we slid the server out on the rails and opened the top ... and while we were removing the metal cover over the riser in order to access SW6 ... the server came to life.

 

Yesterday we had another failure (the power company scheduled a down time and forgot to tell us) and the exact same thing happened.    This time, being a weekend, I had the office staff do steps 1 & 2 and while I was explaining step 3 (they hadn't even slid the unit out) the server powered up.
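
For anyone who wants to spot a server stuck in this state without walking to the rack, here is a minimal sketch in Python. It assumes IPMI over LAN is enabled on each ILO3 and that `ipmitool` is installed on the machine doing the polling; neither is confirmed by anything above. The function names, and any hostnames and credentials passed in, are hypothetical.

```python
import subprocess


def parse_power_state(output):
    """Map ipmitool's 'Chassis Power is on/off' line to 'on', 'off', or 'unknown'."""
    text = output.strip().lower()
    if text.endswith("is on"):
        return "on"
    if text.endswith("is off"):
        return "off"
    return "unknown"


def query_power_state(host, user, password, timeout=15):
    """Ask one management controller for its chassis power status.

    Assumption: IPMI over LAN is enabled on the controller at `host`.
    Any failure (no ipmitool, timeout, bad login) is reported as 'unknown'.
    """
    cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
           "chassis", "power", "status"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    if result.returncode != 0:
        return "unknown"
    return parse_power_state(result.stdout)


def servers_not_on(states):
    """Return the hosts (sorted) whose reported state is anything other than 'on'."""
    return sorted(host for host, state in states.items() if state != "on")
```

Calling `query_power_state` for each of the five ILO addresses a few minutes after power returns, then passing the resulting dictionary to `servers_not_on`, would list the machines that did not come back. It only reports the symptom; it does nothing to explain why the fifth server stays dark.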

Yes, all the BIOS are up to date (and identical).

I've always noticed that DL power supplies are on and quite warm even when the servers are powered down, so I thought perhaps the PSUs overheat when in that state ... but Step 2 disproved that notion ... and that's on top of the fact that, when building power was out, all 10 PSUs went stone cold, so all 10 experienced the same conditions. Combined with the other similar complaints, this simply REEKS of a 'known issue' with the products, doesn't it?

I've got no problem calling HP Field Support in to fix this, but all my experience tells me that I can't be the first person with this condition .. and I'm not a fan of paying anyone to try things until they finally get it right.

Anyone have any musings or suggestions that don't focus on my bad grammar or shallow reasoning skills?

1 REPLY
dbdata
Occasional Advisor

Re: Dl380G7 won't power back on after power fail

More information

I had considered a known HP issue where the ILO RAM is corrupted when it tries to write to the NVRAM to log the power fail just as the power fades ... that supposedly is corrected by removing the power supplies and the CMOS battery to completely clear the world ... then reinstalling and powering up with Switch 6 on .... etc. Then making sure that the ILO is beyond whatever version ..... blah blah blah

 

Needless to say, we did all that, several times. Finally we moved the server from the computer room to the lab, where it auto powered up after 30 seconds. So, wow. Back to the server room ... where it refused to power up. Back to the lab, took it apart, reset everything (again), and still no power-up for over an hour.

We took the server apart and tried to put it back together part by part (first w/no memory, then one pair, etc.), then the 410i memory, then the second power supply, then adding the drives, etc., with no consistent changes. Sometimes it will power up, but usually not.

There is a series of LEDs on the motherboard near the onboard P410's memory module, and the only thing consistent is that LEDs 0, 1, 2 & 3 are stuck ON the entire time. Thing is, I can't find ANY documentation on these LEDs.

 

Anyone?