Problems with Proliant DL560 Gen 8

 
secured2k
New Member

Re: Problems with Proliant DL560 Gen 8

DL380P G8

 

Same problem, with the IML reporting "System Power Fault Detected. System Has Enabled Power Protection and Disabled Power Supplies."

 

I have reset the NVRAM (Switch 6), pulled the battery, swapped the Power Supply Backplane, and disconnected almost all other devices (all add-in cards). The problem still occurs.

 

I attempted to use a new PSU in slot 2 and the system wouldn't power back on, but when I tried it in slot 1 (with nothing in slot 2), the system powered up. I ended up trying both PSUs in slot 1 with nothing in slot 2 and the system seemed OK.

 

BIOS/iLO/CPLD were all updated to the latest versions, and a quick Insight Diagnostics test run was made. All tests passed.

 

However, the problem came back about 8 hours later.

We are now trying to replace the system board.

 

If this doesn't work, I think it might be a power supply issue as well and will probably replace both with new ones at the same time.

kktse0002
Occasional Visitor

Re: Problems with Proliant DL560 Gen 8

I have a new DL380p Gen8 server with the same problem. The server shuts down almost every hour with an IML log entry "System Power Fault Detected. System Has Enabled Power Protection and Disabled Power Supplies".

 

I have replaced the power supply, the online UPS, and the CPU.

 

Does anyone have a solution?

Nicolai Rasmussen
Regular Advisor

Re: Problems with Proliant DL560 Gen 8

I know this is an old thread, but we're still seeing issues that are quite similar to the ones described so far.

 

We've had similar issues on BL460c Gen8 and now also on DL380p Gen8.

The BL460c Gen8 actually said itself in the IML:

 

System Power Fault Detected. Replace System Board (XR: 10 20 MID: FF CD FC D7 03 13 13 AA 00 02 00 EE 02 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)

 

So the board was replaced. This was 1 month ago and so far so good.

I've also seen a BL460c Gen8 server that just powered down without ANY information in the IML whatsoever.

Via iLO I could see the lovely error "BIOS/Hardware Health - Failed", but no indication as to what was wrong. The blade refused to power up, so I tripped the E-fuse and the blade came back online, now without any indication of an error.

This was a few months ago, and I've not seen this problem since.

 

Until last week, when one of our most critical DL380p Gen8 servers powered down with the same error: "BIOS/Hardware Health - Failed"

Power cycle cleared the error, server came back online, no errors. AHS logs sent to HP - they claim they can't see ANYTHING indicating an error...

 

REALLY REALLY frustrating, HP. How can we trust the Gen8 platform when weird sh*t like this keeps happening?

 

And every time you contact support, they treat you like you just graduated from high school, and act like they've never heard of this error before...

 

stefanost67
New Member

Re: Problems with Proliant DL560 Gen 8

Hi all,

we have the same problem with 2 of 2 Proliant DL560 Gen 8 servers we just bought.

Let us see what support will tell us next week!

 

 

 

jmhanna
Visitor

Re: Problems with Proliant DL560 Gen 8

Recently HP made a change to the documentation for the latest iLO firmware update (1.51), and it sounds like there is possibly a fix in there (NOT just pertaining to a hypervisor OS, as originally reported). It now states:

 

"1.51 Fixes:
- Intermittent Non-Maskable Interrupt (NMI) Events May Occur on HP ProLiant Gen8 Servers running HP Integrated Lights-Out 4 Firmware Versions 1.30, 1.32, 1.40 and 1.50."

 

Intermittent NMI events? Possibly. The management log did report a power problem for every event; however, the iLO log simply reported "server reset".

 

I am hopeful this new firmware is the fix. My only other 560 G8 server is running iLO 1.20 and never did the automatic reboot thing, so I will leave it alone since it's not an affected version. This problematic box was first running 1.32, then I updated it to 1.40 when instructed by support back in January.
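
For anyone wanting to check a list of boxes against the versions named in that release note, here is a minimal sketch. The host labels are made up for illustration; the version list is only what the quoted note names, and the check says nothing about whether a given box will actually show the fault.

```python
# Versions named in the quoted iLO 4 1.51 release note.
AFFECTED_ILO4_VERSIONS = {"1.30", "1.32", "1.40", "1.50"}


def is_affected(ilo_version: str) -> bool:
    """Return True if the version string is one of the versions listed in the note."""
    return ilo_version.strip() in AFFECTED_ILO4_VERSIONS


# Hypothetical inventory: the host labels are invented, the versions are the
# ones mentioned in this post (1.20 on the healthy box, 1.40 on the problem box).
inventory = {"560-g8-healthy": "1.20", "560-g8-problem": "1.40"}

for host, version in inventory.items():
    status = "listed as affected" if is_affected(version) else "not listed"
    print(f"{host}: iLO {version} is {status}")
```

An exact string match is used on purpose: the note lists specific versions rather than a range, so 1.20 and 1.51 both come back as not listed.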

 

Fingers crossed (still)

 

miaso120
Advisor

Re: Problems with Proliant DL560 Gen 8

We are facing the exact same problem with BL460 G8 blades. The servers are about two years old, and until this week we had not had a single issue with them. Then yesterday I discovered that one of the BL460s was powered off with a red health light glowing, and it refused to turn on. There were no errors in the IML and no alert mail was sent. After we tried to reset it by taking it out of the enclosure and putting it back, nothing changed, but at least a new entry was added to the IML: System Power Fault Detected. System Has Enabled Power Protection and Disabled Power Supplies. This server had BIOS 09/18/2013, Power Management Controller 3.2, and iLO 1.50.

 


And one day later our second BL460 went off with the exact same symptoms: powered off and red health LED. At least this time the error was in the IML right away and the alert mail was sent. This server had BIOS 12/20/2013, Power Management Controller 3.3, and iLO 1.51.

 

Our enclosure is hooked up via an HP R5000 UPS, and nothing suspicious was logged during the last couple of days.

 

HP has already sent us a replacement motherboard for the first server and is now deciding what to do with the second one. This whole situation makes me really nervous.

 

[Attached image: failed blades.png]

 

Nicolai Rasmussen
Regular Advisor

Re: Problems with Proliant DL560 Gen 8

Just keeping this thread alive - we had another BL460c Gen8 fail with this error today. The funny thing is, the IML message is different, depending on where you view it from. View it from the OS and this is what we see:

 

System Power Fault Detected. Replace System Board (XR: 10 20 MID: FF CD FC D7 03 13 13 AA 00 02 00 EE 02 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)

 

View it from the ILO, and this is what we see:

 

System Power Fault Detected. System Has Enabled Power Protection and Disabled Power Supplies (XR: 10 20 MID: FF CD FC D7 03 13 13 AA 00 02 00 EE 02 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
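
To show that only the human-readable text differs between the two views, here is a rough sketch that splits each entry into its message and its XR/MID bytes and compares them. The regular expression is just my guess at the layout based on the two lines above, not a documented format.

```python
import re

# Assumed layout: "<message> (XR: <hex bytes> MID: <hex bytes>)"
IML_PATTERN = re.compile(
    r"^(?P<text>.*?)\s*\(XR:\s*(?P<xr>[0-9A-F ]+?)\s+MID:\s*(?P<mid>[0-9A-F ]+?)\)\s*$"
)


def split_iml(entry: str):
    """Split an IML entry into (message, XR bytes, MID bytes)."""
    m = IML_PATTERN.match(entry.strip())
    if m is None:
        raise ValueError("entry does not end in an (XR: ... MID: ...) block")
    return m["text"], m["xr"].split(), m["mid"].split()


os_view = (
    "System Power Fault Detected. Replace System Board "
    "(XR: 10 20 MID: FF CD FC D7 03 13 13 AA 00 02 00 EE 02 20 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)"
)
ilo_view = (
    "System Power Fault Detected. System Has Enabled Power Protection "
    "and Disabled Power Supplies "
    "(XR: 10 20 MID: FF CD FC D7 03 13 13 AA 00 02 00 EE 02 20 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)"
)

os_text, os_xr, os_mid = split_iml(os_view)
ilo_text, ilo_xr, ilo_mid = split_iml(ilo_view)

print("same XR bytes: ", os_xr == ilo_xr)
print("same MID bytes:", os_mid == ilo_mid)
print("same message:  ", os_text == ilo_text)
```

If the byte blocks match and only the message differs, the two views are presumably decoding the same underlying event into different wording, though that is an inference and not something HP has confirmed here.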

 

So...which is it, HP?

 

My never-ending concern with HP and their willingness to "just replace the motherboard" is that we NEVER learn if this is a generic error/issue. We've now seen the issue on 4 blades and 2 rack servers. It's fairly annoying when a server just decides to power off, so if this error is present in, let's just say, 20% of our servers, I would F***ING like to know about it, HP! All I can do now is report yet another one to HP, get a new motherboard, and then sit and wait until the NEXT one dies...

 

That's just not good enough. Own up to whatever is causing this, and allow me to identify how many of my servers are affected by this issue, so we can get ALL the boards replaced (if that is what it takes). Comments from HP please???

 

BIOS: 2014.02.10

iLO: 1.51

Power Management Controller: 3.3

 

Sorry for the foul language, but man this is frustrating!

 

miaso120
Advisor

Re: Problems with Proliant DL560 Gen 8

Quick update on my issue earlier in this thread: we ended up replacing the motherboard in both blades and also 2 CPUs (one in each blade). For our company that's a 33% failure rate so far, with 2 of 6 Gen8 servers failed.

mouthpiec
New Member

Re: Problems with Proliant DL560 Gen 8

Had the same issue ... problem solved by replacing the mainboard, only to discover that 2 RAM modules were also faulty.

briancola
Visitor

Re: Problems with Proliant DL560 Gen 8

I just had this happen on an HP DL380p G8 at 11:30 pm on FRIDAY night.

This is usually when the server is sitting idle with not much happening.

 

This is my main server, and I rely heavily on it, with multiple VMs running on it.

 

My error is the following:

 

12," Critical","Power","05/13/2014 19:40","05/13/2014 19:40","2","System Power Fault Detected. System Has Enabled Power Protection and Disabled Power Supplies (XR: 04 00 MID: FF 0D FC CE C0 FF FF 32 32 0C 0C 00 06 00 00 01 03 47 00 00 00 00 00 00 00 00 00 00 00 00 00 00)",
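
In case it helps anyone who exports these entries and wants to sort or filter them, here is a small sketch that reads a line like the one above with Python's csv module. The column names are my assumption from the order of the values (the export itself does not show headers here), and since both timestamps are identical I cannot tell from this line which one is which.

```python
import csv
import io

# Assumed column names, inferred from the order of values in the posted line.
FIELDS = ["id", "severity", "class", "last_update",
          "initial_update", "count", "description"]


def parse_iml_csv_line(line: str) -> dict:
    """Parse one exported IML line into a dict keyed by the assumed column names."""
    row = next(csv.reader(io.StringIO(line)))
    # The posted line ends with a trailing comma, which yields an extra empty
    # field; keep only the first len(FIELDS) values and trim stray spaces.
    values = [value.strip() for value in row[:len(FIELDS)]]
    return dict(zip(FIELDS, values))


line = (
    '12," Critical","Power","05/13/2014 19:40","05/13/2014 19:40","2",'
    '"System Power Fault Detected. System Has Enabled Power Protection and '
    'Disabled Power Supplies (XR: 04 00 MID: FF 0D FC CE C0 FF FF 32 32 0C 0C '
    '00 06 00 00 01 03 47 00 00 00 00 00 00 00 00 00 00 00 00 00 00)",'
)

entry = parse_iml_csv_line(line)
print(entry["severity"], entry["class"], entry["count"])
print(entry["description"])
```

The csv module handles the quoted description field, so commas inside a message would not break the split the way a plain `line.split(",")` would.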

 

Currently running:

System Rom: P70 12/20/2013

iLO Firmware Vers. 1.40 Jan 14 2014

I checked the health status of my hardware in iLO after a remote reboot to get the server back online, and everything seems green again.

 

I plan on doing the SPP update on the server on Monday since it's now Saturday early morning. 

http://h17007.www1.hp.com/us/en/enterprise/servers/products/service_pack/spp/index.aspx#tab=TAB3

 

 

I'm doing vmk backups at the moment and hoping it will stay online for the time being.

 

Wish me luck. If anyone has anything else to chime in with, I'm all ears!