DL 380 G6 error message in iLO, what does it mean?

 
zdawg
Occasional Contributor

I have a DL 380 G6 that is giving me a "System Health: Failed" message in iLO.

 

When I look at the drives, the three drives installed in the first drive cage are all fine. The second drive cage has no drives installed, yet every empty bay in it shows the following error message: "Drive Status: Fault/Not Installed"

 

Is it telling me that the backplane on this side is bad? There are no drives installed in those bays, so should I be worried, or even care?

2 REPLIES
Renjiv
Respected Contributor

Re: DL 380 G6 error message in iLO, what does it mean?

Greetings!

 

Please let us know if you get any specific error code during POST.

 

Possible causes of the "System Health: Failed" message are:

• Improperly seated or faulty power supply
• Loose or faulty power cord
• Power source problem
• Improperly seated component or interlock problem 

 

You may also try updating the iLO firmware. If the issue persists:

 

1. Reset iLO by using maintenance switch #1 (you might have to reconfigure iLO afterwards).
2. Clear the NVRAM.
3. Update to the latest firmware version.

 

After a reboot, the issue should be resolved. Please let us know the results.

 

I hope you find this information helpful.

 

Regards,

Renji V

 

 



zdawg
Occasional Contributor

Re: DL 380 G6 error message in iLO, what does it mean?

Thanks, let me give you a bit more of the story as it may help.

 

Due to some weird behavior over the last weekend, most notably one of our critical business processes stopping suddenly for no apparent reason, I decided to check the health of the machine on Monday morning.

 

I ran a quick smartctl command just to check the current health and stats, and it reported a very high Ultra DMA CRC error count.
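
For anyone wanting to script the same check, here is a minimal sketch. It assumes the usual `smartctl -A` attribute table layout (attribute ID in the first column, raw value in the tenth). The sample text is illustrative only and is not output from this server, and `crc_error_count` is a hypothetical helper name. Drives behind a Smart Array controller typically need an extra `-d` device-type option on the smartctl command line, which is not shown here.

```python
def crc_error_count(smartctl_output: str):
    """Return the raw value of SMART attribute 199 (UDMA_CRC_Error_Count), or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


# Illustrative sample only (assumed layout, made-up numbers).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1500
"""

print(crc_error_count(SAMPLE))
```

On a live system the text would come from running `smartctl -A` against the disk in question. A raw value that keeps climbing between runs usually points at cabling or the connection rather than the disk media.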

 

While I was investigating, the machine suddenly hung, and eventually I had no choice but to cold boot it. I was able to bring it right back up and restart our business-critical applications, so I left it alone for the rest of the work day.

 

Later in the evening (during off hours) I was able to shut it down so I could reseat the SATA cables (based on recommendations for these types of CRC errors).

 

The next time I logged in via iLO I noticed the "System Health: Failed" error in the iLO interface. I did upgrade the iLO firmware as you mentioned, but the error persists.

 

Furthermore, I decided to dig into the IML logs and found this problem being reported:

 

Severity: Caution
Class: POST Message
Last Update: 12/08/2014 18:20
Initial Update: 12/08/2014 18:20
Count: 1
Description: POST Error: 1719 - A controller failure event occurred prior to this power-up
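
If the IML is exported as text, entries like the one above can be scanned for POST error codes so that recurring controller events stand out. This is only a sketch: it assumes the description always follows the "POST Error: <code> - <message>" pattern seen in this entry, and `parse_post_error` is a hypothetical helper name.

```python
import re

# Pattern assumed from the single IML entry quoted above.
POST_ERROR = re.compile(r"POST Error:\s*(\d+)\s*-\s*(.+)")


def parse_post_error(description: str):
    """Return (code, message) for a POST error description, or None if it does not match."""
    match = POST_ERROR.search(description)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


entry = "POST Error: 1719 - A controller failure event occurred prior to this power-up"
print(parse_post_error(entry))
```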

 

 

That was the day the machine hung, so I'm wondering whether the controller is in fact bad. Since it's embedded, I would need to replace the motherboard ASAP if that's the case. Is there anything else I can do?

 

Keep in mind this is a highly critical production box, so any testing that requires powering off/rebooting would probably have to wait until the weekend, but of course I need to get this fixed ASAP so any advice/recommendations would be great.

 

And if it's helpful, this machine is running CentOS 6.3 64-bit.