ProLiant Servers (ML,DL,SL)

BIOS/HARDWARE HEALTH CRITICAL

 
RCB_RAVI
Regular Visitor


Hi,

I am seeing "BIOS/Hardware Health: Failed" and the server keeps rebooting.

Model: HP ProLiant DL380 Gen9

What could be the issue? Could any experts here help? I'm not a server guy; I just bought it to set up a home lab.

 

Thanks

Ravi

rabindra11sharm
Esteemed Contributor

Re: BIOS/HARDWARE HEALTH CRITICAL

Dear RCB_RAVI

There are many possible causes of a critical BIOS/Hardware Health status. Please provide the IML (Integrated Management Log) so we can check the actual issue with your server.

The steps to collect the log are given below. Please also go through the Troubleshooting Guide; it may help you understand the actual issue.

1. Log in to the iLO web console and navigate to Information → Integrated Management Log.
2. Click the View CSV button. The IML is displayed in a format that you can copy and paste into a text editor.
3. Copy the text displayed in the CSV Output window and save it in a text editor as a *.csv file.
4. Share it with us so we can check the timestamp of the actual issue.
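If you want to look over the saved file yourself before posting it, a short Python script can total the events by severity and list the Critical and Caution entries. This is a minimal sketch, assuming the export looks like the CSV posted in this thread (a quoted header row, one quoted row per event, each line ending with a trailing comma). The helpers load_iml, load_iml_file and summarize and the file name iml.csv are hypothetical, not part of any HPE tool.

```python
import csv
import io
from collections import Counter

# Sample IML export as copied from the CSV Output window (rows taken from this thread).
IML_CSV = '''"ID","Severity","Class","Last Update","Initial Update","Count","Description",
"2","Informational","Maintenance","[NOT SET] ","[NOT SET] ","1","Maintenance note: iLO performed an auto-RESTORE operation.",
"1","Critical","Environment","01/01/1970 00:08","01/01/1970 00:01","16","Critical Temperature Threshold Exceeded",
'''


def load_iml(text):
    """Parse IML CSV text into a list of dicts, one per log entry."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Every line ends with a comma, which produces an unnamed empty column; drop it.
        row.pop("", None)
        rows.append(row)
    return rows


def load_iml_file(path):
    """Read a saved export, e.g. a hypothetical 'iml.csv'."""
    with open(path, newline="", encoding="utf-8") as f:
        return load_iml(f.read())


def summarize(rows):
    """Return event totals per severity (weighted by Count) and the Critical/Caution rows."""
    totals = Counter()
    for row in rows:
        totals[row["Severity"]] += int(row["Count"])
    alerts = [r for r in rows if r["Severity"] in ("Critical", "Caution")]
    return totals, alerts


if __name__ == "__main__":
    totals, alerts = summarize(load_iml(IML_CSV))
    print(dict(totals))
    for r in alerts:
        print(r["ID"], r["Severity"], r["Count"], r["Description"])
```

Weighting by the Count column matters here: a single "Critical Temperature Threshold Exceeded" row can stand for many repeated events.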

 

[Moderator edit: Updated the link.]


Thanks & Regards...
Rabindra
RCB_RAVI
Regular Visitor

Re: BIOS/HARDWARE HEALTH CRITICAL

@rabindra11sharm Thanks for your reply. Please find the log below:

"ID","Severity","Class","Last Update","Initial Update","Count","Description",
"2","Informational","Maintenance","[NOT SET] ","[NOT SET] ","1","Maintenance note: iLO performed an auto-RESTORE operation.",
"1","Critical","Environment","01/01/1970 00:08","01/01/1970 00:01","16","Critical Temperature Threshold Exceeded",

 

Before rebooting, I also saw the following error:

Server Critical Fault (Service Information: RuntimeFault System Board AUX/Main EFUSE

but it is no longer showing.

RCB_RAVI
Regular Visitor

Re: BIOS/HARDWARE HEALTH CRITICAL

"ID","Severity","Class","Last Update","Initial Update","Count","Description",
"26","Caution","Environment","01/01/2001 03:30","01/01/2001 03:30","1","System Overheating (Temperature Sensor 39, Location System, Temperature 121C)",
"25","Caution","Environment","01/01/2001 03:16","01/01/2001 03:16","1","System Overheating (Temperature Sensor 39, Location System, Temperature 122C)",
"24","Informational","POST Message","01/01/2001 03:29","01/01/2001 03:15","2","Option ROM POST Information: 1778-Slot 0 Drive Array resuming Automatic Data Recovery (Rebuild) process. Action: No action required.",
"23","Critical","Environment","01/01/2001 03:37","01/01/2001 03:01","117","Critical Temperature Threshold Exceeded",
"22","Critical","OS","01/01/2001 03:30","01/01/2001 03:01","3","Automatic Operating System Shutdown Initiated Due to Overheat Condition",
"21","Caution","Environment","01/01/2001 03:01","01/01/2001 03:01","1","System Overheating (Temperature Sensor 39, Location System, Temperature 124C)",
"20","Critical","OS","01/01/2001 02:59","01/01/2001 02:59","1","Automatic Operating System Shutdown Initiated Due to Overheat Condition",
"19","Caution","Environment","01/01/2001 02:58","01/01/2001 02:58","1","System Overheating (Temperature Sensor 39, Location System, Temperature 122C)",
"18","Informational","POST Message","01/01/2001 02:58","01/01/2001 02:58","1","Option ROM POST Information: 1778-Slot 0 Drive Array resuming Automatic Data Recovery (Rebuild) process. Action: No action required.",
"17","Critical","Environment","01/01/2001 02:56","01/01/2001 02:07","207","Critical Temperature Threshold Exceeded",
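To see which sensor is raising the overheating events and how hot it reported, the "System Overheating" rows can be pulled out of the export. This is a minimal sketch, assuming the description always uses the wording seen in this log ("Temperature Sensor N, Location X, Temperature NNC"); other iLO firmware versions may phrase it differently. The helper peak_by_sensor is hypothetical.

```python
import csv
import io
import re

# A subset of the IML rows posted above: the overheating entries plus one row that does not match.
IML_CSV = '''"ID","Severity","Class","Last Update","Initial Update","Count","Description",
"26","Caution","Environment","01/01/2001 03:30","01/01/2001 03:30","1","System Overheating (Temperature Sensor 39, Location System, Temperature 121C)",
"25","Caution","Environment","01/01/2001 03:16","01/01/2001 03:16","1","System Overheating (Temperature Sensor 39, Location System, Temperature 122C)",
"23","Critical","Environment","01/01/2001 03:37","01/01/2001 03:01","117","Critical Temperature Threshold Exceeded",
"21","Caution","Environment","01/01/2001 03:01","01/01/2001 03:01","1","System Overheating (Temperature Sensor 39, Location System, Temperature 124C)",
"19","Caution","Environment","01/01/2001 02:58","01/01/2001 02:58","1","System Overheating (Temperature Sensor 39, Location System, Temperature 122C)",
'''

# Assumed wording, based only on the descriptions in this thread.
OVERHEAT = re.compile(r"Temperature Sensor (\d+), Location (\w+), Temperature (\d+)C")


def peak_by_sensor(text):
    """Map sensor number -> (location, highest reported temperature in C)."""
    peaks = {}
    for row in csv.DictReader(io.StringIO(text)):
        m = OVERHEAT.search(row["Description"])
        if not m:
            continue  # not an overheating entry
        sensor, location, temp = int(m.group(1)), m.group(2), int(m.group(3))
        prev = peaks.get(sensor)
        if prev is None or temp > prev[1]:
            peaks[sensor] = (location, temp)
    return peaks


if __name__ == "__main__":
    for sensor, (location, temp) in sorted(peak_by_sensor(IML_CSV).items()):
        print(f"Sensor {sensor} ({location}): peak {temp}C")
```

For the rows above, only sensor 39 appears, with a peak of 124C.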

rabindra11sharm
Esteemed Contributor

Re: BIOS/HARDWARE HEALTH CRITICAL

Dear RCB_RAVI

Thanks for providing the IML logs. The IML clearly shows a server cooling issue, which is why the server generates overheating events. The system board may also be faulty, given the "RuntimeFault System Board AUX/Main EFUSE" error you mentioned. You can try swapping the PSUs, or testing with a single PSU. Also try a minimum configuration: a single processor and a single DIMM only. If the system powers on and completes POST, let me know; otherwise, you will have to replace the system board.

I hope I have provided clear and helpful instructions. If you have any more questions or need further assistance, don't hesitate to ask. I'm here to help! Have a great day!


Thanks & Regards...
Rabindra
RCB_RAVI
Regular Visitor

Re: BIOS/HARDWARE HEALTH CRITICAL

@rabindra11sharm 

I left the system powered on. After 5 hours of continuous automatic reboots, the server came up and then turned off again during POST at 80%. The log is below. I no longer see the error "RuntimeFault System Board AUX/Main EFUSE"; this is the one I am seeing now:

 

"ID","Severity","Class","Last Update","Initial Update","Count","Description",
"41","Caution","Environment","01/01/2001 15:24","01/01/2001 15:24","1","System Overheating (Temperature Sensor 39, Location System, Temperature 121C)"

 

Where is "Temperature Sensor 39" located?

39-P/S 2 Zone

Thanks

Ravi.

rabindra11sharm
Esteemed Contributor

Re: BIOS/HARDWARE HEALTH CRITICAL

Dear RCB_RAVI

Please remove all components, clean the components and the base of the server properly, reconnect the components, and try to power on the server. Please also use air conditioning (AC) to control the room temperature.


Thanks & Regards...
Rabindra
support_s
System Recommended

Query: BIOS/HARDWARE HEALTH CRITICAL

Hello,

 

Let us know if you were able to resolve the issue.

 

If you have no further queries and you are satisfied with the answer, kindly mark the topic as Solved so that it is helpful for all community members.

 

Please click the "Thumbs Up/Kudo" icon to give a "Kudo".

 

Thank you for being a valuable HPE community member.

