DL380e with Critical Temperature Threshold Exceeded

 
gck303
Occasional Contributor

My DL380e server has just started rebooting and logging the errors shown below every few hours. The room is not overly hot (27C) and the server is not under much load.

A few days ago I made some changes that did not immediately cause problems but may be related. The server worked fine for a week with 6 drives in the array, and the problem occurred after a seventh disk was added and the array was expanded.

- replaced the last disk in the array with an HGST 900GB 2.5" SAS drive and expanded the 7-disk array to use most of the capacity on the disks. All the disks are the same HGST SAS drives (NETAPP X423_HCOBE900A10)

- moved the boot partition to a different logical drive and re-installed the OS (Ubuntu) 

- fitted a small 12V fan to the P420 to reduce the noise. It is powered from the Molex cable intended for the rear LFF cage

What could be the cause of the critical temperature error and reboot?

When I use the hpasmcli utility, the problematic sensors show something strange: there is NO WAY any part of the server is at 18C, and sometimes sensor #45 reads 1C!

Sensor   Location              Temp       Threshold
------   --------              ----       ---------
#41       SYSTEM_BD            34C/93F    90C/194F
#42       SYSTEM_BD            34C/93F    90C/194F
#43       SYSTEM_BD            35C/95F    90C/194F
#44       SCSI_BACKPLANE_ZONE  -          -
#45       SCSI_BACKPLANE_ZONE  -          -
#46       SCSI_BACKPLANE_ZONE  18C/64F    65C/149F
#47       SCSI_BACKPLANE_ZONE  18C/64F    65C/149F
#48       CHASSIS_ZONE         33C/91F    90C/194F
#49       CHASSIS_ZONE         32C/89F    90C/194F
#50       SYSTEM_BD            27C/80F    60C/140F
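
For anyone wanting to check readings like these automatically, here is a minimal Python sketch that flags sensors reporting less than room temperature. It assumes the row layout shown above (sensor number, location, `NNC/NNF` reading, `NNC/NNF` threshold, with `-` for no reading); the function names `parse_rows` and `suspicious` are made up for this example and are not part of hpasmcli.

```python
import re

# Sample rows copied from the hpasmcli output above.
SAMPLE = """\
#44       SCSI_BACKPLANE_ZONE   -          -
#45       SCSI_BACKPLANE_ZONE   -          -
#46       SCSI_BACKPLANE_ZONE  18C/64F    65C/149F
#47       SCSI_BACKPLANE_ZONE  18C/64F    65C/149F
#48       CHASSIS_ZONE         33C/91F    90C/194F
#50       SYSTEM_BD            27C/80F    60C/140F
"""

# Assumed row layout: "#<n> <location> <temp|-> <threshold|->".
ROW = re.compile(r"^#(\d+)\s+(\S+)\s+(-|(\d+)C/\d+F)\s+(-|(\d+)C/\d+F)\s*$")


def parse_rows(text):
    """Return (sensor, location, temp_c, threshold_c) tuples; None means no reading."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skip headers, separators, and blank lines
        temp = int(m.group(4)) if m.group(4) else None
        limit = int(m.group(6)) if m.group(6) else None
        rows.append((int(m.group(1)), m.group(2), temp, limit))
    return rows


def suspicious(rows, ambient_c):
    """Sensors reading below room temperature, which is physically implausible."""
    return [s for s, _, t, _ in rows if t is not None and t < ambient_c]


if __name__ == "__main__":
    print(suspicious(parse_rows(SAMPLE), ambient_c=27))
```

With the 27C room temperature mentioned above, only sensors 46 and 47 are flagged. A reading below ambient inside a running chassis suggests a sensor or signal problem rather than a real temperature.
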
285	POST Message	06/03/2020 05:15	06/03/2020 05:15	1	POST Error: 1785-Slot X Drive Array Not Configured
284	Environment	06/03/2020 05:12	06/03/2020 05:12	1	Critical Temperature Threshold Exceeded (Temperature Sensor 47, Location Storage, Temperature 93C)
283	POST Message	06/03/2020 02:24	06/03/2020 02:24	1	POST Error: 1785-Slot X Drive Array Not Configured
282	Environment	06/03/2020 02:21	06/03/2020 02:21	1	Critical Temperature Threshold Exceeded (Temperature Sensor 47, Location Storage, Temperature 97C)
281	POST Message	06/03/2020 01:31	06/03/2020 01:31	1	POST Error: 1785-Slot X Drive Array Not Configured
280	Environment	06/03/2020 01:28	06/03/2020 01:28	1	Critical Temperature Threshold Exceeded (Temperature Sensor 46, Location Storage, Temperature 97C)
279	POST Message	06/03/2020 00:10	06/03/2020 00:10	1	POST Error: 1792-Slot X Drive Array - Valid Data Found in Cache Module. Data will automatically be written to drive array.
278	POST Message	06/03/2020 00:10	06/03/2020 00:10	1	POST Error: 1785-Slot X Drive Array Not Configured
277	Environment	06/03/2020 00:06	06/03/2020 00:06	1	Critical Temperature Threshold Exceeded (Temperature Sensor 47, Location Storage, Temperature 72C)
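
To put numbers on the log above, the following Python sketch pulls the timestamp, sensor number and temperature out of the "Critical Temperature Threshold Exceeded" entries and works out the time between events. It is only an illustration: the helper names are invented, the date is assumed to be MM/DD/YYYY (the gaps come out the same either way since all events fall on one day), and the 65C threshold is taken from the hpasmcli listing for sensors 46 and 47.

```python
import re
from datetime import datetime

# Environment entries copied from the log above (first timestamp plus description).
EVENTS = """\
06/03/2020 05:12 Critical Temperature Threshold Exceeded (Temperature Sensor 47, Location Storage, Temperature 93C)
06/03/2020 02:21 Critical Temperature Threshold Exceeded (Temperature Sensor 47, Location Storage, Temperature 97C)
06/03/2020 01:28 Critical Temperature Threshold Exceeded (Temperature Sensor 46, Location Storage, Temperature 97C)
06/03/2020 00:06 Critical Temperature Threshold Exceeded (Temperature Sensor 47, Location Storage, Temperature 72C)
"""

THRESHOLD_C = 65  # threshold reported by hpasmcli for sensors 46 and 47

PATTERN = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}).*?"
    r"Temperature Sensor (\d+), Location \w+, Temperature (\d+)C\)"
)


def parse_events(text):
    """Return (timestamp, sensor, temp_c) tuples sorted oldest first."""
    events = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m:
            # Assumption: MM/DD/YYYY date order.
            when = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M")
            events.append((when, int(m.group(2)), int(m.group(3))))
    return sorted(events)


def gaps_minutes(events):
    """Minutes between consecutive events."""
    return [
        int((later[0] - earlier[0]).total_seconds() // 60)
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = parse_events(EVENTS)
    print("gaps (min):", gaps_minutes(events))
    for when, sensor, temp in events:
        print(when.strftime("%H:%M"), f"sensor {sensor}: {temp}C",
              f"({temp - THRESHOLD_C}C over threshold)")
```

The logged values sit 7C to 32C above the 65C threshold, while the same sensors report 18C when queried with hpasmcli. Readings that far apart on the same sensors are hard to explain by real heating alone.
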

 

 

 

Thanks, George

Sham82
HPE Pro

Re: DL380e with Critical Temperature Threshold Exceeded

Hi,

The issue points to Temperature Sensors 46 and 47 (Location: Storage), which indicates the storage subsystem (HDDs, controller).

We would normally suggest updating the controller firmware. However, you are using non-HPE HDDs, and their firmware has not been tested with this server.

If the controller is unable to communicate properly with the firmware of a non-HPE drive, issues like this are possible. In that case the server (iLO) may also be unable to read the temperature of the non-HPE drive.

Suggestions:
1. You can still try updating the firmware listed below and check again.
> BIOS
> iLO
> Controller firmware

2. Please use HPE-recommended HDDs:
Page 34 onwards : https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=c04128166

Thank you
HPE Employee


I work for HPE.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]


Sham82
HPE Pro

Re: DL380e with Critical Temperature Threshold Exceeded

Hi @gck303 ,

I hope my solution helped resolve your query. If not, kindly let me know if you have any further issues.


gck303
Occasional Contributor

Re: DL380e with Critical Temperature Threshold Exceeded

Thank you for the reply, but it did not help. There is nothing wrong with the NetApp disks; they were not the trigger for the overheating errors.

It was the small 12V fan. When it was disconnected from the server's power supply, the fault vanished.

Very strange. The fan is now powered by an external 12V supply.

Sham82
HPE Pro

Re: DL380e with Critical Temperature Threshold Exceeded

Hi,

We are glad to hear that you were able to figure out and resolve the issue.

Please let us know if there is anything else we can help you with.
