
ML350 CPU Thermal Issue

Mike Lauer
Occasional Visitor

ML350 CPU Thermal Issue

A few times per month our ProLiant server automatically restarts because the CPU thermal sensor detects overheating. It happens at random times during the day. We replaced the thermal sensor and cleaned out the chassis. The server is located in a room that is about 72 degrees. All fans are working fine and there is plenty of room around the server for airflow.

Here is the event log:

12:42:00 Server Agents Information Events 1136
"System Information Agent: Health: A Temperature Sensor Condition has been set to ok. The system's temperature has returned to the normal operating range.
Chassis: '0'; Location: '6'
(Location values: 1=other, 2=unknown, 3=system, 4=system board, 5=I/O board, 6=CPU, 7=memory, 8=storage, 9=removable media, 10=power supply, 11=ambient, 12=chassis, 13=bridge card)


12:42:00 PM cpqasm2 Information None 4110
Environment Abnormality Auto Shutdown (EAAS) cancelled.


12:42:00 PM cpqasm2 Information None 4103
The system temperature (thermal sensor #2) has cooled down to below the threshold.


12:38:59 PM Server Agents Warning Events 1135
"System Information Agent: Health: A Temperature Sensor Condition has been set to degraded.
The system may or may not shutdown depending on the state of the thermal degraded action value '3'.
Chassis: '0'; Location: '6'
(Thermal degraded action values: 1=other, 2=continue, 3=shutdown)
(Location values: 1=other, 2=unknown, 3=system, 4=system board, 5=I/O board, 6=CPU, 7=memory, 8=storage, 9=removable media, 10=power supply, 11=ambient, 12=chassis, 13=bridge card)


12:38:59 PM USER32 Information None 1074
"The process winlogon.exe has initiated the restart of computer SERVER on behalf of user NT AUTHORITY\SYSTEM for the following reason: Legacy API shutdown
Reason Code: 0x80070000
Shutdown Type: restart
Comment: HP ProLiant System Shutdown Service: System is too hot or has lost cooling."


12:38:59 PM cpqasm2 Error None 4111
Environment Abnormality Auto Shutdown (EAAS) initiated due to thermal reasons, either resulting from the system overheating, or from the loss of cooling.
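
For anyone reading a lot of these entries, the numeric codes in the log can be decoded with a small script. This is only a rough sketch: the two lookup tables are copied from the legend lines in the log above, while the function name `decode_event` and the regular expressions are my own assumptions about the message layout, not anything provided by the HP agents.

```python
import re

# Lookup tables copied from the legend lines in the event log above.
LOCATIONS = {
    1: "other", 2: "unknown", 3: "system", 4: "system board", 5: "I/O board",
    6: "CPU", 7: "memory", 8: "storage", 9: "removable media",
    10: "power supply", 11: "ambient", 12: "chassis", 13: "bridge card",
}
DEGRADED_ACTIONS = {1: "other", 2: "continue", 3: "shutdown"}


def decode_event(text):
    """Return the location and degraded-action names found in one event message.

    Hypothetical helper: it assumes the message uses the exact wording
    seen in the log above ("Location: 'N'" and "action value 'N'").
    """
    result = {}
    loc = re.search(r"Location: '(\d+)'", text)
    if loc:
        result["location"] = LOCATIONS.get(int(loc.group(1)), "unrecognized")
    act = re.search(r"thermal degraded action value '(\d+)'", text)
    if act:
        result["action"] = DEGRADED_ACTIONS.get(int(act.group(1)), "unrecognized")
    return result


# Text taken from the event 1135 entry above.
sample = (
    "The system may or may not shutdown depending on the state of the "
    "thermal degraded action value '3'.\n"
    "Chassis: '0'; Location: '6'"
)
print(decode_event(sample))  # {'location': 'CPU', 'action': 'shutdown'}
```

For the 1135 entry this confirms what the log already implies: the degraded sensor is at the CPU location and the configured action is a shutdown.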




Any help or recommendations on how to resolve this problem would be greatly appreciated.
11 REPLIES
Ryan Goff
Valued Contributor

Re: ML350 CPU Thermal Issue

Which generation ML350 is this, and how many processors are installed?
Mike Lauer
Occasional Visitor

Re: ML350 CPU Thermal Issue

Generation 4 with a single processor.
Ryan Goff
Valued Contributor

Re: ML350 CPU Thermal Issue

Have you replaced the heatsink before? If not, that will fix this issue.
Mike Lauer
Occasional Visitor

Re: ML350 CPU Thermal Issue

I have not replaced the heatsink. I'll give that a try.
KMullins
Frequent Advisor

Re: ML350 CPU Thermal Issue

As Ryan said, replace the heatsink. Call HP and explain the problem, as this is a pretty common problem with the G4's heatsink.
KMullins
Frequent Advisor

Re: ML350 CPU Thermal Issue

Almost forgot: check the fan on the heatsink.
The fan should be on the media bay side of the processor, not the memory side.

A fan mounted on the wrong side can cause the same problem as well.
David Paris_1
Frequent Advisor

Re: ML350 CPU Thermal Issue

Exactly. The ML350 G4 has problems with overheating. The ML350 G3 has an air duct for the processor and memory; I believe the G4 also has one, but it is optional. We have hundreds of ML350 G4s and we had that problem. The other problem that model has is NMI problems: we tried everything, but the problem was only solved by changing the system with the latest firmware.

By
Mark Leoceli Espartero
Occasional Visitor

Re: ML350 CPU Thermal Issue

First check that your processor heat sink fan is working properly, and clean it as thoroughly as possible. Check that it is properly mounted on the processor. If the problem is still there, then it's time to replace the heat sink with a new one.
Suraj Wellala
Occasional Advisor

Re: ML350 CPU Thermal Issue

I am also experiencing the same problem with my ML350 G4p server. As far as I can remember, this started after I upgraded the memory from 2GB to 6GB.
I have attached the system log file for reference.
I checked the CPU Fan and all fans are working properly.

Thank you very much.
cnb
Honored Contributor

Re: ML350 CPU Thermal Issue

Suraj Wellala:

Best to start your own thread.

hth
sc1948
Occasional Visitor

Re: ML350 CPU Thermal Issue

Had this problem 2 years ago. Tech assistance recommended updating the firmware. Did that, but it did not help. Replaced the CPU heat sink fan assembly under warranty from HP. That corrected the problem for about 1.5 years. Now I have started receiving the same temperature shutdown warning: CPU sensor #2 sensor error. I had saved the original heat sink assembly, so I tried cleaning the heat sink and fan, removed the heat transfer pad, and replaced it with CPU thermal grease. Good for 4 days, but now the sensor has reported overheating again. I have a third heat sink assembly, new with a thermal pad, and will now try replacing again in case the fan is intermittent.