
JoeKeller
Occasional Contributor

DL380G6 overheating


I noticed the following messages while I was logged into my ProLiant DL380G6 server.

Broadcast message from root (Wed Jul 29 12:14:37 2009):
A System Reboot has been requested by the management processor in 60 seconds.
Broadcast message from root (Wed Jul 29 12:14:47 2009):
The system shutdown has been cancelled by the management processor.

In examining /var/log/messages around this time period I see:

Jul 29 12:14:37 daniels hpasmlited[6445]: CRITICAL: System Overheating (Zone 7, Location Memory, Temperature 170C)
Jul 29 12:14:37 daniels hpasmlited[6445]: A System Reboot has been requested by the management processor in 60 seconds.
Jul 29 12:14:37 daniels wall[2317]: wall: user root broadcasted 1 lines (79 chars)
Jul 29 12:14:47 daniels hpasmlited[6445]: NOTICE: System Overheating (Zone 7, Location Memory, Temperature 41C) has been repaired
Jul 29 12:14:47 daniels hpasmlited[6445]: The system shutdown has been cancelled by the management processor.
Jul 29 12:14:47 daniels wall[2341]: wall: user root broadcasted 1 lines (69 chars)

About two minutes later, the log shows the following message, which I don't understand:

Jul 29 12:16:07 daniels kernel: IPMI message handler: BMC returned incorrect response, expected netfn 5 cmd 27, got netfn 5 cmd 35
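The kernel warning above can be decoded mechanically. This is a minimal Python sketch, assuming the Linux IPMI message handler prints the netfn and cmd values in hexadecimal. It extracts the expected and received pairs, which shows that the network function matched but the command did not; in other words, the BMC apparently answered with a response to a different command than the one the driver was waiting for. The regex and the `parse_mismatch` helper are illustrative only and are not part of any HP or kernel tool.

```python
import re

# Matches the Linux IPMI message handler warning quoted above.
# Assumption: the kernel prints netfn/cmd values in hexadecimal.
MISMATCH = re.compile(
    r"expected netfn (?P<exp_netfn>[0-9a-fA-F]+) cmd (?P<exp_cmd>[0-9a-fA-F]+), "
    r"got netfn (?P<got_netfn>[0-9a-fA-F]+) cmd (?P<got_cmd>[0-9a-fA-F]+)"
)


def parse_mismatch(line):
    """Return ((netfn, cmd) expected, (netfn, cmd) received) as ints, or None."""
    m = MISMATCH.search(line)
    if m is None:
        return None
    expected = (int(m["exp_netfn"], 16), int(m["exp_cmd"], 16))
    received = (int(m["got_netfn"], 16), int(m["got_cmd"], 16))
    return expected, received


line = ("Jul 29 12:16:07 daniels kernel: IPMI message handler: BMC returned "
        "incorrect response, expected netfn 5 cmd 27, got netfn 5 cmd 35")
expected, received = parse_mismatch(line)
print(f"netfn matches: {expected[0] == received[0]}; "
      f"cmd expected 0x{expected[1]:02x}, got 0x{received[1]:02x}")
# prints: netfn matches: True; cmd expected 0x27, got 0x35
```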

Searching further back in the log, I see other messages that don't make sense (no one removed any hardware):

Jul 29 12:05:36 daniels hpasmlited[6445]: Sensor 6 invalid status state: 0x0
Jul 29 12:05:36 daniels hpasmlited[6445]: Sensor 8 invalid status state: 0x0
Jul 29 12:05:36 daniels hpasmlited[6445]: Sensor 10 invalid status state: 0x0
Jul 29 12:05:36 daniels hpasmlited[6445]: CRITICAL: System Fan Removed (Fan 1, Location System)
Jul 29 12:05:36 daniels hpasmlited[6445]: CRITICAL: System Fan Removed (Fan 3, Location System)
Jul 29 12:05:36 daniels hpasmlited[6445]: CRITICAL: System Fan Removed (Fan 5, Location System)
Jul 29 12:05:47 daniels hpasmlited[6445]: NOTICE: System Fan Inserted (Fan 1, Location System)
Jul 29 12:05:47 daniels hpasmlited[6445]: NOTICE: System Fan Inserted (Fan 3, Location System)
Jul 29 12:05:47 daniels hpasmlited[6445]: NOTICE: System Fan Inserted (Fan 5, Location System)
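The three "removed" fans above were all reported in the same second and "reinserted" 11 seconds later, right after several "invalid status state" lines. The minimal Python sketch below pairs each Removed event with the next Inserted event for the same fan, assuming the hpasmlited message format stays exactly as quoted. When several fans disappear and reappear together within seconds and no one touched the hardware, a sensor-reporting problem seems more likely than a physical one. `fan_gaps` and its `year` parameter are hypothetical; the year is needed only because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Matches the hpasmlited fan events quoted above.
FAN_EVENT = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ hpasmlited\[\d+\]: .*"
    r"System Fan (?P<event>Removed|Inserted) \(Fan (?P<fan>\d+),"
)


def fan_gaps(lines, year=2009):
    """Map fan number -> seconds between a 'Removed' event and the next 'Inserted'.

    Syslog timestamps omit the year, so `year` is an assumed parameter.
    """
    removed_at = {}
    gaps = {}
    for line in lines:
        m = FAN_EVENT.search(line)
        if m is None:
            continue  # not a fan event (e.g. "Sensor 6 invalid status state")
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        fan = int(m["fan"])
        if m["event"] == "Removed":
            removed_at[fan] = ts
        elif fan in removed_at:
            gaps[fan] = (ts - removed_at.pop(fan)).total_seconds()
    return gaps


log = [
    "Jul 29 12:05:36 daniels hpasmlited[6445]: Sensor 6 invalid status state: 0x0",
    "Jul 29 12:05:36 daniels hpasmlited[6445]: CRITICAL: System Fan Removed (Fan 1, Location System)",
    "Jul 29 12:05:36 daniels hpasmlited[6445]: CRITICAL: System Fan Removed (Fan 3, Location System)",
    "Jul 29 12:05:36 daniels hpasmlited[6445]: CRITICAL: System Fan Removed (Fan 5, Location System)",
    "Jul 29 12:05:47 daniels hpasmlited[6445]: NOTICE: System Fan Inserted (Fan 1, Location System)",
    "Jul 29 12:05:47 daniels hpasmlited[6445]: NOTICE: System Fan Inserted (Fan 3, Location System)",
    "Jul 29 12:05:47 daniels hpasmlited[6445]: NOTICE: System Fan Inserted (Fan 5, Location System)",
]
print(fan_gaps(log))
# prints: {1: 11.0, 3: 11.0, 5: 11.0}
```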

I have a number of other servers in the same cabinet that are not reporting these temperature issues. What is going on?
ess
Super Advisor

Re: DL380G6 overheating

Hi,
How many CPUs are installed in the server?
How many fans are installed?
JoeKeller
Occasional Contributor

Re: DL380G6 overheating

There are 4 quad-core processors installed. There are 4 system fans and 2 cpu fans.
PDP-Fan
Valued Contributor

Re: DL380G6 overheating

I have two DL380G6 servers. When CPU load is applied (I test with BOINC), one of them spins its fans faster, while the other stays quiet... and gets hotter.

I am trying to figure out what the difference could be. The two servers were bought together, they have the same BIOS version, and they run exact clone copies of the same WIN2003 installation.

About your thermal problem: a temperature of 170C is not possible without physical damage to the parts; that's the temperature at which chips start smoking.
My guess is that it's either a software bug or a failed temperature sensor.
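The reasoning above (a reading no real part could survive, followed by a return to normal seconds later) can be turned into a simple log check. This is a minimal Python sketch, assuming the hpasmlited overheating message format quoted earlier in the thread. The thresholds `MAX_PLAUSIBLE_C`, `MAX_STEP_C` and `MAX_STEP_WINDOW_S` are hypothetical values chosen for illustration, not HP specifications, and `implausible` is a hypothetical helper.

```python
import re
from datetime import datetime

# Matches the hpasmlited overheating lines quoted earlier in the thread.
READING = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) .*System Overheating "
    r"\(Zone (?P<zone>\d+), Location (?P<loc>\w+), Temperature (?P<temp>\d+)C\)"
)

# Hypothetical thresholds for illustration only; not HP specifications.
MAX_PLAUSIBLE_C = 125   # readings above this are treated as sensor/software faults
MAX_STEP_C = 30         # bigger changes within the window are treated as faults
MAX_STEP_WINDOW_S = 60  # window (seconds) for the step check


def implausible(lines, year=2009):
    """Return findings for readings that real hardware is unlikely to produce."""
    findings = []
    previous = {}  # (zone, location) -> (timestamp, temperature)
    for line in lines:
        m = READING.search(line)
        if m is None:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (int(m["zone"]), m["loc"])
        temp = int(m["temp"])
        if temp > MAX_PLAUSIBLE_C:
            findings.append(f"Zone {key[0]} {key[1]}: {temp}C exceeds {MAX_PLAUSIBLE_C}C")
        if key in previous:
            prev_ts, prev_temp = previous[key]
            elapsed = (ts - prev_ts).total_seconds()
            if abs(temp - prev_temp) > MAX_STEP_C and elapsed <= MAX_STEP_WINDOW_S:
                findings.append(
                    f"Zone {key[0]} {key[1]}: {prev_temp}C -> {temp}C in {elapsed:.0f}s"
                )
        previous[key] = (ts, temp)
    return findings


log = [
    "Jul 29 12:14:37 daniels hpasmlited[6445]: CRITICAL: System Overheating "
    "(Zone 7, Location Memory, Temperature 170C)",
    "Jul 29 12:14:47 daniels hpasmlited[6445]: NOTICE: System Overheating "
    "(Zone 7, Location Memory, Temperature 41C) has been repaired",
]
for finding in implausible(log):
    print(finding)
# prints:
# Zone 7 Memory: 170C exceeds 125C
# Zone 7 Memory: 170C -> 41C in 10s
```

A 129C drop in ten seconds is as physically implausible as the 170C reading itself, which fits the suggestion of a software bug or a failed sensor.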

Since it is a brand-new server, I would suggest disassembling and reassembling it to make sure all connectors make good contact. On new machines, connectors are sometimes dirty from the manufacturing process.
***********************************************
"If it seems illogical... you just don't have enough information"
JoeKeller
Occasional Contributor

Re: DL380G6 overheating

I actually have several servers, all with the same issue. I have updated to the latest hp-health package and HP's version of the OpenIPMI driver, and that fixed most of the messages except for these:

Oct 19 10:45:09 daniels hpasmxld[5527]: Sensor 7 invalid status state: 0x0
Oct 19 10:45:09 daniels hpasmxld[5527]: iLO 2 Communications Error - Attempting synchronization!
Oct 19 10:45:14 daniels hpasmxld[5527]: hpIoctl: Waiting on IPMI initialization
Oct 19 10:45:54 daniels hpasmxld[5527]: iLO 2 has responded to reset request . . .
Oct 19 10:45:54 daniels hpasmxld[5527]: Resetting Internal Data structures . . .
Oct 19 10:45:54 daniels hpasmxld[5527]: Initializing Internal Data structures from iLO 2. . .
Oct 19 10:45:56 daniels hpasmxld[5527]: The iLO 2 reset / synchronization has completed successfully
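To see whether these iLO 2 resynchronizations are frequent or long, a small script can measure the window from each "Communications Error" line to the next "completed successfully" line. This minimal Python sketch assumes that the hpasmxld message wording stays exactly as quoted above and that all lines fall in the same year. `resync_windows` is a hypothetical helper.

```python
import re
from datetime import datetime

# Leading syslog timestamp, e.g. "Oct 19 10:45:09".
TS = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) ")


def resync_windows(lines, year=2009):
    """Return seconds from each 'Communications Error' to the next successful sync."""
    windows = []
    started = None
    for line in lines:
        m = TS.match(line)
        if m is None:
            continue
        ts = datetime.strptime(f"{year} {m[1]}", "%Y %b %d %H:%M:%S")
        if "Communications Error" in line:
            started = ts
        elif "synchronization has completed successfully" in line and started is not None:
            windows.append((ts - started).total_seconds())
            started = None
    return windows


log = [
    "Oct 19 10:45:09 daniels hpasmxld[5527]: iLO 2 Communications Error - Attempting synchronization!",
    "Oct 19 10:45:14 daniels hpasmxld[5527]: hpIoctl: Waiting on IPMI initialization",
    "Oct 19 10:45:54 daniels hpasmxld[5527]: iLO 2 has responded to reset request . . .",
    "Oct 19 10:45:56 daniels hpasmxld[5527]: The iLO 2 reset / synchronization has completed successfully",
]
print(resync_windows(log))
# prints: [47.0]
```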