
Re: ML 350 g5 seems to overheat

 

As I wrote above, I checked the IML too, but the last IML warning is from August 2009! So no error has been reported this year.
Quite surprising!
Is this a sign that some hacker is having fun with our server?

Re: ML 350 g5 seems to overheat

Thinking it over, I'm really puzzled.
Is it possible that the IML service does not report a single caution for 2 years?
Is it possible that the IML does not report a critical "red light" situation?
Is it possible that the IML itself is not working properly? Maybe it has been disabled in some way?
Johan Guldmyr
Honored Contributor

Re: ML 350 g5 seems to overheat

Have you had any problems in the last two years though? Not all problems are reported in there either.

Re: ML 350 g5 seems to overheat

Absolutely no hardware problems since purchase (2008), although there were 4 cautions in the first months.

I also searched the iLO 2 logs. Everything is reported as Ok and the fans are declared Ok in the summary, but in the detail view fans 7 and 8 (I/O Board Zone) are reported as "Failed" and the temperature in that zone is 47 °C. See the cut and paste below. Is this normal?

Summary:
Fans: Ok; Not Redundant
Temperatures: Ok
VRMs: Ok
Power Supplies: Ok; Not Redundant

Fans:
        Location         Status         Speed
Fan 1:  System Zone      Ok             35%
Fan 2:  System Zone      Not Installed  n/a
Fan 3:  System Zone      Ok             35%
Fan 4:  System Zone      Not Installed  n/a
Fan 5:  CPU 1            Ok             35%
Fan 6:  CPU 2            Not Installed  n/a
Fan 7:  I/O Board Zone   Failed         n/a
Fan 8:  I/O Board Zone   Failed         n/a

Temperature:
         Location         Status  Reading  Thresholds
Temp 1:  Ambient Zone     Ok      27C      Caution: 40C; Critical: 45C
Temp 2:  Memory Zone      Ok      56C      Caution: 110C; Critical: 120C
Temp 3:  CPU 1            Ok      36C      Caution: 100C; Critical: 100C
Temp 4:  CPU 1            Ok      36C      Caution: 100C; Critical: 100C
Temp 5:  I/O Board Zone   Ok      47C      Caution: 63C; Critical: 68C
Temp 6:  CPU 2            n/a     n/a      Caution: 100C; Critical: 100C
Temp 7:  CPU 2            n/a     n/a      Caution: 100C; Critical: 100C
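
To make the mismatch between the summary and the detail easier to spot, here is a rough Python sketch that works only on the text pasted above; it does not query iLO in any way. The function names `failed_fans` and `caution_headroom` are made up for this example, and the regular expressions assume the only status values are the ones that appear in this paste (Ok, Failed, Not Installed, n/a).

```python
import re

# Lines copied from the iLO 2 paste in this post.
FAN_LINES = """\
Fan 1: System Zone Ok 35%
Fan 2: System Zone Not Installed n/a
Fan 3: System Zone Ok 35%
Fan 4: System Zone Not Installed n/a
Fan 5: CPU 1 Ok 35%
Fan 6: CPU 2 Not Installed n/a
Fan 7: I/O Board Zone Failed n/a
Fan 8: I/O Board Zone Failed n/a
"""

TEMP_LINES = """\
Temp 1: Ambient Zone Ok 27C Caution: 40C; Critical:45C
Temp 2: Memory Zone Ok 56C Caution: 110C; Critical:120C
Temp 3: CPU 1 Ok 36C Caution: 100C; Critical:100C
Temp 4: CPU 1 Ok 36C Caution: 100C; Critical:100C
Temp 5: I/O Board Zone Ok 47C Caution: 63C; Critical:68C
Temp 6: CPU 2 n/a n/a Caution: 100C; Critical:100C
Temp 7: CPU 2 n/a n/a Caution: 100C; Critical:100C
"""

# Assumption: status is one of the values seen in the paste.
FAN_RE = re.compile(r"^Fan (\d+):\s+(.+?)\s+(Ok|Failed|Not Installed)\s+(\S+)$")
TEMP_RE = re.compile(
    r"^Temp (\d+):\s+(.+?)\s+(Ok|n/a)\s+(\d+C|n/a)\s+"
    r"Caution:\s*(\d+)C;\s*Critical:\s*(\d+)C$"
)


def failed_fans(text):
    """Return (fan number, location) for every fan whose status is Failed."""
    result = []
    for line in text.splitlines():
        m = FAN_RE.match(line.strip())
        if m and m.group(3) == "Failed":
            result.append((int(m.group(1)), m.group(2)))
    return result


def caution_headroom(text):
    """Map sensor number to the degrees C left before the Caution threshold."""
    headroom = {}
    for line in text.splitlines():
        m = TEMP_RE.match(line.strip())
        if m and m.group(4) != "n/a":  # skip sensors without a reading
            reading = int(m.group(4).rstrip("C"))
            headroom[int(m.group(1))] = int(m.group(5)) - reading
    return headroom


if __name__ == "__main__":
    print("Failed fans:", failed_fans(FAN_LINES))
    print("Headroom to Caution (C):", caution_headroom(TEMP_LINES))
```

On the values pasted above, this lists fans 7 and 8 as failed even though the summary says "Fans: Ok", and shows the I/O Board Zone sensor 16 degrees below its caution threshold.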
gregersenj
Honored Contributor

Re: ML 350 g5 seems to overheat

Sorry, I missed that. I only noticed that you had looked in the iLO event log.

"Is it possible that IML service does not report a single caution for 2 years?"

Yes. ProLiants are rock solid.

"Is it possible that IML does not report a critical "red light" situation?"

Yes, some errors that occur early during POST might not be logged.

"Is it possible that IML itself is not working properly? Maybe disabled in some way?"

Not likely.

I don't know if CentOS is a supported OS.
But if it is, and if the Insight agents are available for it, then it could be helpful to install them. Those tools also log to the IML.

The lack of log entries does make your problem a bit harder to solve.

Ensure proper cooling, and do check the heatsink (try to get a temperature reading on the CPU).

BR
/jag


holger holst
Occasional Advisor

Re: ML 350 g5 seems to overheat

hope so too.
some interesting developments, will keep you updated!
thanks

Re: ML 350 g5 seems to overheat

Thank you for the help and the very useful tips, but the problem is back. Today iLO recorded 2 "server power removed / server power restored" events, and this evening, while I was working on the MySQL database from home, the server suddenly went down, this time without coming back. The iLO system status shows these lines:

System Health: Unknown
Internal Health LED: Ok
Server Power: STANDBY (OFF)
UID Light: OFF

and for power

Present power reading: 0 Watts at 20:04:09, 05/27/2011

I tried to restart the server, first with the "Press and hold" button to turn it completely off, but it doesn't work; the Internal Health LED always shows Ok.
"Momentary Press" is also useless.
So I cannot restart the server remotely. The IML does not report anything, and the iLO log reports only my attempts. Seems quite desperate!

We have already copied all important data to a new server we recently bought, which will take over next week, but I need to understand what is happening to this server. The last resort will be to send it to HP!
gregersenj
Honored Contributor

Re: ML 350 g5 seems to overheat

Yes, it seems like a good idea to get a techie on it.

BR
/jag


Re: ML 350 g5 seems to overheat

Today, having moved all important data to the new server, I compared the 2 servers. The back of the old server was hotter than that of the new one, and its power supply was really hot to the touch, while the new server's power supply was quite cool. The old server's fans also seemed weaker and hotter than the new ones.
We powered down the server, opened it, and gave it a good clean. There was dust, but not that much. Then we found that the hottest part was the power supply; maybe its two little fans are worn out? Luckily I had an identical power supply in stock, so we swapped it in.
Now the temperatures inside are 4-5 degrees lower than before. Let's wait a few days ...
But the question is: is it possible that this hot power supply caused all those sudden critical "power removed" events?
holger holst
Occasional Advisor

Re: ML 350 g5 seems to overheat

update:
HP finally sent an engineer to check and test my server, so that they would believe what the HP partner service had been telling them the whole time.... They have now replaced the SPS power supply, the drive cage, the DC converter and the fan. The server has now been running for 48 hours without getting hot again ...
the power supply has definitely something to do with the problem...
keep you updated!