Server Management - Systems Insight Manager

David Claypool
Honored Contributor

Re: CPU temperature

Phil's comment is very appropriate. The internal temperature at the various sensor locations CAN NOT be used to infer ambient temperature values. It is not possible to derive one from the other.

Further, the actual values of the sensor readings are MEANINGLESS except in comparison to the threshold values.
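A minimal Python sketch of reading a value relative to its threshold, assuming you already have both numbers for a given sensor (the function name `headroom_c` is hypothetical). The useful quantity is how far the reading sits below that sensor's own threshold, not the raw reading.

```python
def headroom_c(reading_c: float, threshold_c: float) -> float:
    """Degrees Celsius remaining before a sensor reaches its own threshold.

    A negative result means the reading is already above the threshold.
    """
    return threshold_c - reading_c

# Hypothetical: the same 40 C reading on two sensors with different thresholds
print(headroom_c(40, 50))  # 10
print(headroom_c(40, 80))  # 40
```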
Cristian Zanni
New Member

Re: CPU temperature

I'm using MRTG, with RRDTool.
I found some "alternative" way to check the temperature.
First, download a freeware program called "MBM" (Motherboard Monitor), which monitors your motherboard variables (fans, temperature, voltage, etc.).
Then install a free SNMP agent that acts as an extension of MBM.
It works very well for me: I'm polling and graphing temperature, fan RPM, and voltage on my HP servers.

You can download MBM here:
http://mbm.livewiredev.com/

You can download the SNMP agent here: http://www.wtcs.org/informant/mbm/overview.htm
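For the MRTG route, one hedged option is an external script: MRTG can run a command given in backticks on a `Target` line and expects it to print four lines (first value, second value, uptime string, target name). The sketch below uses a hypothetical function name and hard-coded sample readings, and only formats two temperatures in that layout; fetching the readings from the MBM SNMP agent is left out. The target typically also needs the `gauge` option so MRTG treats the values as current measurements rather than ever-increasing counters.

```python
import sys

def mrtg_lines(first: int, second: int, uptime: str, name: str) -> str:
    """Format two gauge values in MRTG's four-line external-script layout."""
    return f"{first}\n{second}\n{uptime}\n{name}\n"

if __name__ == "__main__":
    # Hypothetical readings; replace with values from your own SNMP query
    cpu1_c, cpu2_c = 38, 41
    sys.stdout.write(mrtg_lines(cpu1_c, cpu2_c, "unknown", "server01 CPU temperatures"))
```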

regards,
William_114
Advisor

Re: CPU temperature

Dear King David,

Come down off your throne for a minute and listen. The original question was how to monitor the CPU temperature, not the ambient temperature in the room. I know that there is no way one can find the room's ambient temperature based on the readings of the thermal probes in the server.

However, the readings for the OID values that I have specified are not "meaningless." HP's own MIB files describe the values that I am pulling as follows, and I quote:

"This is the current temperature sensor reading in degrees
celsius.
If this value cannot be determined by software, then a value
of -1 will be returned."

The value is not some meaningless number that can only be used for comparison. It is the actual temperature in degrees Celsius, the same degrees Celsius you use to measure the temperature outside. So now that we have the temperature of the CPU, what should we compare it to? Maybe Intel's CPU die temperature charts. All you have to do is look at the maximum die temperature and make an inference from it. (e.g. the max die temp on a 3.00 GHz Xeon is 95°C, so with my CPU running at 38°C, everything seems fine.)
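The arithmetic described above can be sketched as follows, treating -1 as "unknown" per the quoted MIB text. The names are hypothetical and the 95°C limit is the example figure from this post; the following replies dispute whether a die maximum is the right reference for a sensor reading.

```python
from typing import Optional

UNKNOWN = -1  # MIB text quoted above: -1 means the value cannot be determined

def parse_reading(raw: int) -> Optional[int]:
    """Return the reading in degrees Celsius, or None if the agent reported -1."""
    return None if raw == UNKNOWN else raw

def margin_to_limit(raw: int, limit_c: int) -> Optional[int]:
    """Degrees between a reading and a chosen limit; None if the reading is unknown."""
    reading = parse_reading(raw)
    return None if reading is None else limit_c - reading

print(margin_to_limit(38, 95))  # 57
print(margin_to_limit(-1, 95))  # None
```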
Phil Slator
New Member

Re: CPU temperature

Another tool to look at would be OPManager.

http://manageengine.adventnet.com/products/opmanager/

This can monitor all sorts of hardware. It can be configured to log the CPU temperatures from HP's MIB.


The basic product's about $900, but there's a free version that you can use on a limited number of servers.

(I've also been looking at ways to monitor a dodgy air conditioning unit).
David Claypool
Honored Contributor

Re: CPU temperature

William:

Sorry, it doesn't work that way. The Intel chart on max temperature is of course used by the development engineers, but the actual sensor location needs to be taken into account. That's why each individual server has individual thresholds for each sensor.

In other words, if max temp in the Intel chart says 48C, you might actually see that the threshold on the sensor in the ProLiant for CPU1 is set to 38C and for CPU2 is 42C. The sensor picks up radiated thermal energy at its position, and other factors such as airflow past that sensor affect its reading.

If the sensor ever were to detect 48C, you would probably never know about it, because by that time the die has gotten to 70C and the system has been smoked.

These are not actual numbers, but used for illustration.

A whitebox without the ProLiant Advanced System Management Controller may benefit from wasting network bandwidth to poll temperature, but ProLiant servers are self-managing.
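To make the per-sensor point concrete, here is a sketch that checks each sensor against its own threshold, using the illustrative 38C/42C figures from this post (explicitly not real ProLiant values). The function name is hypothetical. The same 40C reading trips CPU1 but not CPU2.

```python
# Illustrative thresholds from the post above; real values differ per server and configuration
THRESHOLDS_C = {"CPU1": 38, "CPU2": 42}

def sensors_at_threshold(readings_c: dict, thresholds_c: dict) -> list:
    """Names of sensors whose reading has reached that sensor's own threshold."""
    return sorted(
        name
        for name, value in readings_c.items()
        if name in thresholds_c and value >= thresholds_c[name]
    )

print(sensors_at_threshold({"CPU1": 40, "CPU2": 40}, THRESHOLDS_C))  # ['CPU1']
```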
William_114
Advisor

Re: CPU temperature

These values are the exact same values that the HP agents are already polling. You can find these values under the environment section of the System Management Homepage. You can even see the thresholds there. It doesn't matter if it is only the ambient temperature right near the CPU; it is the temperature HP has chosen to represent the CPU temperature. While each individual server has different thresholds, they are the same between like models. I agree there is no absolute threshold or exact reading.

Thresholds or values, it makes no difference, because as I am trying to point out, they are simply looking for a way to track these values.
David Claypool
Honored Contributor

Re: CPU temperature

You are making my point. However, the agents only have these values because they are read from the Advanced System Management Controller. HP SIM doesn't have them. And you are wrong about like models: this varies with processor, type of drive cage, and PCI-X vs. PCI-e.

My point is that if you go out and collect values and display them, they are only relevant to the threshold.

It still comes down to this: why bother gathering and graphing meaningless data? ProLiants are self-managing.

You should probably read this document: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=PSD_CN0416W
William_114
Advisor

Re: CPU temperature

See Attachment
David Claypool
Honored Contributor

Re: CPU temperature

Where's the threshold on your graph? What happens when it spikes up and gets everyone in a tizzy but it's still below the threshold?
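One way to handle the spike question, sketched under the assumption that you alert only when a reading reaches the threshold: look for rising crossings, so a spike that stays below the threshold does not page anyone. The function name, series, and threshold are made-up sample data.

```python
def crossings(samples_c: list[float], threshold_c: float) -> list[int]:
    """Indices where the series goes from below the threshold to at/above it."""
    hits = []
    was_above = False
    for i, value in enumerate(samples_c):
        above = value >= threshold_c
        if above and not was_above:
            hits.append(i)  # a new excursion starts here
        was_above = above
    return hits

series = [35, 36, 41, 36, 43, 44, 39, 45]  # the 41 spike stays below 42
print(crossings(series, 42))  # [4, 7]
```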

You also have a choice with ASR--it does not have to perform a system shutdown. It can simply do a notification.

You say you don't want to be paged in the middle of the night. In your scenario someone has to be told to turn the A/C up overnight; how are they informed?

Okay, enough of this flame war.
Bart_38
Occasional Advisor

Re: CPU temperature

Why is it "useless information"?

I am now hunting for such a program to measure the temperature of my server processors, to test whether a certain rack-mount cooling system does its job.

So I want to measure "today", switch the thing on, measure "tomorrow", and see the difference...
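For a before/after test like this, a simple sketch is to compare average readings from comparable periods. The function name and samples are hypothetical, and since airflow and other factors affect what a sensor reads, the two periods need similar load and room conditions for the difference to mean much.

```python
from statistics import mean

def average_change(before_c: list[float], after_c: list[float]) -> float:
    """Difference in mean temperature (after minus before); negative means cooler."""
    return mean(after_c) - mean(before_c)

# Hypothetical samples: a day without the rack cooler vs. a day with it
before = [41.0, 42.0, 43.0]
after = [38.0, 39.0, 40.0]
print(average_change(before, after))  # -3.0
```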