CPU temperature

Bernie_18
Occasional Visitor

CPU temperature

How can I get SIM to collect the CPU temperature on all my servers? Or is there a command line that will get that info?
21 REPLIES
David Claypool
Honored Contributor

Re: CPU temperature

HP SIM collects that data every 5 minutes from ProLiant servers and reflects it in your health status. The agents work together with the Advanced Systems Management Controller to monitor environmentals and compare them with the factory-set thresholds. If you have traps set up properly to send alerts to HP SIM, any over-temperature condition will be reported by the agents.
Brent Seizer
Advisor

Re: CPU temperature

Yes. I asked a very similar question about 6 weeks ago. I wanted to know how to log the current system temperature every hour. I was told it couldn't be done, and that the system would merely report an over-temp condition.
What is the over-temp threshold set at? 80F? 85F?
Nobody knows.
I called Rittal, which makes the HP server racks and still makes the environmental monitors that HP no longer offers.
They gave me the same answer. They said it doesn't work like that.
I wanted the info so that I could prove to management that they need to spend more money to upgrade the HVAC.
Without chartable data to prove my point, the servers will either burn up or just crash. I love my job.
William_114
Advisor

Re: CPU temperature

I'm not sure if you can get SIM to record the temps of your servers. But if you really need to monitor the temperatures and record the data, you can use a free program called MRTG.

MRTG is meant for monitoring router traffic, but it will monitor any SNMP value that you ask it to. MRTG can be set to record the CPU and system temp every 5 minutes; it then graphs the information for you and maintains about a year's worth of data.

The following is part of the config file that I use with MRTG to monitor the CPU temp on DL360s and DL380s. It won't make much sense until you actually try using MRTG.

Target[«ServerName»-Temp1]: .1.3.6.1.4.1.232.6.2.6.8.1.4.0.2&.1.3.6.1.4.1.232.6.2.6.8.1.4.0.2:nopower@«ServerName»
MaxBytes[«ServerName»-Temp1]: 70
Options[«ServerName»-Temp1]: gauge
XSize[«ServerName»-Temp1]: 600
YSize[«ServerName»-Temp1]: 200
Title[«ServerName»-Temp1]: «ServerName» («ServerName».bepc.net): CPU Temperature
PageTop[«ServerName»-Temp1]: <H1>«ServerName» CPU Temperature</H1>
YLegend[«ServerName»-Temp1]: Temperature
ShortLegend[«ServerName»-Temp1]: C
Legend1[«ServerName»-Temp1]: Temperature (C)
Legend2[«ServerName»-Temp1]: Temperature (C)
Legend3[«ServerName»-Temp1]: Maximum 5 minute Temperature
Legend4[«ServerName»-Temp1]: Maximum 5 minute Temperature
LegendI[«ServerName»-Temp1]:  % 
LegendO[«ServerName»-Temp1]:
ThreshMaxO[«ServerName»-Temp1]: 60
ThreshProgO[«ServerName»-Temp1]: E:\system\mrtg\bin\thresh\batchscript.bat
ThreshProgOKO[«ServerName»-Temp1]: E:\system\mrtg\bin\thresh\batchscript.bat

All you have to do to monitor the system temp instead is change
.1.3.6.1.4.1.232.6.2.6.8.1.4.0.2&.1.3.6.1.4.1.232.6.2.6.8.1.4.0.2
to
.1.3.6.1.4.1.232.6.2.6.8.1.4.0.3&.1.3.6.1.4.1.232.6.2.6.8.1.4.0.3
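
A minimal Python sketch of that substitution, for anyone generating these entries for several servers or sensors. The helper name `mrtg_target` and its `label` parameter are made up for illustration; the OID prefix and the `nopower` community string are copied from the config above, and the idea that only the last OID component changes per sensor rests on this post alone.

```python
# OID prefix copied from the MRTG config above; the final component
# selects the sensor (2 = CPU, 3 = system, according to this post).
TEMP_OID_PREFIX = ".1.3.6.1.4.1.232.6.2.6.8.1.4.0"


def mrtg_target(server, sensor_index, community="nopower", label="Temp1"):
    """Build an MRTG Target line for one temperature sensor (hypothetical helper)."""
    oid = f"{TEMP_OID_PREFIX}.{sensor_index}"
    # MRTG expects two values (in & out), so the same OID is given twice.
    return f"Target[{server}-{label}]: {oid}&{oid}:{community}@{server}"


if __name__ == "__main__":
    print(mrtg_target("myserver", 2))                 # CPU sensor
    print(mrtg_target("myserver", 3, label="Temp2"))  # system sensor
```

The remaining per-target lines (MaxBytes, Options, Legend and so on) would need the same `server-label` key inside the square brackets.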

I hope this will help
Peter Scurr_1
Occasional Visitor

Re: CPU temperature

William,
I've installed Perl and have got MRTG to build a web page based on the basic config settings. I'm not entirely sure what to do with your script, and this is my first intro to SNMP. I'm guessing that:
«ServerName»-Temp1 gets replaced with my server name, i.e. I don't need the «» characters? I'm not sure about the -Temp bit.
Also, is the batch file in the line "ThreshProgO[«ServerName»-Temp1]: E:\system\mrtg\bin\thresh\batchscript.bat" what runs if the temp threshold is reached?

Cheers
Pete
William_114
Advisor

Re: CPU temperature

Peter,

I hope this will help to clear things up a little.
Cristian Zanni
Occasional Visitor

Re: CPU temperature

Hi, I tried to poll those OIDs, and it seems that my SNMP agent doesn't recognize them (ErrorCode: noSuchName).

In fact, polling any OID under the Compaq enterprise subtree (enterprise: 232) returns that error code.
I think the problem is my SNMP agent.

Can you please tell me which SNMP agent you are using?
(I have installed the standard Windows SNMP agent, and I have already downloaded the MIB files to my polling server.)

Thanks a lot.

michael zanga
Occasional Visitor

Re: CPU temperature

I had the same question and will try MRTG if anyone has had any success. Did anyone?
Phil Slator
Occasional Visitor

Re: CPU temperature

I haven't found a way in CIM, but we've just installed an environmental monitoring unit from APC that logs temperature and humidity...might be worth you having a look at that.

Nice and cheap.
William_114
Advisor

Re: CPU temperature

The SNMP OID values that you are trying to poll are (as far as I can tell) installed with the HP agents. I believe you need to be at version 7.10 or higher for the values to be there.
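
For a command-line route instead of MRTG, here is a rough Python sketch that wraps Net-SNMP's `snmpget`. It assumes `snmpget` is installed on the polling machine, that the HP agents are running on the server, and that SNMP v2c is enabled (the version is a guess; pass `"1"` if that is what your agent speaks). The function names are invented for this example. The OID is the CPU one from the MRTG config earlier in the thread, and the `-1` handling follows the MIB description quoted elsewhere in this thread.

```python
import subprocess

# CPU temperature OID taken from the MRTG config earlier in this thread.
CPU_TEMP_OID = ".1.3.6.1.4.1.232.6.2.6.8.1.4.0.2"


def build_snmpget_command(host, community, oid=CPU_TEMP_OID, version="2c"):
    """Return the snmpget argument list; -Oqv prints only the value."""
    return ["snmpget", "-v", version, "-c", community, "-Oqv", host, oid]


def parse_temperature(raw):
    """Return the reading in degrees C, or None if it is unusable."""
    try:
        value = int(raw.strip())
    except ValueError:
        return None  # e.g. an error string instead of a number
    # Per the MIB text quoted in this thread, -1 means "cannot be determined".
    return None if value == -1 else value


def read_temperature(host, community):
    """Poll one server; returns None on any failure (assumes snmpget is on PATH)."""
    result = subprocess.run(
        build_snmpget_command(host, community),
        capture_output=True, text=True, timeout=10,
    )
    if result.returncode != 0:
        return None
    return parse_temperature(result.stdout)
```

Calling `read_temperature("myserver", "nopower")` from a scheduled task would give one reading per run. If the agent answers noSuchName, the HP agents are most likely missing or too old.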
David Claypool
Honored Contributor

Re: CPU temperature

Phil's comment is very appropriate. The internal temperature at the various sensor locations CANNOT be used to infer ambient temperature values. It is not possible to derive one from the other.

Further, the actual values of the sensor readings are MEANINGLESS except in comparison to the threshold values.
Cristian Zanni
Occasional Visitor

Re: CPU temperature

I'm using MRTG with RRDTool.
I found an "alternative" way to check the temperature.
First, download a freeware tool called "MBM", which monitors your motherboard variables (fans, temperature, voltage, etc.).
Then install a free SNMP agent that acts as an extension of MBM.
It works very nicely for me: I'm polling and graphing temperature, fan RPM, and voltage on my HP servers.

Here you can download MBM
http://mbm.livewiredev.com/

Here you can download SNMP agent http://www.wtcs.org/informant/mbm/overview.htm

regards,
William_114
Advisor

Re: CPU temperature

Dear King David,

Come down off your throne for a minute and listen. The original question was about monitoring the CPU temperature, not the ambient temperature in the room. I know that there is no way to find the room's ambient temperature from the readings of the thermal probes in the server.

However, the readings for the OID values that I have specified are not "meaningless." HP's own MIB files describe the value that I am pulling as follows, and I quote:

"This is the current temperature sensor reading in degrees
celsius.
If this value cannot be determined by software, then a value
of -1 will be returned."

The value is not some meaningless number that can only be used for comparison. It is the actual temperature in degrees Celsius, the same degrees Celsius you use to measure the temperature outside. So now that we have the temperature of the CPU, what should we compare it to? Maybe Intel's CPU die temperature charts. All you have to do is look at the maximum die temperature and make an inference from it (e.g. the max die temp on a 3.00 GHz Xeon is 95°C, so with my CPU running at 38°C, everything seems fine).
Phil Slator
Occasional Visitor

Re: CPU temperature

Another tool to look at would be OPManager.

http://manageengine.adventnet.com/products/opmanager/

This can monitor all sorts of hardware. It can be configured to log the CPU temperatures from HP's MIB.


The basic product's about $900, but there's a free version that you can use on a limited number of servers.

(I've also been looking at ways to monitor a dodgy air conditioning unit).
David Claypool
Honored Contributor

Re: CPU temperature

William:

Sorry, it doesn't work that way. The Intel chart on max temperature is of course used by the development engineers, but the actual sensor location needs to be taken into account. That's why each individual server has individual thresholds for each sensor.

In other words, if the max temp in the Intel chart says 48C, you might actually see that the threshold on the sensor in the ProLiant for CPU1 is set to 38C and for CPU2 to 42C. The sensor picks up radiated thermal energy, and other factors such as airflow past that sensor affect its reading.

If the sensor ever were to detect 48C, you would probably never know about it, because by that time the die has gotten to 70C and the system has been smoked.

These are not actual numbers, but used for illustration.
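
To make the per-sensor point concrete, here is a small Python sketch that judges a reading against that sensor's own threshold rather than a single chart value. The thresholds are the illustrative 38C and 42C figures from this post, not real HP values; the function name is made up, and treating a reading equal to the threshold as "over" is an assumption.

```python
# Illustrative per-sensor thresholds from the post above (not real HP values).
THRESHOLDS_C = {"CPU1": 38, "CPU2": 42}


def sensor_status(sensor, reading_c, thresholds=THRESHOLDS_C):
    """Describe a reading relative to that sensor's own threshold."""
    limit = thresholds[sensor]
    margin = limit - reading_c
    # Assumption: reaching the threshold already counts as over it.
    state = "OK" if margin > 0 else "OVER THRESHOLD"
    return f"{sensor}: {reading_c}C, threshold {limit}C, margin {margin}C, {state}"


if __name__ == "__main__":
    # The same 40C reading means different things on the two sensors.
    print(sensor_status("CPU1", 40))
    print(sensor_status("CPU2", 40))
```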

A whitebox without the ProLiant Advanced System Management Controller may benefit from wasting network bandwidth to poll temperature, but ProLiant servers are self-managing.
William_114
Advisor

Re: CPU temperature

These values are the exact same values that the HP agents are already polling. You can find them under the environment section of the System Management Homepage. You can even see the thresholds there. It doesn't matter if it is only the ambient temperature right near the CPU; it is the temperature HP has chosen to represent the CPU temperature. While individual servers have different thresholds, they are the same between like models. I agree there is no absolute threshold or exact reading.

Thresholds or values, it makes no difference because, as I am trying to point out, they are simply looking for a way to track these values.
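
One simple way to track the values is to append each reading, together with its threshold, to a CSV file that any spreadsheet can chart. A minimal Python sketch follows; the column layout, the function name and the sample numbers are all assumptions for illustration, and how the reading is obtained is left out.

```python
import csv
import io
from datetime import datetime, timezone


def append_reading(csv_file, server, sensor, reading_c, threshold_c, when=None):
    """Write one row: timestamp, server, sensor, reading, threshold."""
    when = when or datetime.now(timezone.utc)
    writer = csv.writer(csv_file)
    writer.writerow(
        [when.isoformat(timespec="seconds"), server, sensor, reading_c, threshold_c]
    )


if __name__ == "__main__":
    # Demonstration with an in-memory buffer and made-up numbers.
    buffer = io.StringIO()
    append_reading(buffer, "myserver", "CPU1", 38, 60)
    print(buffer.getvalue(), end="")
```

For a real log, open the file with `open("cpu_temps.csv", "a", newline="")` and call `append_reading` once per poll. Keeping the threshold in every row means the chart can show both lines.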
David Claypool
Honored Contributor

Re: CPU temperature

You are making my point. However, the agents only have these values because they are read from the Advanced Systems Management Controller. HP SIM doesn't have them. And you are wrong about same models--this varies with processor, type of drive cage, PCI-X vs PCI-e.

My point is that if you go out and collect values and display them, they are only relevant to the threshold.

It still comes down to this: why bother gathering and graphing meaningless data? ProLiants are self-managing.

You should probably read this document: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=PSD_CN0416W
William_114
Advisor

Re: CPU temperature

See Attachment
David Claypool
Honored Contributor

Re: CPU temperature

Where's the threshold on your graph? What happens when it spikes up and gets everyone in a tizzy but it's still below the threshold?

You also have a choice with ASR--it does not have to perform a system shutdown. It can simply do a notification.

You say you don't want to be paged in the middle of the night. In your scenario someone has to be told to turn the A/C up over night--how are they informed?

Okay, enough of this flame war.
Bart_38
Occasional Advisor

Re: CPU temperature

Why is it "useless information"?

I am now hunting for such a program to measure the temperature of my server processors, to test whether a certain rack-mount cooling system does its job.

So I want to measure "today", switch the thing on, measure again "tomorrow" and see the difference...
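
A before/after comparison like that could be as simple as averaging two sets of logged readings. Here is a hedged Python sketch; the function name and the sample readings are made up, and the `-1` filter follows the MIB description quoted earlier in the thread ("cannot be determined").

```python
from statistics import mean


def average_change(before_c, after_c):
    """Mean of the 'after' readings minus mean of the 'before' readings, in C."""
    # Drop -1 values, which the MIB uses for "could not be determined".
    before = [r for r in before_c if r != -1]
    after = [r for r in after_c if r != -1]
    return mean(after) - mean(before)


if __name__ == "__main__":
    # Made-up readings: one day without the cooling unit, one day with it.
    today = [40, 42, -1, 41]
    tomorrow = [36, 37, 35]
    print(f"Average change: {average_change(today, tomorrow)} C")
```

A negative result means the sensor ran cooler afterwards. Comparing readings taken under a similar load on both days would make the difference more trustworthy.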
healermax
Occasional Visitor

Re: CPU temperature

Hi,

I am facing this issue with monitoring temperature too. My server is a ProLiant DL560 Gen9. I've tried the above configuration but still no luck. Can anyone help me please? Thanks.

Andrew_Haak
Honored Contributor

Re: CPU temperature

You can use Cacti or any similar program to read the SNMP value for the temperature of the server's CPU. Most of them can have a threshold configured so that you get an alert when the reading exceeds a certain value. SIM only tells you if a threshold is exceeded. That threshold is determined in the software and, as far as I know, can't be set by hand.

Kind regards,

Andrew