HPE 9000 and HPE e3000 Servers

rp2470 temperature env. settings

 
Nick D'Angelo
Super Advisor

Hello, I have an rp2470 running HP-UX 11i.

We are having problems with one of our AC units in the server room, and last week the machine automatically shut itself down because of an overheat alert, which is great.

However, I am trying to determine the machine's current temperature (if possible), and also the thresholds it uses to decide when to shut itself down.

Any suggestions?
Always learning
Sanjay_6
Honored Contributor

Re: rp2470 temperature env. settings

Hi Nick,

Try this link from ITRC:

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000066027971

The ITRC document ID is UCMDSKBRC00010338.

Hope this helps.

Regds


RAC_1
Honored Contributor

Re: rp2470 temperature env. settings

There is no way to determine what the temperature of the system is (unless you open it and use a thermometer!).

The /etc/envd.conf file determines the actions the system will take when thresholds are reached. The hardware guide should give you the required details.
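To see which actions are currently configured, a quick first step is to list the active lines of /etc/envd.conf. The sketch below is a minimal Python helper that assumes only that comment lines start with `#`, as is usual for HP-UX configuration files. It does not interpret the settings, and having Python available on the host is itself an assumption.

```python
from typing import Iterable, List

def active_config_lines(lines: Iterable[str]) -> List[str]:
    """Return the non-blank, non-comment lines of a config file.

    Assumption: a comment is any line whose first non-space character is '#'.
    Lines are returned stripped but otherwise as-is; no meaning is assigned.
    """
    active = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            active.append(stripped)
    return active

# Usage on the server itself:
# with open("/etc/envd.conf") as f:
#     for entry in active_config_lines(f):
#         print(entry)
```

Taking an iterable of lines rather than a path keeps the helper easy to try out on a copy of the file, or on a few pasted lines, before running it on the live system.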

Anil
There is no substitute to HARDWORK
Pete Randall
Outstanding Contributor

Re: rp2470 temperature env. settings

In a similar situation a couple of years ago, I also posted a question here on the Forums. Perhaps you'll find some value here:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=205033


Pete
A. Clay Stephenson
Acclaimed Contributor

Re: rp2470 temperature env. settings

After many years of trying to do this, I have learned that computers are really lousy at measuring temperature, and the techniques are very non-portable: what works for one model often doesn't work for another. However, I have found a device that does this task really well; it's called a thermometer. You should get yourself an inexpensive digital thermometer with a serial or network port. You write one script or program to periodically poll it for temperature readings, and you are done. The one I use is manufactured by ExTech; it has two thermocouple inputs and a serial output. I've used it for many years and haven't had to change the program, even when hardware or vendors were replaced. My temperature readings are constantly monitored and reported to OpenView Operations. This really is the way to do it, because as soon as you find a means of measuring temperature on one model (some HP 9000s can do this), it doesn't work on another.
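The "one script to poll for readings" Clay describes can be sketched in Python as below. This is a minimal outline, not his program. `read_temp_c` and `alert` are hypothetical callables you would supply: the first wraps whatever your thermometer actually speaks (for example, a line on its serial port), and the second sends the page, e-mail, or OpenView message. The threshold and polling interval are yours to choose.

```python
import time
from typing import Callable

def poll_temperature(read_temp_c: Callable[[], float],
                     alert: Callable[[float], None],
                     threshold_c: float,
                     interval_s: float = 60.0,
                     max_polls: int = 0) -> int:
    """Poll a thermometer and call alert() for each reading at or above threshold_c.

    read_temp_c and alert are hypothetical hooks for the real device and the
    real notification channel. max_polls=0 means poll forever.
    Returns the number of alerts raised.
    """
    polls = 0
    alerts = 0
    while max_polls == 0 or polls < max_polls:
        temp_c = read_temp_c()
        polls += 1
        if temp_c >= threshold_c:
            alert(temp_c)
            alerts += 1
        # Sleep between polls, but not after the final one.
        if max_polls == 0 or polls < max_polls:
            time.sleep(interval_s)
    return alerts

# Trying it out with a fake sensor instead of real hardware:
fake_readings = iter([22.0, 30.5, 24.0])
raised = []
count = poll_temperature(lambda: next(fake_readings), raised.append,
                         threshold_c=30.0, interval_s=0, max_polls=3)
```

Keeping the device access behind `read_temp_c` matches Clay's point about portability: when the thermometer or the servers change, only that one function has to be rewritten.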
If it ain't broke, I can fix that.
Pete Randall
Outstanding Contributor

Re: rp2470 temperature env. settings

Also, check to see if your UPS is capable of measuring and reporting temperature. We have an APC Symmetra which does this.


Pete
Patrick Wallek
Honored Contributor

Re: rp2470 temperature env. settings

I have to agree with Clay. Nothing beats monitoring the temperature in the computer room yourself. If it gets to a certain point, you get alerted and YOU can then shut down the machines if necessary.

I have recently purchased several units from AKCP, Inc.

http://www.akcp.com

The unit I use is their SensorProbe 2. It has 2 inputs, and they have a good selection of sensors to use with the unit. I use the combination temperature/humidity sensor. The sensor cables look like regular CAT5 network cables, so they are easy to run, and they come in varying lengths so you can run them just about anywhere you want.

You can program the unit to send e-mail messages when certain thresholds are met. The unit can also send SNMP traps.

This unit is small and relatively inexpensive.

More details on the SensorProbe 2 here:
http://www.akcp.com/company/sensorProbe2.htm

Available Sensors:
http://www.akcp.com/company/intelligentsensors.htm
Bill Hassell
Honored Contributor

Re: rp2470 temperature env. settings

And more to the point about temperature and computers: the RP shut itself down, but your external disks, printers, tape drives, network equipment, etc. all fried in the AC disaster. I call an AC failure a disaster because anything over 95-100 deg F (roughly 35-38 deg C) causes irreversible damage to electronics. The tape drive may not have melted, but it is now unreliable, something you never want in a backup. Same with external disk drives. Same with routers and switches. I have replaced many tape and disk drives (and computers) over the years due to AC failures.

Monitoring the temperature is only good for finding blame. Knowing that the AC failed after you have had to scrap a million dollars in equipment is not a solution. Using computers to monitor the temperature, while elegant, is fraught with danger. You can safely assume that by the time the computer senses the temperature rise and sends an e-mail to the pager company, and the page reaches you, it's far too late to run to the data center and shut down the equipment. Most IT centers will go over 100 deg F within 10-20 minutes of a total AC failure.
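A little arithmetic makes Bill's point concrete. The sketch below assumes the room heats at a constant (linear) rate, which is a simplification. Every number in the example is an illustrative assumption, not a measurement: a room starting at 70 deg F that reaches 100 deg F in 15 minutes (the middle of the 10-20 minute range above), an alert set at 85 deg F, and 5 minutes for the e-mail to reach a pager.

```python
def rise_rate_f_per_min(start_f: float, end_f: float, minutes: float) -> float:
    """Average rate of temperature rise, in deg F per minute."""
    return (end_f - start_f) / minutes

def response_window_min(alert_f: float, damage_f: float,
                        rise_f_per_min: float, notify_latency_min: float) -> float:
    """Minutes left to act once the alert actually reaches a person.

    Assumes the temperature climbs linearly. A negative result means the
    damage threshold is passed before anyone is even notified.
    """
    minutes_to_damage = (damage_f - alert_f) / rise_f_per_min
    return minutes_to_damage - notify_latency_min

rate = rise_rate_f_per_min(70.0, 100.0, 15.0)         # 2.0 deg F per minute
window = response_window_min(85.0, 100.0, rate, 5.0)  # 7.5 - 5.0 = 2.5 minutes
print(f"{rate:.1f} F/min, {window:.1f} min to act")
```

Under these assumptions, someone has about two and a half minutes after the page arrives. That is not enough time to get to the data center, which is why the mechanical breaker below does not depend on a person.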

The best solution is to use a thermostatically controlled circuit breaker for the data center. This breaker is mechanical (no computer or network problems) and removes all power from everything once an overtemp is sensed. You can set it to something higher, say 110 or 120 deg F, since it will take several minutes for the equipment to follow this kind of jump. The breaker yanks the power, with no shutdown at all, but at least the hardware is protected.

For a more sophisticated system, add some auto-shutdown tools (keep the mechanical breaker) that start the shutdown process. Note that every operating system will need a different mechanism to trigger the shutdown.


Bill Hassell, sysadmin
A. Clay Stephenson
Acclaimed Contributor

Re: rp2470 temperature env. settings

Of course, the best approach is to be able to tolerate the complete failure of one HVAC unit. This means that you need N + 1 units, where N units can completely handle the thermal load. I simply can't imagine setting up a data center any other way. Each of these units should at least be able to operate a set of relay contacts when an alarm condition (such as shutdown, low pressure, over pressure, etc.) develops, and of course, OV/O should be monitoring this as well. Having one more HVAC unit than you need also makes it possible to do maintenance on the units.
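The N + 1 rule translates directly into a sizing calculation. In this Python sketch, N is the smallest number of identical units whose combined capacity covers the heat load, and one or more spares are added on top. The kW figures in the example are hypothetical. Real HVAC sizing involves more than this one division, so treat the result as a sanity check only.

```python
import math

def hvac_units_required(heat_load_kw: float, unit_capacity_kw: float,
                        spares: int = 1) -> int:
    """Number of identical units needed for N + spares redundancy.

    N is the smallest unit count whose total capacity covers the load, so the
    room still copes with `spares` units failed or down for maintenance.
    """
    if heat_load_kw < 0 or unit_capacity_kw <= 0:
        raise ValueError("load must be >= 0 and unit capacity > 0")
    n = math.ceil(heat_load_kw / unit_capacity_kw)
    return n + spares

# Hypothetical room: 100 kW of heat, 35 kW units.
# N = 3 (3 x 35 kW = 105 kW >= 100 kW), so N + 1 = 4 units.
print(hvac_units_required(100, 35))
```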

One of the common, state-of-the-art stupid data center design mistakes is saving money on the backup generator so that there is not enough capacity to run the AC units, even though the computer equipment will run just fine and dandy, until it melts.
I've seen this happen more times than you would believe.
If it ain't broke, I can fix that.
Bill Hassell
Honored Contributor

Re: rp2470 temperature env. settings

And just to add to the list of design failures: a data center had 3 redundant AC units designed to kick in as needed, any one of which could handle the full heat load. However, an emergency was created (on a Saturday, naturally) when the building exterior was painted and the painters turned off the AC breakers so they could get behind the units. Luckily, there was an overtemp sensor patched into the building's alarm system, and IT was called. The battery backup and generator were useless, as the breakers were between the protected power source and the AC units.


Bill Hassell, sysadmin
timmy b.
Honored Contributor

Re: rp2470 temperature env. settings

One side suggestion: Test everything periodically.

I went onsite to upgrade an rp7410 on a Saturday morning at 7am. As the IT manager opened the door into the data center, we were greeted with a blast of HOT air. Their one and only CRAC (computer room air conditioner) had gone down around 1am. The HP systems had alarmed, shut down the apps and OS, and then powered off as the temperature increased. Only 18 minutes elapsed between the failure of the CRAC and the hardware power-off. The customer had a temperature monitoring/alert device, and it was functioning just fine. But it never paged anyone, because it was configured wrong and had never been tested!!! It was dialing 9 for an outside line while it was connected to a direct outside line.

For what it's worth, when we opened the door at 7am, it was over 130 deg F in the room. Under normal circumstances nobody would have entered the room until Monday morning, 48 hours later, so it's a good thing we had this upgrade scheduled, even though we had to postpone it!

After the room was re-stabilized at a normal temp the systems were all brought back up. No failures of HP hardware!

The customer has installed an additional CRAC.
There are 10 kinds of people in this world: Those who understand Binary, and those who don't.
Nick D'Angelo
Super Advisor

Re: rp2470 temperature env. settings

Thank you for all your ideas/views.

One more quick question: I see that for the rp2470, the operating range is 5 C to 39 C (41 F to 102 F).
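Because this thread mixes Fahrenheit and Celsius, a pair of standard conversion functions is useful for checking the figures. The sketch below confirms that the quoted 5 C to 39 C range is 41 F to 102.2 F. It also shows that Bill's 95-100 deg F damage range is about 35-37.8 deg C.

```python
def c_to_f(deg_c: float) -> float:
    """Convert Celsius to Fahrenheit."""
    return deg_c * 9 / 5 + 32

def f_to_c(deg_f: float) -> float:
    """Convert Fahrenheit to Celsius."""
    return (deg_f - 32) * 5 / 9

# rp2470 operating range as quoted above: 5 C to 39 C
print(round(c_to_f(5), 1), round(c_to_f(39), 1))   # 41.0 102.2
print(round(f_to_c(95), 1), round(f_to_c(100), 1)) # 35.0 37.8
```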

Am I correct to assume that the machine likely hit 39 C before it shut down?

I agree that this is dangerously high; meanwhile, we are waiting for an additional AC unit to be installed.

I don't suppose these parameters are 'tunable', are they?

Thanks,

Nick
Always learning
Bill Hassell
Honored Contributor

Re: rp2470 temperature env. settings

Yes, you can safely assume that the machine hit 39 C. This isn't a dangerous situation; it is a damaging situation. Your machine, even if it is running at 38 C and hasn't shut down, is being damaged. RAM and processor components are being heat-stressed, just like an automobile engine with a broken thermostat. Just as oil breaks down and components begin to shed metal particles, semiconductors begin to change physical and chemical characteristics, which affects their electrical properties. These changes aren't reversible. So the longer it remains above about 25-27 C, the more damage occurs.

Tunable shutdown values? Nope, not possible. But if you can't stay in the computer room all day because it's too hot for people, then it is way too hot for computers. And while the RP2470 is shutting itself off, all other equipment in your room is being damaged too. Let it go for a few more days and everything will start crashing.


Bill Hassell, sysadmin
Torsten.
Acclaimed Contributor

Re: rp2470 temperature env. settings

If the problem persists, open a call with the HP Response Center. They should check the fan controllers, which could have failed and be reporting wrong values.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!