Re: Over temperature alarms handling & supported platforms

Rui Vilao · ‎05-28-2003

Greetings,

I had a look at some previous posts on this subject and could learn a lot!
Great reference!

However I am still missing two points.

1. On which HW platforms is this overtemp warning supported? Our customer
has RP7405 and RP5405.

2. Is it possible to trigger some command when the temperature is back to normal?

I have tried the following entry in /etc/envd.conf

TEMP_NORMAL:y
/opt/ASG/normal.sh

... but it does not work when I test it with:

# echo '\000\c' > /var/run/envd_diag

Any help/suggestion is highly appreciated.

TIA.

Kind Regards,

Rui.

"We should never stop learning"_________ rui.vilao@rocketmail.com

Paula J Frazer-Campbell · ‎05-28-2003

Hi

You can utilise the /etc/envd.conf file to warn you of Overtemps.

The /sysadmin/team file just contains an information line.

Path should be in full:-

OVERTEMP_CRIT:y
/usr/bin/mailx -s "WARNING SERVER GETTING WARM" 07956610410@one2one.net
OVERTEMP_EMERG:y
/usr/sbin/reboot -qh

FANFAIL_CRIT:y
/usr/bin/mailx -s "WARNING SERVER FAN PROBLEMS" 07956610410@one2one.net
FANFAIL_EMERG:y
/usr/sbin/reboot -qh

The file /sysadmin/team just contains a few lines of info text.

Paula

If you can spell SysAdmin then you is one - anon

Zeev Schultz · ‎05-28-2003

1) It's hardware platform dependent. L-class (rp54xx) and N-class (rp74xx) are capable of
this ability.
2) Not by usual means. It reports to syslog (and whatever is defined in envd.conf) but doesn't report when status is back to normal.
However...

NORMAL 0x0 = 000
OVERTEMP_CRIT 0x1 = 001
OVERTEMP_EMERG 0x2 = 002
FANFAIL_CRIT 0x4 = 004
FANFAIL_EMERG 0x5 = 005

To simulate:
OVERTEMP_CRIT event:
# echo '\001\c' > /var/run/envd_diag

After any tests, set the condition back to NORMAL:
TEMP_NORMAL event:
# echo '\000\c' > /var/run/envd_diag

envd_diag is a fifo file (works like stdout?)
So I'd do traces for envd (use tusc -vpf) and
learn its behaviour, but it's hardly a workaround :)
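
For repeated testing, the echo commands above can be wrapped in a small helper. This is only a sketch: the function name envd_simulate and the ENVD_DIAG variable are made up for illustration, the codes are copied from the table above, and it assumes envd is running and reading the fifo (otherwise the write will block). printf is used in place of echo '\00N\c' because it emits the same single byte without a trailing newline.

```shell
#!/bin/sh
# Sketch: simulate an envd event by writing its one-byte code to the diag fifo.
# ENVD_DIAG is a hypothetical override so the helper can be tried on a scratch file.
ENVD_DIAG=${ENVD_DIAG:-/var/run/envd_diag}

envd_simulate() {
    # Map the event name to the octal code listed in the table above.
    case "$1" in
        NORMAL)         code='\000' ;;
        OVERTEMP_CRIT)  code='\001' ;;
        OVERTEMP_EMERG) code='\002' ;;
        FANFAIL_CRIT)   code='\004' ;;
        FANFAIL_EMERG)  code='\005' ;;
        *) echo "unknown event: $1" >&2; return 1 ;;
    esac
    # printf expands the octal escape and writes exactly one byte, no newline.
    printf "$code" > "$ENVD_DIAG"
}
```

Usage would be "envd_simulate OVERTEMP_CRIT" followed by "envd_simulate NORMAL" once the test is done.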

Zeev

So computers don't think yet. At least not chess computers. - Seymour Cray

Rui Vilao · ‎05-29-2003

Many thanks to Paula & Zeev.

If I understand correctly, it is not possible to run a script defined in envd.conf when the temperature condition is back to normal...

TIA,

Kind Regards,

Rui.

"We should never stop learning"_________ rui.vilao@rocketmail.com

Tobias Hartlieb_1 · ‎05-30-2003

Hi,

I'm not sure, but there may be a workaround. I have never tested it, but you might like to.
These events (temperature hot, temperature back to normal) are usually also detected (I think via envd) and reported by EMS (Event Monitoring System, part of the OnlineDiagnostic software, which is free of charge).
Within EMS, the dm_core_hw monitor is responsible for the temperature (as well as fan states, etc.).
To my knowledge, the "back to normal temperature" event is of "informational" severity only, and is therefore written by default only to /var/opt/resmon/log/event.log. However, by using /etc/opt/resmon/lbin/monconfig, one could configure dm_core_hw to additionally send "informational" events to another text log file, or to any email address.
Check out http://www.docs.hp.com/hpux/diag/index.html#EMS%20Hardware%20Monitors%20(for%20HP%209000) for additional information on EMS.
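
If the informational events do end up in a text log, a cron-driven poller could run a script when a new "back to normal" line appears. The following is an untested sketch under several assumptions: the PATTERN is a guess and not the real EMS wording (check your own event.log), STATE_FILE and the function name check_temp_normal are made up for illustration, and HANDLER just reuses the /opt/ASG/normal.sh script from the original question.

```shell
#!/bin/sh
# Sketch: run a handler when new "temperature back to normal" lines show up
# in the EMS event log. All names below except EVENT_LOG's default path are
# illustrative assumptions.
EVENT_LOG=${EVENT_LOG:-/var/opt/resmon/log/event.log}
STATE_FILE=${STATE_FILE:-/var/tmp/temp_normal.count}
PATTERN=${PATTERN:-'[Tt]emperature.*[Nn]ormal'}   # assumed wording, adjust to your log
HANDLER=${HANDLER:-/opt/ASG/normal.sh}

check_temp_normal() {
    # Count matching lines now; an unreadable log counts as zero.
    now=$(grep -c "$PATTERN" "$EVENT_LOG" 2>/dev/null)
    now=${now:-0}
    # Read the count remembered from the previous run (zero on first run).
    last=$(cat "$STATE_FILE" 2>/dev/null)
    last=${last:-0}
    # Remember the current count; this also resets after a log rotation.
    echo "$now" > "$STATE_FILE"
    # Fire the handler only if new matching lines were appended.
    if [ "$now" -gt "$last" ]; then
        "$HANDLER"
    fi
}
```

To use it, add a final line calling check_temp_normal and schedule the script from cron at whatever interval suits you.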

Regards.

Tobias

Bill Hassell · ‎06-01-2003

I would be VERY concerned about using the computers in an overtemp condition. While the rp-class computers do indeed have an overtemp sensor, this is for emergencies. When an overtemp condition exists, the rest of the system (disks, tape drives, network devices, etc.) may already be damaged. The rp machines will turn themselves off before they are damaged but you may lose disks and other peripherals. The problem with heat damage is that it is cumulative, that is, reliability drops and intermittent failures begin to increase, leading to loss of data and/or system crashes.

The cost of adequate air-conditioning AND overtemp power disconnect is insignificant when compared to the downtime, troubleshooting and repair of equipment damaged by overtemp conditions. Using the computer as a substitute for a hightemp power disconnect is not a good idea.

Bill Hassell, sysadmin
