Operating System - HP-UX

Re: HW Event Notification

 
SOLVED
Doug_3
Frequent Advisor

HW Event Notification

Hello, I would like to know whether we can get a greater level of detail in our HP-UX EMS hardware notifications for temperature. We page/email when the temperature reaches the default setting, but the message is too generic (see below).

Can someone point me to the config file where these settings are maintained, or let me know if they are hardcoded and we cannot get more information from the standard EMS processes?

We are looking for the actual cabinet temp at the time of the notification, what the shutdown temp is, etc.

Thanks in advance,
Doug

Event Time..........: Thu May 31 20:12:26 2007
Severity............: CRITICAL
Monitor.............: dm_core_hw
Event #.............: 33
System..............: IFASHP.spokaneschools.org

Summary:
Processor cabinet intake temperature is too hot
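
For anyone who wants to route or enrich these messages downstream, here is a minimal sketch of pulling the fields out of a notification like the one above. It assumes the message arrives as plain text in the same dotted "Field....: value" layout; the function name parse_ems_event is hypothetical and is not part of EMS. It cannot add a temperature reading that the event does not carry.

```python
import re

# Matches lines such as "Severity............: CRITICAL"
FIELD_RE = re.compile(r"^\s*([^.:\n]+?)\s*\.{2,}:\s*(.*)$")


def parse_ems_event(text):
    """Return a dict of the fields found in an EMS-style notification.

    Hypothetical helper: the layout is inferred from the sample message only.
    """
    fields = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
        elif line.strip() == "Summary:":
            # Everything after "Summary:" is treated as free text.
            rest = [item.strip() for item in lines[i + 1:] if item.strip()]
            fields["Summary"] = " ".join(rest)
            break
    return fields


SAMPLE = """\
Event Time..........: Thu May 31 20:12:26 2007
Severity............: CRITICAL
Monitor.............: dm_core_hw
Event #.............: 33
System..............: IFASHP.spokaneschools.org

Summary:
Processor cabinet intake temperature is too hot
"""

if __name__ == "__main__":
    for key, value in parse_ems_event(SAMPLE).items():
        print(f"{key}: {value}")
```

A paging script could then key off "Event #" and "Monitor" to attach site-specific text to the alert.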
A. Clay Stephenson
Acclaimed Contributor

Re: HW Event Notification

I cringe every time I see a question like yours because the real answer is to fix the problem --- inadequate cooling. Some models do allow querying the temperature but the vast majority can only issue fixed status messages. Moreover, there is no general approach that will work across all models AND you are ignoring other components such as disk arrays, network switches, and other peripherals.

There is an instrument that is designed to do this task; I think it is called a "thermometer".
You can find digital thermometers that have a serial interface, and some are even web-enabled. Most will allow for multiple temperature probes. This is the approach I would take, and then you can remotely measure temperature regardless of the equipment.

Of course, the real answer is to have adequate N + 1 cooling so that you can lose an entire HVAC unit and your equipment doesn't fail.

If it ain't broke, I can fix that.
Scot Bean
Honored Contributor

Re: HW Event Notification

The overtemp settings are generally hard coded in the machine firmware. I would not recommend trying to change them, unless you want to fry your box.

If you tell us the model this is, someone could maybe find a spec.

You can also see a bit more detail via the console interface to the firmware/support processor (Ctrl-B) via the 'PS' (power status) command. It tells you which threshold you are at.

Event #33 is the first warning threshold. If you get even hotter the machine should shut itself off.
Doug_3
Frequent Advisor

Re: HW Event Notification

Thanks, but that was not what I was asking. I want to know what the internal chassis temp is set to when STM/EMS generates an event triggering whatever actions we have set in the configuration. I also want to know if the temp reading is hard coded or if we can include that in the EMS notification.

We do have N+1 cooling and we have temp gauges on other hardware as well as HVAC notifications.

Thanks anyways.
OldSchool
Honored Contributor

Re: HW Event Notification

The firmware / hardware *generates* the event AFAIK. EMS simply reports it. There is nothing that can be configured.

man 1m dm_core_hw for more
Scot Bean
Honored Contributor

Re: HW Event Notification

If you share with us the model of the machine, someone may be able to look up the specs.
Doug_3
Frequent Advisor

Re: HW Event Notification

Thank you,
rp7400 A3639C
Scot Bean
Honored Contributor
Solution

Re: HW Event Notification

Looks like the specs for rp7400 are warning at 35C, shutdown (ungraceful) at 40C.

Of course these temps are inside the cabinet, NOT the computer room air. Also, these temps are probably measured to within +/- 2 degrees C or so, so they can vary.
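
To make the two thresholds concrete, here is a small sketch that classifies a reading from a separate probe against the 35 C and 40 C figures quoted above. It assumes those figures are right for the rp7400 and treats the "+/- 2 degrees C or so" as a guard band; the function names are hypothetical. It does not query the server, which per this thread only reports fixed states.

```python
WARN_C = 35.0       # warning threshold quoted in this thread for the rp7400
SHUTDOWN_C = 40.0   # ungraceful shutdown threshold quoted in this thread
TOLERANCE_C = 2.0   # assumed sensor uncertainty ("+/- 2 degrees C or so")


def classify(temp_c, tolerance_c=TOLERANCE_C):
    """Classify an in-cabinet reading, pessimistically adding the tolerance."""
    worst_case = temp_c + tolerance_c
    if worst_case >= SHUTDOWN_C:
        return "shutdown-risk"
    if worst_case >= WARN_C:
        return "warning-risk"
    return "ok"


def headroom_c(temp_c):
    """Degrees C remaining before the quoted shutdown threshold."""
    return SHUTDOWN_C - temp_c


if __name__ == "__main__":
    for reading in (30.0, 33.5, 38.0):
        print(reading, classify(reading), headroom_c(reading))
```

Adding the tolerance before comparing errs on the side of alerting early, which seems the safer choice when the next stage is an ungraceful shutdown.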
Bill Hassell
Honored Contributor

Re: HW Event Notification

The computer hardware has a two-stage temperature sensor: too warm (warning) and way too hot (critical). There is no true thermometer, no readout, nothing but these two levels. Your computer may have shut itself down but everything else is frying. Your computer room is way, way too hot if either message is reported -- and if no one is within a few minutes of the computer room so they can hit the panic button to shut down all power to the room, damage has already occurred. Proper temperature control seems to be a very low priority until it is too late.

I have personally witnessed over $100,000 in damage when an air conditioner (just one) was turned off by a timer Sunday afternoon and the internal temperature went to an estimated 140 degrees. Four disk drives and a tape drive were destroyed, and all the networking gear and several computers without overtemp shutdown were damaged beyond reliable repair. This company ignored requests for separate, dual air conditioners and instead spliced some ductwork off the building system into the computer room. The $100k was just hardware -- downtime was several weeks.


Bill Hassell, sysadmin
Paul Clark_9
Advisor

Re: HW Event Notification

Scot Bean,

Would it be possible to find out the temp thresholds for an rp8420? We have a similar problem occurring at the moment and need to understand this a bit more.

Regards