EMS question

SOLVED
Waqar Razi
Regular Advisor


We have an rp4440 server and have received some EMS error messages that mention ia64. I am curious whether the wrong EMS is installed on the server.

# model
9000/800/rp4440

Jul 24 01:18:06 dswdhpt1 EMS [2812]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 184287234 -r /system/events/ia64_corehw/core_hw -n 184287261 -a

Event Time..........: Fri Jul 24 01:18:06 2009
Severity............: CRITICAL
Monitor.............: ia64_corehw
Event #.............: 104011
System..............: dswdhpt1

Can anyone give me a clue?


8 REPLIES
Mel Burslan
Honored Contributor

Re: EMS question

Did you run the command mentioned in the message?

/opt/resmon/bin/resdata -R 184287234 -r /system/events/ia64_corehw/core_hw -n 184287261 -a

Please post the output if you cannot figure out the failing component yourself.
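If several EMS notifications have piled up in syslog, pulling the resdata command out of each line saves retyping the -R/-r/-n numbers. The Python sketch below is not an HP tool: parse_ems_line and EMS_LINE are hypothetical names, and the regular expression is modeled only on the single notification quoted in this thread, so other EMS versions may word the line differently.

```python
import re

# Modeled on the one EMS syslog notification quoted in this thread (assumed format).
EMS_LINE = re.compile(
    r'Value: "(?P<value>[^"]+)" for Resource: "(?P<resource>[^"]+)"'
    r'.*?(?P<command>/opt/resmon/bin/resdata\s.*?\s-a)(?=\s|$)'
)


def parse_ems_line(line):
    """Return (value, resource, resdata command), or None if the line does not match."""
    m = EMS_LINE.search(line)
    if m is None:
        return None
    return m.group("value"), m.group("resource"), m.group("command")


if __name__ == "__main__":
    sample = ('Jul 24 01:18:06 dswdhpt1 EMS [2812]: ------ EMS Event Notification ------ '
              'Value: "CRITICAL (5)" for Resource: "/system/events/ia64_corehw/core_hw" '
              '(Threshold: >= " 3") Execute the following command to obtain event details: '
              '/opt/resmon/bin/resdata -R 184287234 -r /system/events/ia64_corehw/core_hw '
              '-n 184287261 -a')
    print(parse_ems_line(sample))
```

On HP-UX the syslog is normally /var/adm/syslog/syslog.log, so you could loop over its lines, keep the non-None results and run each extracted command by hand.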
________________________________
UNIX because I majored in cryptology...
Waqar Razi
Regular Advisor

Re: EMS question

It reports a problem with the power supply. I have two questions:

1- Do we have the correct EMS installed on this server? This is an rp server, but the EMS monitor says ia64_corehw.

2- The monitor says that one of the power supplies may have failed. Does the rp4440 have more than one power supply, or just one? I ask because this server is hosted remotely and we don't have access to the console or the MP.

Here is the output of the command:

CURRENT MONITOR DATA:

Event Time..........: Fri Jul 24 01:18:06 2009
Severity............: CRITICAL
Monitor.............: ia64_corehw
Event #.............: 104011
System..............: dswdhpt1

Summary:
Power Unit : Redundancy lost or not present.


Description of Error:

The number of Power supplies has gone from N+1 (redundant) to N
(non-redundant) if a Power supply was removed, or the number of Power
supplies is < N+1.

Probable Cause / Recommended Action:

The minimum number of power supplies required to power the unit is
currently installed and operating. There are no redundant I/O power
supplies available in case of failure. If redundancy is desired another
Power supply should be added.

For information on the sensor that generated this event, refer to FRU ID
in Event Details section.

Additional Event Data:
System IP Address...: 139.177.210.48
System IP Address...: 10.7.221.226
Event Id............: 0x4a696e3e00000000
Monitor Version.....: B.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_ia64_corehw.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp4440
EMS Version.....................: A.04.20
STM Version.....................: A.59.00
System Serial Number............: USE44178JT
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/ia64_corehw.htm#104011

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


Event Details :

Event Date .............: Mon May 11 12:39:57 2009
Sensor Number ..........: 0xcf
Sensor Type ............: Power Unit
Sensor Class ...........: Sensor specific
Sensor Reading/Offset...: 0x02 (Sensor Reading)
Event Type.............: Not Applicable
Entity ID ..............: 21
Generic Message.........:
Power Unit : Power cycle
Entity FRU Id Info......:
power management / power distribution board (Sensor ID: Power Converter)
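The resdata output is a list of "Name.....: value" fields, some of whose values wrap onto the next line. A minimal Python sketch for turning a saved copy of that output into a dictionary, assuming the layout shown above; parse_resdata and FIELD are hypothetical names, not part of EMS.

```python
import re
from collections import defaultdict

# "Key .......: value" as printed by resdata in this thread (assumed layout).
FIELD = re.compile(r"^\s*(?P<key>[^.:\n]+?)\s*\.{2,}\s*:\s*(?P<value>.*)$")


def parse_resdata(text):
    """Map each dotted field name to the list of values it had.

    When a field's value is empty, the next non-blank line is used as its
    value, unless that line is itself a dotted field.
    """
    fields = defaultdict(list)
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        m = FIELD.match(lines[i])
        if m:
            key, value = m.group("key"), m.group("value").strip()
            if not value:
                # Look ahead for a wrapped value on the following line.
                j = i + 1
                while j < len(lines) and not lines[j].strip():
                    j += 1
                if j < len(lines) and not FIELD.match(lines[j]):
                    value = lines[j].strip()
                    i = j
            fields[key].append(value)
        i += 1
    return dict(fields)
```

For the output above, parse_resdata(text)["Severity"] would be ["CRITICAL"], and a repeated field such as System IP Address keeps both values. Lines without a dotted leader, such as the Summary text, are ignored by this sketch.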

Patrick Wallek
Honored Contributor

Re: EMS question

Yes, the rp4440 can have multiple power supplies.

To check the status of your PSUs, you should log into the MP for this server, go to CM at the menu, and then enter PS to see the power supply and fan status.

Here is the output from one of my rp4440s:


[hqcdb03] MP:CM> ps


PS
System Power state: On
Temperature : Normal


Power supplies State
-------------------------------------
Power Supply 0 Normal
Power Supply 1 Normal


Fans State
-------------------------------------
Fan 0 (System) Normal
Fan 1 (System) Normal
Fan 2 (Pwr) Normal


[hqcdb03] MP:CM>
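Where the MP is reachable, the CM> ps listing is easy to read by eye, but if you collect it from several servers a small parser helps. A Python sketch, assuming the row layout in the sample above; check_ps is a hypothetical helper, and required_supplies=1 (the N in N+1) is an assumption for illustration, not a documented rp4440 figure.

```python
import re

# One status row from the MP "CM> ps" listing, e.g. "Power Supply 0 Normal"
# or "Fan 2 (Pwr) Normal" (layout assumed from the sample in this thread).
ROW = re.compile(r"^(?P<name>(?:Power Supply|Fan) \d+(?: \([^)]*\))?)\s+(?P<state>\S.*)$")


def check_ps(text, required_supplies=1):
    """Return (list of (component, state) not Normal, whether power is redundant).

    required_supplies is a hypothetical parameter: the number of supplies the
    chassis needs to run (N). Redundancy means at least N+1 supplies are Normal.
    """
    problems = []
    normal_supplies = 0
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # headers, separators and the power-state lines
        name, state = m.group("name"), m.group("state").strip()
        if state != "Normal":
            problems.append((name, state))
        elif name.startswith("Power Supply"):
            normal_supplies += 1
    return problems, normal_supplies >= required_supplies + 1
```

Against the sample above it would return ([], True): nothing outside Normal, and two healthy supplies, i.e. N+1 under the assumption N = 1.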
Waqar Razi
Regular Advisor

Re: EMS question

I don't have access to the MP; this server is remotely located and I don't have the MP address. I have checked mstm, but it shows no problem with the power supply.

My other question remains unanswered:

The EMS monitor says ia64_corehw. Is that normal on rp servers, or do I have the wrong version of the EMS monitor installed?
Mel Burslan
Honored Contributor

Re: EMS question

Don't take this as an authoritative answer, but I believe what you are seeing is a standard boilerplate message that has nothing to do with your processor architecture. I just checked my rp3440 and found similar messages in its syslog.
Waqar Razi
Regular Advisor

Re: EMS question

Yes, that's why I am asking whether the EMS version is correct for this server. The EMS output says "Monitor.............: ia64_corehw"; is it possible that we have the EMS for rx (Itanium) servers installed on this rp4440?

Matti_Kurkela
Honored Contributor

Re: EMS question

I think the rp34xx and rp44xx series servers were designed as dual-architecture machines: by changing the firmware, removing the PA-RISC CPUs (and maybe their power converters) and installing Itaniums in their place, they could have been converted to rx26xx and rx46xx models, respectively.

If the hardware developers regarded ia64 as the "primary" architecture of these models, it may have influenced the naming of the hardware monitoring EMS components.

And software compatibility between PA-RISC and Itanium works only one way: Itanium servers can run PA-RISC software (with the built-in Aries emulator), but a PA-RISC server absolutely cannot run any Itanium binaries.

So even if someone had somehow fooled swinstall into installing the Itanium EMS binaries from the installation depots instead of the PA-RISC ones, your EMS would not work at all.
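This point can be checked directly, because HP-UX binaries record their target architecture in the file header. 64-bit binaries on both platforms are ELF, where the e_machine field is 15 (EM_PARISC) or 50 (EM_IA_64); 32-bit PA-RISC binaries use HP's SOM format instead. The Python sketch below classifies a header; binary_arch and file_arch are hypothetical helpers, and the SOM system_id values are assumed from HP's SOM header definitions, so verify them before relying on that branch. HP-UX's own file command reports the same information without any code.

```python
import struct

EM_PARISC = 15   # ELF e_machine value for HP PA-RISC
EM_IA_64 = 50    # ELF e_machine value for Intel Itanium (IA-64)
# ASSUMPTION: SOM system_id values for PA-RISC 1.0, 1.1, 1.2 and 2.0.
SOM_PA_RISC_IDS = {0x20B, 0x210, 0x211, 0x214}


def binary_arch(header):
    """Classify an executable from its first bytes: 'PA-RISC', 'IA-64' or 'unknown'."""
    if header[:4] == b"\x7fELF" and len(header) >= 20:
        # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian; e_machine is at offset 18.
        order = "<" if header[5] == 1 else ">"
        (machine,) = struct.unpack_from(order + "H", header, 18)
        return {EM_PARISC: "PA-RISC", EM_IA_64: "IA-64"}.get(machine, "unknown")
    if len(header) >= 2:
        # A SOM file starts with a big-endian 16-bit system_id.
        (system_id,) = struct.unpack_from(">H", header, 0)
        if system_id in SOM_PA_RISC_IDS:
            return "PA-RISC"
    return "unknown"


def file_arch(path):
    """Read the first bytes of an executable and classify it."""
    with open(path, "rb") as f:
        return binary_arch(f.read(64))
```

On an rp4440, any EMS monitor binary reporting IA-64 here would be one the system could not execute, which is the situation Matti describes as impossible for a working EMS.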

MK
Shinji Teragaito_1
Respected Contributor
Solution

Re: EMS question

Hi,

I guess OnlineDiag B.11.11.20.03 for HP9000 is installed on your
rp4440 server.

The monitor name is probably what is confusing you. Please refer to
the very old Release Notes from when ia64_corehw was introduced to 11.11:

------------------------------
Release Notes for EMS Hardware Monitors on HP-UX 11i (December 2003)
http://docs.hp.com/en/diag/archive/emr_0312_11i.htm

* Core Hardware Monitor for Itanium (ia64_corehw).

o This is the initial release of the monitor on HP-UX 11.11.
..(snip)..
In addition, the monitor will also receive PCI error data, and
make it available to the FPL monitor. The FPL monitor will then
generate appropriate EMS events. The PCI error-data processing
will be available on IPMI-based, as well as non-IPMI based, HP-PA
systems.

The ia64_core_hw monitor also monitors fans and power supplies in
non-Cellular PA and Cellular PA systems.
------------------------------

According to the newer Release Notes for EMS Hardware Monitors for
HP-UX 11.11, ia64_corehw on PA-RISC is called the "Chassis Event
Monitor".

Chassis Code Monitor (dm_chassis) and Core Hardware Monitor
(dm_core_hw) are present for PA-RISC:

------------------------------
Release Notes for EMS Hardware Monitors (June 2001)
http://docs.hp.com/en/diag/archive/emr_0106.htm

* Chassis Code Monitor (dm_chassis)

New monitor. The Chassis Code Monitor supports Superdome family
systems on HP-UX 11i. Each chassis code delivered to the GSP is read
by the chassis code monitor, which looks the chassis code up in an
internal table built from a chassis code database. If the chassis
code warrants an event, it generates an EMS event with summary,
event/keyword description, and details text.
------------------------------
Release Notes for EMS Hardware Monitors (June 1999)
http://docs.hp.com/en/diag/archive/emr_9906.htm

New monitor: Core Hardware Monitor. Monitors core hardware (hardware
within the SPU cabinet), for example, intake temperature. On some
systems, other hardware such as power supplies are monitored.
------------------------------

Hope this clears up your concern.