Operating System - HP-UX
Andrew Dutton
Frequent Advisor

Event Monitor Error Messages


Every day I receive the email alert below from my new 11.11 system. I have been working with my HW contract vendor on this. They say there is no real event: it happened a while ago and is just erroneously still being reported. They suggested I zero out the resmon logs to clear it, which seems a little fishy to me. I have cleared all system logs from the console, both types. There are no warning lights on the system or on the power supplies. There really is no indication of an issue anymore. I thought I should post before just /dev/nulling out the files... it seems a little weird to do.
thanks
drew

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Wed Dec 19 08:23:00 2007

isqpro05 sent Event Monitor notification information:

/system/events/ia64_corehw/core_hw is >= 3.
Its current value is CRITICAL(5).



Event data from monitor:

Event Time..........: Wed Dec 19 08:23:00 2007
Severity............: CRITICAL
Monitor.............: ia64_corehw
Event #.............: 104011
System..............: nameofserver

Summary:
Power Unit : Redundancy lost or not present.


Description of Error:

The number of Power supplies has gone from N+1 (redundant) to N
(non-redundant) if a Power supply was removed, or the number of Power
supplies is < N+1.

Probable Cause / Recommended Action:

The minimum number of power supplies required to power the unit is
currently installed and operating. There are no redundant I/O power
supplies available in case of failure. If redundancy is desired another
Power supply should be added.

For information on the sensor that generated this event, refer to FRU ID
in Event Details section.

Additional Event Data:
System IP Address...: 10.7.5.56
Event Id............: 0x4769456400000000
Monitor Version.....: B.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_ia64_corehw.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp4440
EMS Version.....................: A.04.20
STM Version.....................: A.57.00
System Serial Number............: USEREMOVED
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/ia64_corehw.htm#104011

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


Event Details :

Event Date .............: Mon Nov 26 07:49:48 2007
Sensor Number ..........: 0xcf
Sensor Type ............: Power Unit
Sensor Class ...........: Sensor specific
Sensor Reading/Offset...: 0x02 (Sensor Reading)
Event Type.............: Not Applicable
Entity ID ..............: 21
Generic Message.........:
Power Unit : Power cycle
Entity FRU Id Info......:
power management / power distribution board (Sensor ID: Power Converter)



>---------- End Event Monitoring Service Event Notification ----------<
Hasan  Atasoy
Honored Contributor
Solution

Re: Event Monitor Error Messages

Hi Andrew,
From the MP, go to the Command Menu (CM) and run ps:

MP:CM> ps

Then look at the power supplies.

Hasan.
Steven E. Protter
Exalted Contributor

Re: Event Monitor Error Messages

Shalom,

It could be a bad sensor.

Depending on the hardware type, there are redundant power supplies and the system can operate without one of them.

I would suggest the message is real and a power supply needs to be swapped out.

SEP
Andrew Dutton
Frequent Advisor

Re: Event Monitor Error Messages

Looks good...

[mp00306ed616f5] MP:CM> ps


PS
System Power state: On
Temperature : Status is not available

Power supplies State
-----------------------------------------------------------
Power Supply 0 Normal
Power Supply 1 Normal


Fans State
-----------------------------------------------------------
Fan 0 (System) Normal
Fan 1 (System) Normal
Fan 2 (Pwr) Normal

Patrick Wallek
Honored Contributor

Re: Event Monitor Error Messages

Run the command:

# /etc/opt/resmon/lbin/set_fixed -L

This will list everything that EMS considers to be down/bad. There will be 2 columns of output. The left is the device and the right is the status.

If something shows as DOWN, try resetting that resource by name:

# /etc/opt/resmon/lbin/set_fixed -n <resource_name>

or if you want to reset everything:

# /etc/opt/resmon/lbin/set_fixed -n \*

Now wait and see if you still get the message tomorrow.
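The list-then-reset steps can be wrapped in a small function. This is a hedged sketch: it treats the message "No resources set to DOWN state." (the text set_fixed -L prints later in this thread) as meaning nothing needs resetting, and otherwise resets everything, like set_fixed -n \*. The SET_FIXED variable and the reset_down_resources function are hypothetical names added for illustration.

```shell
#!/bin/sh
# Path to the EMS set_fixed utility (overridable; hypothetical variable name)
SET_FIXED=${SET_FIXED:-/etc/opt/resmon/lbin/set_fixed}

# Hypothetical helper: list EMS resources in DOWN state and reset them
# all to UP only if at least one is listed.
reset_down_resources() {
    if [ ! -x "$SET_FIXED" ]; then
        echo "set_fixed not found at $SET_FIXED" >&2
        return 1
    fi
    # List everything EMS currently considers DOWN
    down=$("$SET_FIXED" -L 2>&1)
    case $down in
        *"No resources set to DOWN state."*)
            echo "EMS: nothing in DOWN state"
            return 0 ;;
    esac
    echo "EMS resources currently DOWN:"
    printf '%s\n' "$down"
    # Reset every resource to UP; equivalent to: set_fixed -n \*
    "$SET_FIXED" -n '*'
}
```

Run it as root with reset_down_resources. Resetting everything is the blunt option; to reset a single resource, use set_fixed -n with that resource's name instead.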
Tim Nelson
Honored Contributor

Re: Event Monitor Error Messages

It might be worth updating Online Diagnostics (which includes EMS) to the latest revision and patches, if you are not already there.

One of two things is happening:

1) you actually have a bad PS that EMS is reacting to

2) bad software is reporting a ghost event
Andrew Dutton
Frequent Advisor

Re: Event Monitor Error Messages

Looks like nothing is at fault...

servername(PROD EB21):/# /etc/opt/resmon/lbin/set_fixed -L
No resources set to DOWN state.
servername(PROD EB21):/# /etc/opt/resmon/lbin/set_fixed -n \*
/etc/opt/resmon/lbin/set_fixed: No matching resource names found to set to UP state.
servername(PROD EB21):/#
Andrew Dutton
Frequent Advisor

Re: Event Monitor Error Messages

Does anyone know whether I need to reboot after patching it, or can I do it on a live production system?
Tim Nelson
Honored Contributor

Re: Event Monitor Error Messages

Diags typically do not need a reboot.

Check the description once you get it downloaded if you wish to confirm.
Andres_13
Respected Contributor

Re: Event Monitor Error Messages

I had the same problem: EMS kept reporting ghost events until I upgraded to the latest Support Tools Manager version, C.56.00.

Regards!
Torsten.
Acclaimed Contributor

Re: Event Monitor Error Messages

Version C.56.00 is for 11.23, but you are running 11.11.

I would suggest reviewing the MP event log for power/fan-related messages and checking the server's firmware:

MP:CM> sysrev

The latest versions are PDC 46.34, BMC 04.06, and iLO MP firmware E.03.30.

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=2512015&swItem=pf-55290-1&prodNameId=401769&swEnvOID=54&swLang=13&taskId=135&mode=4&idx=0


Hope this helps!
Regards
Torsten.

Andrew Merritt_2
Honored Contributor

Re: Event Monitor Error Messages

Hi Andrew,
I would suggest opening a support call with HP for this; this problem is under investigation by them, and data from your systems may help.

It may also be possible to stop the messages by removing the file /var/stm/logs/monitor/remindersel.dat. This is completely safe to do; it will be recreated by the monitor.

Andrew
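A cautious way to follow the suggestion to remove /var/stm/logs/monitor/remindersel.dat is to move the file aside rather than delete it, so it can be put back if needed; per the post, the monitor recreates it. A minimal sketch, assuming a POSIX shell run as root: REMINDER_FILE and clear_reminders are hypothetical names, and the timestamped backup name is an illustrative choice, not something EMS expects.

```shell
#!/bin/sh
# Location of the EMS reminder file (overridable; hypothetical variable name)
REMINDER_FILE=${REMINDER_FILE:-/var/stm/logs/monitor/remindersel.dat}

# Hypothetical helper: move the reminder file to a timestamped backup
# instead of deleting it outright.
clear_reminders() {
    if [ ! -f "$REMINDER_FILE" ]; then
        echo "No reminder file at $REMINDER_FILE"
        return 0
    fi
    # Backup name such as remindersel.dat.20071219082300
    backup=$REMINDER_FILE.$(date +%Y%m%d%H%M%S)
    mv "$REMINDER_FILE" "$backup" || return 1
    echo "Moved $REMINDER_FILE to $backup"
}
```

After running clear_reminders, wait for the next daily notification cycle to see whether the alert stops; the backup can be deleted once it is clear it is not needed.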