Operating System - HP-UX

Event Monitor Notification - only a vague idea

 
jasonK_1
Frequent Advisor


I got the following email notification and don't know what caused it. It's the 12H AutoRAID.


Notification Time: Fri May 29 15:23:24 2009

hnrgnw sent Event Monitor notification information:

/storage/events/disk_arrays/AutoRAID/0000003FEB3E
is >= 3.
Its current value is SERIOUS(4).



Event data from monitor:

Event Time..........: Fri May 29 15:23:24 2009
Severity............: SERIOUS
Monitor.............: armmon
Event #.............: 101
System..............: hnrgnw

Summary:
Disk Array at hardware path : Device removed from monitoring


Description of Error:

The device has been removed from the list of devices being monitored by
this monitor.

Probable Cause / Recommended Action:

The device was removed from the system, has stopped responding to the
system or it has been replaced with a device that is not supported by this
monitor.
Run ioscan to determine the state and type of the device.
Check the /var/stm/data/os_decode_xref for the information indicating
which devices are supported by this monitor.
Check other monitors to determine if they are now monitoring the
device by running /etc/opt/resmon/lbin/monconfig and using the
"Check monitoring" command.

Additional Event Data:
System IP Address...: 205.135.30.74
Event Id............: 0x4a20605c00000000
Monitor Version.....: B.01.01
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_armmon.clcfg
Client Configuration File Version...: A.01.01
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L2000-44
EMS Version.....................: A.03.20
STM Version.....................: A.44.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/armmon.htm#101

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v

cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

Hi Jason,

Follow the link and you get:

Event 101

Severity: Critical
Event Summary: Disk at hardware path x/xx/x.x.x : Device removed from monitoring
Event Class: I/O
Problem Description:
The device has been removed from the list of devices being monitored by this monitor.
Probable Cause / Recommended Action
The device was removed from the system, has stopped responding to the system or it has been replaced with a device that is not supported by this monitor.

*** Run ioscan to determine the state and type of the device.
*** Check the /var/stm/data/os_decode_xref for the information indicating which devices are supported by this monitor.
*** Check other monitors to determine if they are now monitoring the device by running /etc/opt/resmon/lbin/monconfig and using the "Check monitoring" command.
Automated Recovery: None
Event Generation Threshold: 1 occurrence


Any UNCLAIMED or NO_HW changes noticed from ioscan?
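
A minimal sketch for that ioscan check, assuming a POSIX shell and the usual "ioscan -f" full-listing layout, where the S/W State is a separate word on each row. The function name find_problem_devices is hypothetical. It reads ioscan output on stdin and prints only the rows in the UNCLAIMED or NO_HW state:

```shell
# Print ioscan -f rows whose S/W State is UNCLAIMED or NO_HW.
# Reads the listing on stdin so it also works on saved ioscan output.
# Assumption: the state appears as its own whitespace-separated field.
find_problem_devices() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "UNCLAIMED" || $i == "NO_HW") { print; next }
        }
    }'
}

# Usage (no output means no device is in either state):
#   ioscan -f | find_problem_devices
```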

Might want to check the STM log for other issues also.

Could be a number of things. Start with the array subsystem hardware and all system logs.

Any warnings on the 12H?

hth,
Basheer_2
Trusted Contributor

Re: Event Monitor Notification - only a vague idea

Look in /var/adm/syslog/syslog.log and you will find a command that you can run to get this information.
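
To narrow syslog.log down to the day of the event (the notification is stamped Fri May 29), a small filter can help. This is a sketch assuming the classic syslog line format, which starts with a "Mon dd hh:mm:ss" stamp where a single-digit day is space-padded; the function name syslog_for_day is hypothetical:

```shell
# Print the syslog lines logged on one day, e.g. "May 29".
# $1 = month abbreviation, $2 = day of month; log text is read on stdin.
# awk's default field splitting absorbs the padding in "Jun  5".
syslog_for_day() {
    awk -v mon="$1" -v day="$2" '$1 == mon && $2 == day'
}

# Usage:
#   syslog_for_day May 29 < /var/adm/syslog/syslog.log
```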

Since the severity is SERIOUS (4), which meets the ">= 3" threshold in the notification, urgent action is needed.
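
If you want to check saved notifications from a script, the numeric value can be pulled out of the "Its current value is SERIOUS(4)." line and compared against the same >= 3 threshold the notification used. A sketch, assuming the value is always printed as NAME(N) on that line; ems_severity and severity_meets are hypothetical names:

```shell
# Extract N from a line like "Its current value is SERIOUS(4)." (stdin).
ems_severity() {
    sed -n 's/.*current value is [A-Z_]*(\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Succeed if the severity in the notification on stdin is >= $1.
# Fails when no severity value is found.
severity_meets() {
    sev=$(ems_severity)
    [ -n "$sev" ] && [ "$sev" -ge "$1" ]
}

# Usage:
#   if severity_meets 3 < notification.txt; then echo "needs attention"; fi
```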

Go to your SAN and look for RED LIGHTS.

Depending on your SAN (VA, EVA, XP),
you can hot-replace a disk.
OldSchool
Honored Contributor

Re: Event Monitor Notification - only a vague idea

I believe you'll find a dead disk in the AutoRaid.


regarding:
"Go to your SAN and look for RED LIGHTS.
Depending on your SAN ( VA,EVA,XP)
you can hot replace a disk."

On AutoRAID, I don't think you'll find "red lights". You should be able to check the status via SAM. The admin guide also documents commands for listing the array logs, which may show something of interest.


Admin Guide here:
http://h10032.www1.hp.com/ctg/Manual/lpg28365.pdf

Other resources at:
http://h10025.www1.hp.com/ewfrf/wc/manualCategory?dlc=en&lc=en&product=61936&cc=us&

Drive(s) should be hot-swappable, so once you locate the culprit and obtain spares, it should be an easy fix.
jasonK_1
Frequent Advisor

Re: Event Monitor Notification - only a vague idea

Hi All,

1. The system is still up and running
2. no unclaimed HW
3. no missing disk
4. no warning messages when running arraydsp -a AUTORAID1
5. no further notification email since May 29

Should I let it go or just keep an eye on it?

cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

I would at least check on (or restart) monitoring of the array, as this message indicates that it is no longer being monitored.

Was this the only message in the event log?

hth,
jasonK_1
Frequent Advisor

Re: Event Monitor Notification - only a vague idea

cnb,

The 12H AutoRAID is still being monitored.
cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

Hi Jason,

You might want to run this and check the results for any hardware errors, or redirect the output to a file and post it for review:

# echo "sel dev all;info;wait;il"|/usr/sbin/cstm

hth,
jasonK_1
Frequent Advisor

Re: Event Monitor Notification - only a vague idea

cnb,

See the attachment for the output of the command:
echo "sel dev all;info;wait;il"|/usr/sbin/cstm
cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

Jason,

Well the good news is 12H appears to be healthy! ;-)

The not-so-great news is that the tool reports errors when trying to obtain status from several PCI adapters. This message appears throughout the report:

"An error was encountered when attempting to obtain the information.
Check the tool activity log for more details."
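
If the report was saved to a file, one quick way to see how often this error appears, and where, is to search for the first line of the message. A sketch, assuming the message is printed on a single line exactly as quoted; the function name cstm_info_error_lines and the file name cstm_info.txt are hypothetical:

```shell
# First line of the error message seen in the cstm "info" report.
CSTM_INFO_ERR='An error was encountered when attempting to obtain the information.'

# Print the line number of each occurrence in a saved report (stdin),
# so the device section just above it can be located in an editor.
cstm_info_error_lines() {
    grep -n -F "$CSTM_INFO_ERR" | cut -d: -f1
}

# Usage:
#   echo "sel dev all;info;wait;il" | /usr/sbin/cstm > cstm_info.txt 2>&1
#   cstm_info_error_lines < cstm_info.txt
#   cstm_info_error_lines < cstm_info.txt | grep -c .   # number of occurrences
```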


You'll have to look at the activity log to see where the issue might be. The log can be viewed with:

# cstm
cstm> ial

Check the activity log entries with today's date to see what was recorded when the tool ran earlier today. They may show why the tool was unable to retrieve all of the hardware information.

hth,