Event Monitor Notification - only a vague idea

 
jasonK_1
Frequent Advisor

I got the following email notification and don't know what caused it. It's for the 12H AutoRAID.


Notification Time: Fri May 29 15:23:24 2009

hnrgnw sent Event Monitor notification information:

/storage/events/disk_arrays/AutoRAID/0000003FEB3E
is >= 3.
Its current value is SERIOUS(4).



Event data from monitor:

Event Time..........: Fri May 29 15:23:24 2009
Severity............: SERIOUS
Monitor.............: armmon
Event #.............: 101
System..............: hnrgnw

Summary:
Disk Array at hardware path : Device removed from monitoring


Description of Error:

The device has been removed from the list of devices being monitored by
this monitor.

Probable Cause / Recommended Action:

The device was removed from the system, has stopped responding to the
system or it has been replaced with a device that is not supported by this
monitor.
Run ioscan to determine the state and type of the device.
Check the /var/stm/data/os_decode_xref for the information indicating
which devices are supported by this monitor.
Check other monitors to determine if they are now monitoring the
device by running /etc/opt/resmon/lbin/monconfig and then using the
"Check monitoring" command.

Additional Event Data:
System IP Address...: 205.135.30.74
Event Id............: 0x4a20605c00000000
Monitor Version.....: B.01.01
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_armmon.clcfg
Client Configuration File Version...: A.01.01
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L2000-44
EMS Version.....................: A.03.20
STM Version.....................: A.44.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/armmon.htm#101

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
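A minimal sketch for anyone who wants to script against notifications like this one: the trigger line above pairs a threshold ("is >= 3") with a current value ("SERIOUS(4)"). The shell functions `parse_severity` and `meets_threshold` below are hypothetical helpers, not part of EMS; they assume the line keeps exactly the wording shown in this email.

```shell
#!/bin/sh
# Hypothetical helpers, not part of EMS. Assumes the wording
# "Its current value is NAME(N)." exactly as in the notification above.

# Read notification text on stdin; print "NAME N" for the current-value line.
parse_severity() {
    # BRE: \( \) are capture groups, bare ( ) match literal parentheses.
    sed -n 's/.*current value is \([[:upper:]_]*\)(\([0-9][0-9]*\)).*/\1 \2/p'
}

# Succeed (exit 0) if value $1 meets threshold $2, mirroring "is >= 3".
meets_threshold() {
    [ "$1" -ge "$2" ]
}
```

For example, `echo 'Its current value is SERIOUS(4).' | parse_severity` prints `SERIOUS 4`, and `meets_threshold 4 3` succeeds, which matches why this event was sent.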

cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

Hi Jason,

Follow the link and you get:

Event 101

Severity: Critical
Event Summary: Disk at hardware path x/xx/x.x.x : Device removed from monitoring
Event Class: I/O
Problem Description:
The device has been removed from the list of devices being monitored by this monitor.
Probable Cause / Recommended Action
The device was removed from the system, has stopped responding to the system or it has been replaced with a device that is not supported by this monitor.

*** Run ioscan to determine the state and type of the device.
*** Check the /var/stm/data/os_decode_xref for the information indicating which devices are supported by this monitor.
*** Check other monitors to determine if they are now monitoring the device by running /etc/opt/resmon/lbin/monconfig and using the "Check monitoring" command.
Automated Recovery: None
Event Generation Threshold: 1 occurrence


Any UNCLAIMED or NO_HW changes noticed from ioscan?
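A rough way to answer that is to filter the ioscan listing for those two states. The `find_unhealthy_hw` function below is a hypothetical helper; it only assumes that `ioscan -f` prints the S/W state (CLAIMED, UNCLAIMED, NO_HW, ...) as a separate whitespace-delimited word on each row.

```shell
#!/bin/sh
# Hypothetical helper; only ioscan itself is a real HP-UX command.
# Reads an ioscan listing on stdin and prints rows in UNCLAIMED or NO_HW state.
find_unhealthy_hw() {
    # Match the state keywords only as whole words, so other text on the
    # row that merely contains them is not reported.
    grep -E '(^|[[:space:]])(UNCLAIMED|NO_HW)([[:space:]]|$)'
}
```

Usage: `ioscan -f | find_unhealthy_hw` prints any matching rows and, following grep's convention, exits 0 if at least one was found and 1 if the listing is clean.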

Might want to check the STM log for other issues also.

Could be a number of things. Start with the array subsystem hardware and all system logs.

Any warnings on the 12H itself?

hth,
Basheer_2
Trusted Contributor

Re: Event Monitor Notification - only a vague idea

Look in /var/adm/syslog/syslog.log and you will find a command that you can run to get this information.
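A minimal sketch for pulling the relevant entries out of that file, assuming the EMS syslog entry mentions either the monitor name (armmon) or the resource path from the notification (/storage/events/disk_arrays/...). `show_ems_entries` and `SYSLOG_FILE` are hypothetical names; the default path is the one given above.

```shell
#!/bin/sh
# Hypothetical helper. Assumes the EMS entries in syslog contain "armmon" or
# the /storage/events/disk_arrays resource path from the notification.
SYSLOG_FILE=${SYSLOG_FILE:-/var/adm/syslog/syslog.log}

# Print matching lines with line numbers; an optional $1 overrides the file.
show_ems_entries() {
    # -n: prefix each match with its line number; -i: case-insensitive match.
    grep -n -i -e 'armmon' -e '/storage/events/disk_arrays' "${1:-$SYSLOG_FILE}"
}
```

Running `show_ems_entries` with no argument scans the default file; look through the matching entries for the command mentioned above.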

Since the severity is SERIOUS (4), which meets the >= 3 threshold, urgent action is needed.

Go to your SAN and look for RED LIGHTS.

Depending on your SAN ( VA,EVA,XP)
you can hot replace a disk.
OldSchool
Honored Contributor

Re: Event Monitor Notification - only a vague idea

I believe you'll find a dead disk in the AutoRaid.


Regarding:
"Go to your SAN and look for RED LIGHTS.
Depending on your SAN ( VA,EVA,XP)
you can hot replace a disk."

On the AutoRAID, I don't think you'll find "red lights". You should be able to check the status via SAM. The Admin Guide also describes commands for listing the array logs, which may show something of interest.


Admin Guide here:
http://h10032.www1.hp.com/ctg/Manual/lpg28365.pdf

Other resources at:
http://h10025.www1.hp.com/ewfrf/wc/manualCategory?dlc=en&lc=en&product=61936&cc=us&

Drive(s) should be hot-swappable, so once you locate the culprit and obtain spares, it should be an easy fix.
jasonK_1
Frequent Advisor

Re: Event Monitor Notification - only a vague idea

Hi All,

1. The system is still up and running
2. no unclaimed HW
3. no missing disk
4. no warning message when running arraydsp -a AUTORAID1
5. no further notification email since May 29

Should I let it go or just keep an eye on it?

cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

I would at least check the array's monitoring and restart it if needed, since this message indicates that the array is no longer being monitored.

Was this the only message in the event log?

hth,
jasonK_1
Frequent Advisor

Re: Event Monitor Notification - only a vague idea

cnb,

The 12H AutoRAID is still being monitored.
cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

Hi Jason,

You might want to run this and check the results for any h/w errors, or redirect the output into a file and post it for review:

# echo "sel dev all;info;wait;il"|/usr/sbin/cstm
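To follow the "redirect into a file" suggestion, a small wrapper can send the same command string to cstm and save the output under a time-stamped name. This is a sketch: `capture_infolog`, `CSTM`, and `OUTDIR` are hypothetical names, and it assumes cstm writes the info log to standard output when its input is a pipe, as the one-liner above implies.

```shell
#!/bin/sh
# Hypothetical wrapper around the cstm one-liner above.
# CSTM and OUTDIR can be overridden; by default the real /usr/sbin/cstm is used.
CSTM=${CSTM:-/usr/sbin/cstm}
OUTDIR=${OUTDIR:-/tmp}

# Save the info log to a dated file, print the file name, and
# return cstm's exit status.
capture_infolog() {
    out="$OUTDIR/cstm_infolog.$(date +%Y%m%d%H%M%S).txt"
    # Same command string as above: select all devices, run info, wait, show the info log.
    echo "sel dev all;info;wait;il" | "$CSTM" > "$out" 2>&1
    status=$?
    echo "$out"
    return "$status"
}
```

Usage: `f=$(capture_infolog) && echo "saved to $f"`, then attach that file to the thread.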

hth,
jasonK_1
Frequent Advisor

Re: Event Monitor Notification - only a vague idea

cnb,

See the attachment for the output of the command:
echo "sel dev all;info;wait;il"|/usr/sbin/cstm
cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

Jason,

Well the good news is 12H appears to be healthy! ;-)

The not-so-great news is that the tool is reporting issues when trying to obtain status with several PCI adapters. This message appears throughout the report:

"An error was encountered when attempting to obtain the information.
Check the tool activity log for more details."


You'll have to look at the activity log to see where the issue might be. The log can be viewed with:

# cstm
cstm> ial

Look at the entries with today's time stamps in the activity log to see what was recorded when the tool ran earlier today. They may point to why the tool was unable to retrieve all of the hardware information.

hth,
cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

Jason,

Sorry, I forgot to mention that you should also check the system activity log, not just the infolog activity log:

# cstm
cstm> ls

Rgds,
jasonK_1
Frequent Advisor

Re: Event Monitor Notification - only a vague idea

cnb,

cstm>ial
^-- (InfoActLog) is currently disabled. --

cstm>ls
Thu Feb 19 17:27:42 2009: Aborting daemon process (cclogd) with process
identifier (1541) as support tool system is shutting
down.

Thu Feb 19 17:27:42 2009: Aborting daemon process (diaglogd) with process
identifier (1542) as support tool system is shutting
down.

Thu Feb 19 17:27:42 2009: Aborting daemon process (memlogd) with process
identifier (1543) as support tool system is shutting
down.

Thu Feb 19 17:27:42 2009: Aborting daemon process (psmctd) with process
identifier (1544) as support tool system is shutting
down.

Thu Feb 19 17:35:58 2009: Diagmond daemon started.

Thu Feb 19 17:37:08 2009: Launching daemon executable
(/usr/sbin/stm/uut/bin/sys/cclogd).

Thu Feb 19 17:37:08 2009: Launching daemon executable
(/usr/sbin/stm/uut/bin/sys/diaglogd).

Thu Feb 19 17:37:08 2009: Launching daemon executable
(/usr/sbin/stm/uut/bin/sys/memlogd).

Thu Feb 19 17:37:08 2009: Launching daemon executable
(/usr/sbin/stm/uut/bin/sys/psmctd).

Fri Feb 20 09:35:11 2009: An IPC message of 0 length was read.

Possible Causes/Recommended Action:

The sender of the IPC message connected but did
not send any data or exited before sending any
data.
Check the support tool system activity log and
the tool activity logs for more information on
the sending process.

Fri Feb 20 09:35:11 2009: Attempt to read the IPC header from a port failed.
The port is on system name (hnrgnw) at IP address
(205.135.30.74) and has system port (1508).

Fri Feb 20 09:35:11 2009: Message received from system port had an invalid
header. Message will be ignored.

Fri Feb 20 10:25:44 2009: An IPC message of 0 length was read.

Possible Causes/Recommended Action:

The sender of the IPC message connected but did
not send any data or exited before sending any
data.
Check the support tool system activity log and
the tool activity logs for more information on
the sending process.

Fri Feb 20 10:25:44 2009: Attempt to read the IPC header from a port failed.
The port is on system name (hnrgnw) at IP address
(205.135.30.74) and has system port (1508).

Fri Feb 20 10:25:44 2009: Message received from system port had an invalid
header. Message will be ignored.
cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

Jason,

Try this:

echo "sel dev all;info;wait;ial"|/usr/sbin/cstm

Or try running the report again manually:

# cstm
cstm> sel all
cstm> info
cstm> il
quit
done
cstm> ial
quit
save
file-name
cstm> quit



hth,
jasonK_1
Frequent Advisor

Re: Event Monitor Notification - only a vague idea

cnb,

OK, I got something out of that command. Please see the attachment.
cnb
Honored Contributor

Re: Event Monitor Notification - only a vague idea

Jason,

A few things are not right, and they may or may not be related to the system firmware being way out of date. Your system is at 41.39, and critical fixes were published after that version. The latest for your system is 44.28.

For 11.00:

http://www13.itrc.hp.com/service/patch/patchDetail.do?BC=main|patchDetail{PHSS_31800,{hpux:11.00,}}|&patchid=PHSS_32704&sel={hpux:11.00,}
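Firmware revisions like these are dotted numbers, so checking whether an installed revision is older than the latest one means comparing each field numerically rather than as text. `version_lt` below is a hypothetical helper written for this thread, not an HP utility; it assumes revisions consist only of numeric fields separated by dots.

```shell
#!/bin/sh
# Hypothetical helper, not an HP tool.
# Returns 0 (true) if revision $1 is older than revision $2,
# comparing dot-separated fields numerically from left to right.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            # Missing fields count as 0, so 44.28 equals 44.28.0.
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1   # equal revisions are not older
    }'
}
```

Usage: `version_lt 41.39 44.28 && echo "PDC firmware is older than the latest release"` prints the message for the revisions quoted above.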

a) According to the diag tool, one of your processors isn't responding properly. This is interesting because another part of the log report shows both processors as active and configured. So I'm not sure why it's failing to report properly; a PDC issue, perhaps?

+++++
From the infolog report for the processor at path 166:

Processor PIM Information: PIM contains NO data

+++++

-- Information Tool Activity Log for CPU on path 166 --

Log creation time:
Tue Jun 2 14:44:53 2009

Tue Jun 2 14:44:53 2009: Information tool (cpu) starting on path (166).

Tue Jun 2 14:44:53 2009: Failure Summary:

The returned status from PDC_SYSTEM_INFO call:
(-4).

Possible Causes/Recommended Action:

The PDC_SYSTEMINFO call returned status indicating
that selected processor is not present.

This is an unexpected status. No recommended
action available.

+++++

However, this issue was supposedly fixed in PDC 41.39:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=120&prodSeriesId=377380&prodTypeId=15351&objectID=c00906010

I have a feeling it wasn't, and you may want to have HP update all of your system firmware to the latest supported versions.


b) For some reason the /dev/diag/diagx files are either missing or the tool can't read them. You'll need to resolve this, or recreate the files with 'insf -e' (see the insf man page):

Tue Jun 2 14:44:52 2009

Tue Jun 2 14:44:52 2009: Information tool (pci) starting on path (0/7/0/0).

Tue Jun 2 14:44:52 2009: The device file /dev/diag/diag1 is not present on the
system. Communications with diag1 diagnostic pseudo
driver is not possible without this file.

Possible Causes/Recommended Action:

Please use "insf -e " from the /dev directory to
create the /dev/diag/diag1 device file.

Tue Jun 2 14:44:52 2009: Failed to obtain device_id, revision_id,
device_status, vendor_id and class_code for the card.

An error or warning occurred while obtaining the
information.

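Before running insf, it can help to confirm which device files are actually absent. `check_dev_files` below is a hypothetical helper: it only tests whether each path given to it exists, and leaves recreating anything with `insf -e` to you (as root).

```shell
#!/bin/sh
# Hypothetical helper: report which of the given device files are missing.
# It does not create anything; use insf -e (as root) for that.
check_dev_files() {
    missing=0
    for f in "$@"; do
        # -e is true for any existing path, including device special files.
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # 0 if every file exists, 1 if at least one is missing.
    return "$missing"
}
```

Usage: `check_dev_files /dev/diag/diag1 || echo "recreate with insf -e"` checks the file named in the log above.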

You might want to check /etc/rc.log and verify everything started up correctly the last time it booted. If you find errors then investigate.

IMHO, I would get the system firmware updated to match the installed software requirements first. Since the diag tool isn't reporting the adapter types, I can't tell what h/w is supported in your configuration. You can check here for the minimum PDC versions of some adapters (11.00 isn't listed, but you can get an idea of where your system should be):

http://docs.hp.com/en/SFWM1/ch04.html#ftn.dc3

The diagnostic subsystem is not functioning properly and you might not have any accurate warnings until something really breaks.


hth,


BTW:
http://forums13.itrc.hp.com/service/forums/helptips.do?#33

;-)

jasonK_1
Frequent Advisor

Re: Event Monitor Notification - only a vague idea

cnb,

I'll do what you said

Thanks