Re: Hardware traps seen in syslog

kaushikbr · 11-29-2006

Hi,

We have an Intel-based rx2620 system running HP-UX B.11.23. This morning we found that telnet sessions to this machine were hanging for no apparent reason. We tried to connect through the ILO port, but still could not log in. As a last resort we rebooted the box. After the reboot, I found the following message in the syslog file.

Nov 30 09:14:25 nuncas04 EMS [2655]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/adapters/events/raid_adapter/0_4_1_0_4_0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 173998082 -r /adapters/events/raid_adapter/0_4_1_0_4_0 -n 173998081 -a
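
When several boxes log these notifications, it can help to pull the embedded resdata command out of syslog automatically. The following is a minimal sketch, assuming the command always starts with /opt/resmon/bin/resdata as it does in the message above; the function name and the SAMPLE constant are made up for illustration, and the sample text is the message from this post with its terminal line wraps left in.

```python
import re
import shlex

# Assumption: the embedded command always begins with this path,
# as it does in the notification quoted in this post.
RESDATA_RE = re.compile(r"(/opt/resmon/bin/resdata\s+.*)$")


def extract_resdata_command(message):
    """Return the resdata command embedded in an EMS notification as an
    argv-style list, or None if the message contains no such command."""
    # Terminal wrapping can split the message mid-word, so drop the newlines
    # before searching.
    joined = message.replace("\n", "")
    match = RESDATA_RE.search(joined)
    return shlex.split(match.group(1)) if match else None


# The syslog entry from this post, including its line wraps.
SAMPLE = (
    'Nov 30 09:14:25 nuncas04 EMS [2655]: ------ EMS Event Notification ------ '
    'Value: "CRITICAL (5)" for Resource: "/ad\n'
    'apters/events/raid_adapter/0_4_1_0_4_0" (Threshold: >= " 3") '
    'Execute the following command to obtain event d\n'
    'etails: /opt/resmon/bin/resdata -R 173998082 '
    '-r /adapters/events/raid_adapter/0_4_1_0_4_0 -n 173998081 -a'
)

if __name__ == "__main__":
    print(extract_resdata_command(SAMPLE))
```

The returned list could be handed to subprocess.run on the affected host; the sketch itself only parses text and does not run anything.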

This also happens occasionally on a couple of other boxes, and the only way around it is to reboot the affected box. I don't understand what the problem is. Is this a known problem? Is there a fix?
I have copied the resdata output below.

Thanks in advance for all your suggestions.
Regards
Kaushik

root $ /opt/resmon/bin/resdata -R 173998082 -r /adapters/events/raid_adapter/0_4_1_0_4_0 -n 173998081 -a

CURRENT MONITOR DATA:

Event Time..........: Thu Nov 30 09:14:25 2006
Severity............: CRITICAL
Monitor.............: dm_raid_adapter
Event #.............: 2
System..............: nuncas04

Summary:
Adapter at hardware path 0/4/1/0/4/0 : CISS: RAID SA controller is now
online.

Description of Error:

lbolt value: 1725

CISS: RAID SA controller is now on-line.

Probable Cause / Recommended Action:

No Action required. Information message only.

Additional Event Data:
System IP Address...: 16.26.84.87
Event Id............: 0x456ea0f100000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_raid_adapter.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x456ea04600000000
Additional System Data:
System Model Number.............: ia64 hp server rx2620
OS Version......................: B.11.23
EMS Version.....................: A.04.10
STM Version.....................: C.46.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_raid_adapter.htm#2

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v

I/O Log Event Data:

Driver Status Code..................: 0x00000002
Length of Logged Hardware Status....: 0 bytes.
Offset to Logged Manager Information: 0 bytes.
Length of Logged Manager Information: 12 bytes.

Manager-Specific Information:

Raw data from the SCSI RAID SA Controller CISS driver:
00000001 000006BD 00000000
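
A side note on the numbers in this output, as a rough sketch rather than documented behaviour. The second raw data word, 000006BD, is hexadecimal for 1725, the reported lbolt value. The code below also assumes that lbolt counts clock ticks at 100 per second and that the upper 32 bits of the Event Id and of the OS error log entry id are Unix timestamps. None of that is stated in the output itself, and the function names are hypothetical.

```python
from datetime import datetime, timezone

HZ = 100  # Assumption: lbolt counts clock ticks at 100 per second.


def decode_raw_words(raw):
    """Parse space-separated hexadecimal words from the driver's raw data."""
    return [int(word, 16) for word in raw.split()]


def id_to_utc(hex_id):
    """Read the upper 32 bits of a 64-bit id as Unix seconds.

    Assumption: the ids embed a timestamp this way; the resdata output
    does not say so.
    """
    seconds = int(hex_id, 16) >> 32
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


words = decode_raw_words("00000001 000006BD 00000000")
print(words[1])                         # 1725, the reported lbolt value
print(words[1] / HZ)                    # 17.25 seconds, under the HZ assumption
print(id_to_utc("0x456ea0f100000000"))  # 2006-11-30 09:14:25+00:00
print(id_to_utc("0x456ea04600000000"))  # 2006-11-30 09:11:34+00:00
```

If those assumptions hold and the Event Time is displayed in UTC, the Event Id matches the Event Time exactly, the OS error log entry was written about three minutes earlier, and the driver logged the message roughly 17 seconds after boot. That would fit a message produced during startup after the reboot rather than before the hang.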

Prashanth.D.S · 11-29-2006

Hi Kaushik,

This is not an error message; it is an informational message and can be ignored. It appears after every reboot. Although the event is informational, it is reported with severity CRITICAL.

A fix is available in the December 2005 release of the OnlineDiag product, located on the OE media for the HP-UX 11.23 release.

The actual issue could be something else; I would suggest you register a case with HP.

Best Regards,
Prashanth

Peter Godron · 11-29-2006

Hi,
Can you confirm whether this controller controls the disk that holds /, /usr or /home?
When your controller does not work, your disks do not work and no login is possible.
The message just says the controller is on-line, but not why it went offline!
Are the problems always with the same controller/disks? If so, I would replace it.

kaushikbr · 11-29-2006

Hi Prashanth, Peter,
Thanks for your replies.

This controller manages the MSA30 connected to the machine, and the disks in the MSA30 are used only for the applications (Oracle and TeMIP).
The / and /usr file systems are on disks driven by a different controller. The user home directories are NFS mounted.

However, we have seen this happen on a couple of our other machines in the past.

Thanks and regards
Kaushik

nuncas04:root $ ioscan -funC disk
Class     I  H/W Path         Driver  S/W State  H/W Type  Description
============================================================================
disk      0  0/0/2/0.0.0.0    sdisk   CLAIMED    DEVICE    HL-DT-STDVD+RW GCA-4040N
             /dev/dsk/c0t0d0     /dev/rdsk/c0t0d0
disk      1  0/1/1/0.0.0      sdisk   CLAIMED    DEVICE    HP 73.4GST373454LC
             /dev/dsk/c2t0d0     /dev/dsk/c2t0d0s3   /dev/rdsk/c2t0d0s2
             /dev/dsk/c2t0d0s1   /dev/rdsk/c2t0d0    /dev/rdsk/c2t0d0s3
             /dev/dsk/c2t0d0s2   /dev/rdsk/c2t0d0s1
disk      6  0/1/1/1.2.0      sdisk   CLAIMED    DEVICE    HP 73.4GST373454LC
             /dev/dsk/c3t2d0     /dev/dsk/c3t2d0s3   /dev/rdsk/c3t2d0s2
             /dev/dsk/c3t2d0s1   /dev/rdsk/c3t2d0    /dev/rdsk/c3t2d0s3
             /dev/dsk/c3t2d0s2   /dev/rdsk/c3t2d0s1
disk      3  0/4/1/0/4/0.0.0  sdisk   CLAIMED    DEVICE    HP LOGICAL VOLUME
             /dev/dsk/c4t0d0     /dev/rdsk/c4t0d0
disk      4  0/4/1/0/4/0.0.1  sdisk   CLAIMED    DEVICE    HP LOGICAL VOLUME
             /dev/dsk/c4t0d1     /dev/rdsk/c4t0d1
disk      5  0/4/1/0/4/0.0.2  sdisk   CLAIMED    DEVICE    HP LOGICAL VOLUME
             /dev/dsk/c4t0d2     /dev/insf137        /dev/rdsk/c4t0d2
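
To answer Peter's question mechanically, the ioscan listing can be matched against the controller named in the EMS resource. This is a minimal sketch: it assumes the underscores in the last component of the resource name stand for the slashes of the hardware path (which agrees with "Adapter at hardware path 0/4/1/0/4/0" in the resdata summary), and all function names are hypothetical. The sample is an excerpt of the listing above.

```python
from typing import Dict, List


def resource_to_hw_path(resource: str) -> str:
    """Convert the last component of an EMS resource name to an ioscan path.

    Assumption: underscores in that component stand for slashes.
    """
    return resource.rsplit("/", 1)[-1].replace("_", "/")


def parse_ioscan_disks(text: str) -> Dict[str, List[str]]:
    """Map each disk's hardware path to the device files listed under it."""
    disks: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "disk":
            # Columns: Class, Instance, H/W Path, Driver, ...
            current = fields[2]
            disks[current] = []
        elif current is not None and fields[0].startswith("/dev/"):
            disks[current].extend(fields)
    return disks


def disks_behind(controller: str, disks: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep disks whose path is the controller path plus a dotted suffix."""
    prefix = controller + "."
    return {path: files for path, files in disks.items() if path.startswith(prefix)}


# Excerpt of the ioscan listing from this post.
IOSCAN = """\
disk 1 0/1/1/0.0.0 sdisk CLAIMED DEVICE HP 73.4GST373454LC
/dev/dsk/c2t0d0 /dev/dsk/c2t0d0s3 /dev/rdsk/c2t0d0s2
disk 3 0/4/1/0/4/0.0.0 sdisk CLAIMED DEVICE HP LOGICAL VOLUME
/dev/dsk/c4t0d0 /dev/rdsk/c4t0d0
disk 4 0/4/1/0/4/0.0.1 sdisk CLAIMED DEVICE HP LOGICAL VOLUME
/dev/dsk/c4t0d1 /dev/rdsk/c4t0d1
"""

controller = resource_to_hw_path("/adapters/events/raid_adapter/0_4_1_0_4_0")
for path, files in disks_behind(controller, parse_ioscan_disks(IOSCAN)).items():
    print(path, files)
```

On this excerpt only the c4 logical volumes are reported, which is consistent with the system disks sitting on a different controller.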

Andrew Merritt_2 · 11-30-2006

Hi Kaushik,
I think that RAID event is a red herring. As Prashanth said, the severity level is incorrect in the version of OnlineDiags that you have and is corrected in the current, supported versions to be Informational. I would recommend upgrading the OnlineDiags, though I don't think that will address the underlying problem.

It sounds more likely to be a network-related issue, or maybe the whole system locking up.

Also, when are those events being logged, before or after the system is rebooted? What was the last thing logged in syslog before the reboot?

Is anything working on the system when the telnet sessions stop? Is it just network connections failing, or is the whole system hung? Are there any I/O errors being logged (see logtool in STM)? What about the lights on the front panel, do they indicate any errors?

Andrew
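
One way to find the last thing logged before the reboot is to look for the longest silence between consecutive syslog entries: the entry just before that gap is a reasonable candidate. The sketch below is only that, a sketch. The function names are hypothetical, the year has to be supplied because syslog stamps omit it, and the first two sample lines are invented placeholders rather than real log entries; only the third is abridged from this thread.

```python
from datetime import datetime


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def last_entry_before_largest_gap(lines, year):
    """Return (line_before, line_after, gap) for the longest silence,
    or None if fewer than two lines carry a parsable timestamp."""
    stamped = []
    for line in lines:
        try:
            stamped.append((parse_stamp(line, year), line))
        except ValueError:
            continue  # continuation lines carry no timestamp
    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[2]:
            best = (l0, l1, gap)
    return best


SAMPLE = [
    "Nov 30 02:10:00 nuncas04 placeholder: entry A",  # invented placeholder
    "Nov 30 02:15:00 nuncas04 placeholder: entry B",  # invented placeholder
    "Nov 30 09:14:25 nuncas04 EMS [2655]: ------ EMS Event Notification ------",
]

before, after, gap = last_entry_before_largest_gap(SAMPLE, 2006)
print("last entry before the gap:", before)
print("silence lasted:", gap)
```

A long gap only shows that nothing was logged; it does not say whether the whole system was hung or just the network, so the other questions above still need answers.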
