syslog error

Rommel T. Misa_2
Frequent Advisor

syslog error

I just browsed my syslog and saw the following message:
Aug 23 23:13:55 SAPHRHPDB01 EMS [3197]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/1_0_0_3_0.6.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 209518594 -r /storage/events/disks/default/1_0_0_3_0.6.0 -n 209518593 -a

Has anyone encountered this?
11 REPLIES
Jeeshan
Honored Contributor

Re: syslog error

Hmm, this seems to be related to the disks. Run the following command and post the output:
#/opt/resmon/bin/resdata -R 209518594 -r /storage/events/disks/default/1_0_0_3_0.6.0 -n 209518593 -a
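If you would rather not copy the command by hand, here is a minimal sketch that pulls it out of the syslog line. The function names `ems_resdata_cmd` and `ems_hw_path` are made up for illustration, and the underscore-to-slash mapping for the hardware path is only inferred from the resource name in this message, so treat it as an assumption.

```shell
#!/bin/sh
# Extract the suggested resdata command from an EMS syslog line read on stdin.
# (Hypothetical helper name; it only relies on the wording of the message.)
ems_resdata_cmd() {
    sed -n 's/.*obtain event details: *//p'
}

# Derive the hardware path from the EMS resource name by taking the last
# path component and replacing underscores with slashes.
# ASSUMPTION: 1_0_0_3_0.6.0 corresponds to hardware path 1/0/0/3/0.6.0.
ems_hw_path() {
    sed -n 's/.*Resource: "\([^"]*\)".*/\1/p' | sed 's|.*/||' | tr '_' '/'
}

line='Aug 23 23:13:55 SAPHRHPDB01 EMS [3197]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/1_0_0_3_0.6.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 209518594 -r /storage/events/disks/default/1_0_0_3_0.6.0 -n 209518593 -a'

printf '%s\n' "$line" | ems_resdata_cmd
printf '%s\n' "$line" | ems_hw_path   # prints 1/0/0/3/0.6.0
```

The sketch only prints the command; run it yourself as root once you have checked it.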
a warrior never quits
Sandeep_Chaudhary
Trusted Contributor

Re: syslog error

What is the output of /opt/resmon/bin/resdata -R 209518594 -r /storage/events/disks/default/1_0_0_3_0.6.0 -n 209518593 -a?
Rommel T. Misa_2
Frequent Advisor

Re: syslog error

There was no output when I issued the command.
Torsten.
Acclaimed Contributor

Re: syslog error

Unlikely. Did you use the complete command?

If this is a midrange server, it is most likely a problem with an internal disk.

You should have a look into root's mailbox. You will find the complete message there.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
Rommel T. Misa_2
Frequent Advisor

Re: syslog error

I ran the command again and got the following output:

CURRENT MONITOR DATA:

Event Time..........: Sat Aug 23 23:13:55 2008
Severity............: CRITICAL
Monitor.............: disk_em
Event #.............: 100237
System..............: SAPHRHPDB01

Summary:
Disk at hardware path 1/0/0/3/0.6.0 : Media error

Description of Error:

The device was unsuccessful in reading or writing data for the current I/O
request due to an error on the medium. The maximum number of retries were
attempted and the data could not be read.

Probable Cause / Recommended Action:

If the event is reported against a device other than a disk drive:

- Reformatting the medium may fix the problem.
- Alternatively, the medium in the device is flawed.
- If the medium is removable, replace the medium with a fresh one.
- Alternatively, if the medium is not removable, the device has
  experienced a hardware failure. Repair or replace the device, as
  necessary.

If the event is reported against a disk drive on a system on which none
or only some of the disks are in a redundant environment (i.e., mirrored):

- Review applications for errors at the time the event was reported to
  determine which data could not be read.
- Attempt to re-read the data.
- Re-write the data to the disk to allow the disk to reallocate to a
  spare area on the disk.
- If a re-read of the data and/or a rewrite of the data are not
  successful, the disk should be replaced and data restored from backup.

If the event is reported against a disk drive on a system on which all
disks are in a redundant environment (i.e., mirrored):

- When the OS is patched to current LVM and SCSI patches, reallocation
  will take place automatically for these disks, and no action needs be
  taken to check or replace these drives.
- To avoid unnecessary paging and notification, the severity of this
  event can be changed to MINOR_WARNING by enabling the alternate
  configuration for this event in
  /var/stm/config/tools/monitor/default_disk_em.clcfg (and
  /var/stm/config/tools/monitor/rst_disk_em.clcfg, if it exists):

- Find the following lines:
  EQ:100237:CRITICAL:...

  and insert a "#" in column 1.

- Remove the "#" from column 1 of the line which starts:
  EQ:100237:MINOR_WARNING:...

Additional Event Data:
System IP Address...: 10.30.18.135
System IP Address...: 192.168.1.2
Event Id............: 0x48b0293300000000
Monitor Version.....: B.01.01
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x48b027cd00000000

Additional System Data:
System Model Number.............: ia64 hp server rx7640
OS Version......................: B.11.23
STM Version.....................: C.60.00
EMS Version.....................: A.04.20

Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/scsi.htm#100237

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v

Component Data:
Physical Device Path...: 1/0/0/3/0.6.0
Device Class...........: Disk
Inquiry Vendor ID......: HP 146 G
Inquiry Product ID.....: ST3146855LC
Firmware Version.......: HPC8
Serial Number..........: 3LN53S9C00009842RH6M

Product/Device Identification Information:

Logger ID.........: sdisk
Product Identifier: SCSI Disk
Product Qualifier.: HP146
SCSI Target ID....: 0x06
SCSI LUN..........: 0x00

I/O Log Event Data:

Driver Status Code..................: 0x0000007C
Length of Logged Hardware Status....: 22 bytes.
Offset to Logged Manager Information: 24 bytes.
Length of Logged Manager Information: 34 bytes.

Hardware Status:

Raw H/W Status:
0x0000: 00 00 00 02 F0 00 03 0F 4F B3 53 0A 00 00 00 00
0x0010: 11 00 81 80 00 8F

SCSI Status...: CHECK CONDITION (0x02)
Indicates that a contingent allegiance condition has occurred. Any
error, exception, or abnormal condition that causes sense data to be
set will produce the CHECK CONDITION status.

SCSI Sense Data:

Undecoded Sense Data:
0x0000: F0 00 03 0F 4F B3 53 0A 00 00 00 00 11 00 81 80
0x0010: 00 8F

SCSI Sense Data Fields:
Error Code : 0x70
Segment Number : 0x00
Bit Fields:
Filemark : 0
End-of-Medium : 0
Incorrect Length Indicator : 0
Sense Key : 0x03
Information Field Valid : TRUE
Information Field : 0x0F4FB353
Additional Sense Length : 10
Command Specific : 0x00000000
Additional Sense Code : 0x11
Additional Sense Qualifier : 0x00
Field Replaceable Unit : 0x81
Sense Key Specific Data Valid : TRUE
Sense Key Specific Data : 0x80 0x00 0x8F

Sense Key 0x03, MEDIUM ERROR, indicates that the command terminated
with a nonrecovered error condition that was probably caused by a
flaw in the medium or an error in the recorded data. This sense key
may also be returned if the device is unable to distinguish between a
flaw in the medium and a specific hardware failure (sense key 0x04).
For the RECOVERED ERROR, HARDWARE ERROR, or MEDIUM ERROR Sense Key,
the Sense Key Specific data indicates that 143 retries were
attempted.

The combination of Additional Sense Code and Sense Qualifier (0x1100)
indicates: Unrecovered read error.

SCSI Command Data Block:

Command Data Block Contents:
0x0000: 28 00 0F 4F B2 E2 00 02 00 00

Command Data Block Fields (10-byte fmt):
Command Operation Code...(0x28)..: READ
Logical Unit Number..............: 0
DPO Bit..........................: 0
FUA Bit..........................: 0
Relative Address Bit.............: 0
Logical Block Address............: 256881378 (0x0F4FB2E2)
Transfer Length..................: 512 (0x0200)

Manager-Specific Data Fields:
Request ID.............: 0x02000BEF
Data Residue...........: 0x00031E00
CDB status.............: 0x00000002
Sense Status...........: 0x00000000
Bus ID.................: 0x02
Target ID..............: 0x06
LUN ID.................: 0x00
Sense Data Length......: 0x12
Q Tag..................: 0xFA
Retry Count............: 8

Hope you can help... Thanks
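For anyone who wants to see how the decoded fields relate to the raw bytes, here is a rough sketch that unpacks the "Undecoded Sense Data" from the output above. It assumes the standard fixed-format sense layout (response code 0x70), and `decode_sense` is a made-up helper name, not part of any HP-UX tool.

```shell
#!/bin/sh
# Decode a few fields from fixed-format SCSI sense data given as a string of
# space-separated hex bytes. ASSUMPTION: fixed format, at least 18 bytes.
decode_sense() {
    # Split the string into positional parameters, one per byte ($1 = byte 0).
    set -- $1
    valid=$(( (0x$1 >> 7) & 1 ))      # bit 7 of byte 0: Information field valid
    code=$(( 0x$1 & 0x7F ))           # low 7 bits of byte 0: response code
    key=$(( 0x$3 & 0x0F ))            # low nibble of byte 2: sense key
    info=$(( (0x$4 << 24) | (0x$5 << 16) | (0x$6 << 8) | 0x$7 ))   # bytes 3-6
    asc=$(( 0x${13} ))                # byte 12: additional sense code
    ascq=$(( 0x${14} ))               # byte 13: additional sense code qualifier
    sksv=$(( (0x${16} >> 7) & 1 ))    # bit 7 of byte 15: sense-key-specific valid
    # Bytes 16-17 carry the retry count for RECOVERED, HARDWARE and MEDIUM
    # ERROR sense keys when sksv is set.
    retries=$(( (0x${17} << 8) | 0x${18} ))
    printf 'response_code=0x%02X valid=%d sense_key=0x%02X info=0x%08X asc=0x%02X ascq=0x%02X sksv=%d retries=%d\n' \
        "$code" "$valid" "$key" "$info" "$asc" "$ascq" "$sksv" "$retries"
}

decode_sense "F0 00 03 0F 4F B3 53 0A 00 00 00 00 11 00 81 80 00 8F"
# prints: response_code=0x70 valid=1 sense_key=0x03 info=0x0F4FB353 asc=0x11 ascq=0x00 sksv=1 retries=143
```

The result matches what resdata already decoded: sense key 0x03 (MEDIUM ERROR), ASC/ASCQ 0x11/0x00 and 143 retries.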
Prashanth.D.S
Honored Contributor

Re: syslog error

Hmm, it looks like the LUN/disk at hardware path 1/0/0/3/0.6.0 has a problem. Run the following commands and check for read or write errors; if there are any, replace the disk.

#cstm
cstm>map

Note the device number for disk 1/0/0/3/0.6.0 (shown on the left-hand side).

cstm>sel dev # (replace # with the device number for 1/0/0/3/0.6.0 )

cstm>info
cstm>il

Now check for any read or write errors on this disk.

If the above disk has a mirror copy, I suggest you follow the same procedure to check it as well.

Best Regards,
Prashanth
Rommel T. Misa_2
Frequent Advisor

Re: syslog error

Thanks. Do I have to do this offline or can I just run it while my system and its applications are running?

ROMMEL
Prashanth.D.S
Honored Contributor

Re: syslog error

Hi,

You can run this while the applications are up; it doesn't have any effect on them.

Regards,
Prashanth
Sandeep_Chaudhary
Trusted Contributor

Re: syslog error

This is a clear-cut disk problem. Check the disk at hardware path "1/0/0/3/0.6.0"
by using "ioscan -funH 1/0/0/3/0.6.0".

Get it replaced.
Rommel T. Misa_2
Frequent Advisor

Re: syslog error

I ran cstm on the disks in question, but no errors were found. Could this be a problem with EMS?
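One way to sanity-check the event itself is to see whether the numbers in the resdata output posted earlier agree with each other. This is only a sketch: it assumes 512-byte blocks, that Data Residue is a byte count, and that the sense Information Field holds the failing LBA. None of that is stated in the output.

```shell
#!/bin/sh
# Values copied from the resdata output earlier in this thread.
start_lba=$((0x0F4FB2E2))   # CDB Logical Block Address (256881378)
xfer_blocks=$((0x0200))     # CDB Transfer Length, in blocks (512)
fail_lba=$((0x0F4FB353))    # sense Information Field (assumed failing LBA)
residue=$((0x00031E00))     # Data Residue (assumed to be in bytes)
block_size=512              # ASSUMPTION: 512-byte logical blocks

# How far into the request the reported block lies.
offset=$((fail_lba - start_lba))
# How many blocks were transferred before the I/O stopped.
moved_blocks=$((xfer_blocks - residue / block_size))

echo "failing LBA $fail_lba is $offset blocks into the request"
echo "blocks transferred before the error: $moved_blocks"

if [ "$offset" -eq "$moved_blocks" ] && [ "$offset" -lt "$xfer_blocks" ]; then
    verdict=consistent
else
    verdict=inconsistent
fi
echo "$verdict"
```

Under those assumptions both figures come out as 113 blocks, so the residue and the reported LBA tell the same story. That would point to a real failed read on the disk more than to a bogus EMS message, though it does not prove it.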
Peter Nikitka
Honored Contributor

Re: syslog error

Hi,

I don't know whether a non-destructive test via cstm really tries to access all disk blocks.
Check whether a command like
dd if=/dev/YOURDISK of=/dev/null bs=1024k

completes without errors.
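If a full read takes too long, a narrower read around the block named in the event might be enough. The sketch below only prints a dd command, it does not run it. It assumes 512-byte blocks and that the Information Field (0x0F4FB353) is the failing LBA; `dd_for_lba` is a made-up helper and /dev/YOURDISK is the same placeholder as above.

```shell
#!/bin/sh
# Print (do not run) a dd command that reads a small window around one LBA.
# Arguments: device, LBA, blocks on each side (default 8), block size (default 512).
dd_for_lba() {
    dev=$1
    lba=$2
    span=${3:-8}
    bs=${4:-512}
    # Start "span" blocks before the LBA, but never before block 0.
    if [ "$lba" -gt "$span" ]; then
        skip=$((lba - span))
    else
        skip=0
    fi
    # Read through "span" blocks past the LBA, inclusive.
    count=$((lba - skip + span + 1))
    echo "dd if=$dev of=/dev/null bs=$bs skip=$skip count=$count"
}

dd_for_lba /dev/YOURDISK $((0x0F4FB353))
# prints: dd if=/dev/YOURDISK of=/dev/null bs=512 skip=256881483 count=17
```

If that read fails with an I/O error, the bad block is still there. If it succeeds, the full read is still worth running.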

Best regards, Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"