
Re: disk failure

 
SOLVED
Brian Lee_4
Regular Advisor

disk failure

I got the error below from EMS.
I checked the disk with the "diskinfo" and "dd"
commands but I couldn't find any problem.

stad01:/#diskinfo /dev/rdsk/c12t6d0
SCSI describe of /dev/rdsk/c12t6d0:
vendor: HP 73.4G
product id: ST373405FC
type: direct access
size: 71687369 Kbytes
bytes per sector: 512

I created some files on the disk and there was no problem.

I got the following message in the syslog.log file.

vxfs: mesg 056: vx_dataioerr - /dev/vg03/lvol24 file system file data read error

The file system resides on the suspect disk.

Can I consider this disk defective even though I can read files and write data on it?

===============================================
CURRENT MONITOR DATA:

Event Time..........: Fri Nov 14 01:52:07 2003
Severity............: CRITICAL
Monitor.............: disk_em
Event #.............: 100237
System..............: stad01

Summary:
Disk at hardware path 1/8/0/0.8.0.255.0.6.0 : Media failure


Description of Error:

The device was unsuccessful in reading or writing data for the current I/O
request due to an error on the medium. The data could not be recovered.

Probable Cause / Recommended Action:

Reformatting the medium may fix the problem.

Alternatively, the medium in the device is flawed. If the medium is
removable, replace the medium with a fresh one.

Alternatively, if the medium is not removable, the device has experienced
a hardware failure. Contact your HP support representative to have the
device checked.

Additional Event Data:
System IP Address...: 105.1.11.151
Event Id............: 0x3fb47b9700000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x3fb47b9600000000
Additional System Data:
System Model Number.............: 9000/800/N4000-55
OS Version......................: B.11.00
STM Version.....................: A.38.00
EMS Version.....................: A.03.20
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/scsi.htm#100237

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v



Component Data:
Physical Device Path...: 1/8/0/0.8.0.255.0.6.0
Device Class...........: Disk
Inquiry Vendor ID......: HP 73.4G
Inquiry Product ID.....: ST373405FC
Firmware Version.......: HP09
Serial Number..........: 3EK01S4Q

Product/Device Identification Information:

Logger ID.........: sdisk
Product Identifier: SCSI Disk
Product Qualifier.: HP73.4GST373405FC
SCSI Target ID....: 0x06
SCSI LUN..........: 0x00

I/O Log Event Data:

Driver Status Code..................: 0x0000007C
Length of Logged Hardware Status....: 22 bytes.
Offset to Logged Manager Information: 24 bytes.
Length of Logged Manager Information: 34 bytes.

Hardware Status:

Raw H/W Status:
0x0000: 00 00 00 02 F0 00 03 02 06 82 A7 0A 00 00 00 00
0x0010: 11 00 E4 80 00 86

SCSI Status...: CHECK CONDITION (0x02)
Indicates that a contingent allegiance condition has occurred. Any
error, exception, or abnormal condition that causes sense data to be
set will produce the CHECK CONDITION status.

SCSI Sense Data:

Undecoded Sense Data:
0x0000: F0 00 03 02 06 82 A7 0A 00 00 00 00 11 00 E4 80
0x0010: 00 86

SCSI Sense Data Fields:
Error Code : 0x70
Segment Number : 0x00
Bit Fields:
Filemark : 0
End-of-Medium : 0
Incorrect Length Indicator : 0
Sense Key : 0x03
Information Field Valid : TRUE
Information Field : 0x020682A7
Additional Sense Length : 10
Command Specific : 0x00000000
Additional Sense Code : 0x11
Additional Sense Qualifier : 0x00
Field Replaceable Unit : 0xE4
Sense Key Specific Data Valid : TRUE
Sense Key Specific Data : 0x80 0x00 0x86

Sense Key 0x03, MEDIUM ERROR, indicates that the command terminated
with a nonrecovered error condition that was probably caused by a
flaw in the medium or an error in the recorded data. This sense key
may also be returned if the device is unable to distinguish between a
flaw in the medium and a specific hardware failure (sense key 0x04).
For the RECOVERED ERROR, HARDWARE ERROR, or MEDIUM ERROR Sense Key,
the Sense Key Specific data indicates that 134 retries were
attempted.

The combination of Additional Sense Code and Sense Qualifier (0x1100)
indicates: Unrecovered read error.
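
The decoded fields above can be cross-checked by hand from the 18 raw sense bytes. The following is a minimal sketch, assuming the standard fixed-format sense layout (sense key in byte 2, Information field in bytes 3-6, ASC/ASCQ in bytes 12-13, retry count in the last two Sense Key Specific bytes). `decode_sense` is a hypothetical helper name, not part of EMS or STM.

```shell
#!/bin/sh
# Hypothetical helper: decode fixed-format SCSI sense data given as
# space-separated hex bytes (as printed under "Undecoded Sense Data").
decode_sense() {
    # Intentionally unquoted so the 18 bytes become $1..$18.
    set -- $1
    info_valid=$(( (0x$1 & 0x80) >> 7 ))    # top bit of byte 0
    sense_key=$(( 0x$3 & 0x0F ))            # low nibble of byte 2
    # Bytes 3-6: Information field, here the failing logical block address.
    lba=$(( (0x$4 << 24) | (0x$5 << 16) | (0x$6 << 8) | 0x$7 ))
    asc=$(( 0x${13} ))                      # Additional Sense Code
    ascq=$(( 0x${14} ))                     # Additional Sense Code Qualifier
    # Last two Sense Key Specific bytes: retry count.
    retries=$(( (0x${17} << 8) | 0x${18} ))
    printf 'sense_key=0x%02X asc=0x%02X ascq=0x%02X valid=%d lba=%d retries=%d\n' \
        "$sense_key" "$asc" "$ascq" "$info_valid" "$lba" "$retries"
}

decode_sense "F0 00 03 02 06 82 A7 0A 00 00 00 00 11 00 E4 80 00 86"
```

For the bytes in this event it prints sense key 0x03, ASC/ASCQ 0x11/0x00, 134 retries, and a failing block of 33981095 (0x020682A7), which agrees with the fields EMS decoded.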

SCSI Command Data Block:

Command Data Block Contents:
0x0000: 28 00 02 06 82 80 00 00 80 00

Command Data Block Fields (10-byte fmt):
Command Operation Code...(0x28)..: READ
Logical Unit Number..............: 0
DPO Bit..........................: 0
FUA Bit..........................: 0
Relative Address Bit.............: 0
Logical Block Address............: 33981056 (0x02068280)
Transfer Length..................: 128 (0x0080)

Manager-Specific Data Fields:
Request ID.............: 0x0C5A8A58
Data Residue...........: 0x0000B200
CDB status.............: 0x00000002
Sense Status...........: 0x00000000
Bus ID.................: 0x0C
Target ID..............: 0x06
LUN ID.................: 0x00
Sense Data Length......: 0x12
Q Tag..................: 0xE0
Retry Count............: 45
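
The logged values are consistent with one another, which can be checked with a little arithmetic. This is a rough sketch, assuming 512-byte sectors (as reported by diskinfo) and that Data Residue counts the bytes that were not transferred. The variable names are only illustrative.

```shell
#!/bin/sh
# Values copied from the EMS event; variable names are illustrative.
start_lba=$(( 0x02068280 ))   # Logical Block Address in the READ CDB
xfer_len=$(( 0x0080 ))        # Transfer Length, in blocks
bad_lba=$(( 0x020682A7 ))     # Information Field from the sense data
residue=$(( 0x0000B200 ))     # Data Residue, in bytes (assumed meaning)
sector=512                    # bytes per sector, from diskinfo

offset=$(( bad_lba - start_lba ))     # blocks read before the failure
remaining=$(( xfer_len - offset ))    # blocks from the bad one onward
echo "failing block is $offset blocks into the ${xfer_len}-block read"
echo "untransferred: $(( remaining * sector )) bytes, logged residue: $residue bytes"
```

Both figures come out at 45568 bytes, which suggests the first 39 blocks of the request were read and the transfer stopped at block 33981095. That would explain why most reads and writes on the disk still succeed: only I/O that touches that block fails.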
brian lee
Geoff Wild
Honored Contributor

Re: disk failure

It could mean that the disk is failing. You might want to place an HP Support call.

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Helen French
Honored Contributor

Re: disk failure

Couple of things to try:

1) Check the file systems which are residing on the specific disk with fsck:

# fsck -F vxfs -o full /dev/vg03/lvol24

2) Check the disk with the STM tools (cstm or stm):

# cstm

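3) If you find any errors, you can either recreate the LVM definitions (newfs on the file systems) or replace the disk. If you do not have a hardware error on the disk, a newfs command will remove all file system errors. Note that newfs recreates the file system, so the data has to be restored from backup afterwards.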
Life is a promise, fulfill it!
Brian Lee_4
Regular Advisor

Re: disk failure

When I run the dd command (dd if=/dev/vg03/lvol24 of=/dev/null bs=64k),
I get an I/O error.
I realized that there was a problem with the disk or file system when I ran a daily backup.
There was an error saying that it couldn't read one file in the /dev/vg03/lvol24 file system.
I tried to copy the file to another directory but I couldn't.
Should I conclude that the file is corrupted?
brian lee
Helen French
Honored Contributor
Solution

Re: disk failure

An error from the 'dd' command normally means you've got issues with the disk. It's normally a hardware error, so the best option is to replace the disk and restore the data from backup.
Life is a promise, fulfill it!
James Lynch
Valued Contributor

Re: disk failure

Brian,

The messages you are seeing indicate that your disk has a bad block on it. This bad block has resulted in file system corruption. It is safe to conclude that the file on /dev/vg03/lvol24 is corrupt and unusable. But remember, there could be more corruption that is not showing up just yet.

dd is a good tool for checking whether your disks/lvols have I/O errors, but it is not always 100% accurate. Sometimes dd runs successfully without errors, but that does not mean that your disk is good. UNIX disk drivers and the disk mechanisms use retry counters when attempting any I/O to or from the disk. An I/O request failure is not reported to EMS until both the disk and disk driver retry counters have been exceeded. These retry counters are needed to make sure that I/Os complete successfully, due in part to the somewhat inconsistent nature of magnetic media. A block on the disk can be marginal in the sense that it may fail an I/O request today and then complete successfully tomorrow.
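
Because a single dd pass only says whether some read failed, it can help to read the logical volume in fixed-size chunks and note which ones return an error. Below is a minimal sketch, assuming a 64 KB chunk size (the same bs Brian used). `scan_chunks` is a hypothetical helper, and a clean pass still does not prove the disk is healthy, for the reasons given above.

```shell
#!/bin/sh
# Hypothetical helper: read the first N 64 KB chunks of a device or file
# one at a time and report every chunk that dd fails to read.
scan_chunks() {
    dev=$1
    chunks=$2
    bad=0
    i=0
    while [ "$i" -lt "$chunks" ]; do
        # dd exits non-zero when the read fails (for example, on an I/O error).
        if ! dd if="$dev" of=/dev/null bs=64k skip="$i" count=1 2>/dev/null; then
            echo "read error in chunk $i (byte offset $(( i * 65536 )))"
            bad=$(( bad + 1 ))
        fi
        i=$(( i + 1 ))
    done
    echo "chunks=$chunks bad=$bad"
}

# Example (as root): scan_chunks /dev/vg03/lvol24 1000
```

The chunk numbers are offsets within the logical volume, not the disk block address shown in the EMS event, so they only narrow down where in the lvol the unreadable data sits.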

A good rule to remember is, if you have hard errors being reported by EMS/syslog on a device, that device needs to be serviced.

JL
Wild turkey surprise? I love wild turkey surprise!