Operating System - HP-UX

Probable disk failure.. need to confirm...

 
Sunny Jaisinghani
Trusted Contributor

Hello All,

I have a disk that seems to be failing on HP-UX 11.11.

These are the messages I got from EMS:

Disk at hardware path 0/1/1/0.0.0 : I/O request failed.
Disk at hardware path 0/1/1/0.1.0 : Software configuration error
Disk at hardware path 0/1/1/0.0.0 : A SMART event has occurred.
Disk at hardware path 0/1/1/0.0.0 : Software configuration error

Tests I ran:

# ioscan -fnH 0/1/1/0.0.0
Class     I  H/W Path     Driver  S/W State  H/W Type  Description
=====================================================================
disk      1  0/1/1/0.0.0  sdisk   CLAIMED    DEVICE    HP 73.4GST373454LC
                          /dev/dsk/c2t0d0   /dev/rdsk/c2t0d0
# diskinfo /dev/rdsk/c2t0d0
SCSI describe of /dev/rdsk/c2t0d0:
vendor: HP 73.4G
product id: ST373454LC
type: direct access
size: 71687369 Kbytes
bytes per sector: 512


# pvdisplay /dev/dsk/c2t0d0
--- Physical volumes ---
PV Name /dev/dsk/c2t0d0
VG Name /dev/vg00
PV Status available
Allocatable yes
VGDA 2
Cur LV 9
PE Size (Mbytes) 16
Total PE 4374
Free PE 0
Allocated PE 4374
Stale PE 0
IO Timeout (Seconds) default
Autoswitch On


MSTM

Hardware path: 0/1/1/0.0.0

Product Id: ST373454LC Vendor: HP 73.4G
Device Type: SCSI Disk Firmware Rev: HPC3
Device Qualifier: HP73.4GST373454LC Logical Unit: 0
Serial Number: 3KP1Z4KM00007621M7F3
Capacity (M Byte): 70007.20
Block Size: 512
Max Block Address: 143374737
Error Logs
Total Retries: 0 Buffer Overruns: N/A
Read Reverse Errors: N/A Buffer Underruns: N/A
Write Errors: 0 Non-Medium Errors: 12
Verify Errors: 0


SYSLOG

Nov 5 18:32:42 eca1ap21 vmunix: SCSI: Request Timeout; Abort Tag -- lbolt: 612603113, dev: 1f020000, io_id: 2aed574
Nov 5 18:32:52 eca1ap21 vmunix: SCSI: Request Timeout; Abort Tag -- lbolt: 612603313, dev: 1f020000, io_id: 2aed61c
Nov 5 18:32:52 eca1ap21 vmunix: SCSI: Request Timeout; Abort Tag -- lbolt: 612603413, dev: 1f020000, io_id: 2aed61e
Nov 5 18:32:52 eca1ap21 vmunix: SCSI: Request Timeout; Abort Tag -- lbolt: 612603413, dev: 1f020000, io_id: 2aed5d3
Nov 5 18:32:52 eca1ap21 vmunix: SCSI: isrEscape Controller at 0/1/1/0.
Nov 5 18:32:52 eca1ap21 vmunix: SCSI: First party detected bus hang (HTH) -- lbolt: 612603734, dev: 1f020000
Nov 5 18:32:52 eca1ap21 vmunix: SCSI: Resetting SCSI -- lbolt: 612603834, bus: 2 path: 0/1/1/0
Nov 5 18:32:52 eca1ap21 vmunix: SCSI: Reset detected -- lbolt: 612603834, bus: 2 path: 0/1/1/0
Nov 5 18:32:52 eca1ap21 vmunix: SCSI: Read error -- dev: b 31 0x020000, errno: 126, resid: 1024,
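
To see how often a given device number shows up in timeouts like the ones above, the log can be summarized with a small filter. This is only a sketch: `count_timeouts` is a hypothetical helper name, and it assumes the message layout shown in this excerpt (a `dev:` field followed by the device number and a comma).

```shell
# Count "SCSI: Request Timeout" entries per device number.
# Reads syslog text on stdin; prints "<dev> <count>" per device.
# Assumes the field after "dev:" is the device number, as in the excerpt above.
count_timeouts() {
    awk '/SCSI: Request Timeout/ {
        for (i = 1; i < NF; i++)
            if ($i == "dev:") { d = $(i + 1); sub(/,$/, "", d); n[d]++ }
    }
    END { for (d in n) print d, n[d] }'
}
```

On HP-UX it would typically be fed from the system log, for example `count_timeouts < /var/adm/syslog/syslog.log`.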

# cd dsk
# ll | grep 020000
brw-r----- 1 bin sys 31 0x020000 Feb 6 2007 c2t0d0
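
The mapping from the syslog value `dev: 1f020000` to `c2t0d0` can also be worked out by hand. The sketch below assumes the legacy sdisk layout, where the top byte is the major number and the minor number is laid out as 0xIITLOO (II = card instance, T = target, L = LUN, OO = options); `decode_dev` is a hypothetical helper name. The result agrees with the `ll` output above (major 31, minor 0x020000).

```shell
# Decode a device number as printed in syslog, e.g. "dev: 1f020000".
# Assumption: top 8 bits = major, low 24 bits = minor (0xIITLOO for sdisk).
decode_dev() {
    dev=$((0x$1))
    major=$(( (dev >> 24) & 0xff ))     # 0x1f -> 31 (sdisk block major)
    minor=$(( dev & 0xffffff ))         # 0x020000
    inst=$(( (minor >> 16) & 0xff ))    # card instance -> cN
    tgt=$(( (minor >> 12) & 0xf ))      # SCSI target   -> tN
    lun=$(( (minor >> 8) & 0xf ))       # LUN           -> dN
    printf 'major=%d minor=0x%06x c%dt%dd%d\n' "$major" "$minor" "$inst" "$tgt" "$lun"
}

decode_dev 1f020000   # prints: major=31 minor=0x020000 c2t0d0
```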


# dd if=/dev/rdsk/c2t0d0 of=/dev/null bs=1024k count=64
64+0 records in
64+0 records out


Every test shows that the SCSI device is OK; only syslog reports errors.

Should I plan a disk replacement?

Thanks for your suggestions

Sunny
11 REPLIES
R.K. #
Honored Contributor

Re: Probable disk failure.. need to confirm...

Hi Sunny,

Yes. The syslog entries (request timeouts and a bus reset) point to a possible hardware problem.

Try running dd over the full disk and see whether you get any errors.
Don't fix what ain't broke
Sunny Jaisinghani
Trusted Contributor

Re: Probable disk failure.. need to confirm...

I will gradually increase the dd read size on the disk and check whether I get any I/O errors.
Sunny Jaisinghani
Trusted Contributor

Re: Probable disk failure.. need to confirm...

# dd if=/dev/rdsk/c2t0d0 of=/dev/null bs=1024k
70007+1 records in
70007+1 records out
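
A quick check that `70007+1` really is the whole disk: diskinfo reported 71687369 Kbytes, and with bs=1024k each full record is 1024 Kbytes. A minimal sketch of the arithmetic (`dd_records` is a hypothetical helper name):

```shell
# Predict dd's "full+partial" record count for a complete read.
# $1 = device size in Kbytes (from diskinfo), $2 = dd block size in Kbytes.
dd_records() {
    size_kb=$1
    bs_kb=$2
    full=$(( size_kb / bs_kb ))
    # Any remainder is read as one final short record.
    if [ $(( size_kb % bs_kb )) -gt 0 ]; then partial=1; else partial=0; fi
    echo "${full}+${partial}"
}

dd_records 71687369 1024   # prints: 70007+1
```

The count matches, so dd read the disk end to end without an I/O error.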



Some more log entries I found:

Nov 5 18:32:52 eca1ap21 vmunix: LVM: VG 64 0x000000: PVLink 31 0x020000 Failed! The PV is not accessible.
Nov 5 18:32:52 eca1ap21 vmunix:
Nov 5 18:32:57 eca1ap21 above message repeats 2 times
Nov 5 18:32:57 eca1ap21 vmunix: LVM: VG 64 0x000000: PVLink 31 0x020000 Recovered.


Any suggestions?

Sunny
R.K. #
Honored Contributor

Re: Probable disk failure.. need to confirm...

Hi,

What is the value of the IO timeout?
# pvdisplay /dev/dsk/cxtydz | grep -i io

"default" normally means 30 seconds.

It can be changed online:
pvchange -t 120 /dev/dsk/cxtydz

This sets the LVM I/O timeout. If no response is received from the PV, or from the path to the PV, within that time, it is marked as failed. If it is a path failure and alternate paths are configured, the alternate path is then used for I/O.
Raising the timeout is meant to avoid the "PVLink ... Failed" error in syslog.

Are there any stale extents on the disk?
Don't fix what ain't broke
Sunny Jaisinghani
Trusted Contributor

Re: Probable disk failure.. need to confirm...

The IO Timeout for this PV is the default, i.e. 30 seconds.

There are no Stale PEs on the disk.
Sharma Sanjeev
Respected Contributor

Re: Probable disk failure.. need to confirm...

Hi Sunny

Your tests above show that the disk is fine.

So you can just change the PV timeout, which can be done online:

pvchange -t 180 /dev/dsk/c2t0d0

Regards
Sanjeev
Everything is Possible as " IMPOSSIBLE" word itself says I M POSSIBLE
Sunny Jaisinghani
Trusted Contributor

Re: Probable disk failure.. need to confirm...

Hello All,

Can we conclude the disk is OK?

Thanks
Sharma Sanjeev
Respected Contributor

Re: Probable disk failure.. need to confirm...

Yes
Everything is Possible as " IMPOSSIBLE" word itself says I M POSSIBLE
Johnson Punniyalingam
Honored Contributor

Re: Probable disk failure.. need to confirm...

>> Can we conclude the disk is OK? <<

If you can run ioscan, diskinfo and dd against a drive normally, then there is no hardware problem.

You can also refer to the PDF document "Good_Disk_Gone_Bad".
Problems are common to all, but attitude makes the difference