Operating System - HP-UX

IO errors during dd of LV, but no PV IO errors

 
David Child_1
Honored Contributor

I am getting the following:

dd if=/dev/vgora_data/rlvora_data05 of=/dev/null bs=4096k
192+0 records in
192+0 records out

So it bombs out at extent 192.

I then ran 'dd' against the PV that this LV resides on and I was able to successfully read the entire PV without any IO errors.

I have EMC checking the array (DMX1000) for problems. Perhaps it has something to do with Bad Block Relocation (this LV was created with '-r N', as per EMC's latest recommendations).

Any ideas on what else to look at, or suggestions for recovery?

Environment info:
- nPar on SD32000 (HP-UX 11.11)
- attached to storage via SAN
- EMC DMX1000 (32 GB meta device)
- Custom patch bundle current as of April

Thanks,
David
Simon Hargrave
Honored Contributor

Re: IO errors during dd of LV, but no PV IO errors

That doesn't look like an error to me. Are you sure it isn't copying the whole file?

You say it's stopping at extent 192, but you're copying in blocks of 4 MB (4096 KB), which clearly isn't your Oracle block size. I suspect you have a 4k Oracle block size, and you're actually copying 196608 blocks...

What is the return code from the dd command? (echo $? after the dd finishes).
David Child_1
Honored Contributor

Re: IO errors during dd of LV, but no PV IO errors

Simon,

Sorry, I forgot one line of the output. It should have been:
# dd if=/dev/vgora_data/rlvora_data05 of=/dev/null bs=4096k
dd read error: I/O error
192+0 records in
192+0 records out

# echo $?
2


By extents I of course mean PV extents, which are 4 MB (4096 KB). This is a 4 GB LV, so I'm not getting through much of it. I originally ran the 'dd' using a block size of 4k to match Oracle's block size, but that takes a long time, so I did not rerun at that block size. I have since tried 64k as well. The error is the same, only it's now:

12292+0 records in
12292+0 records out

12292*64=786688

This is about the same as where it's stopping using the 4096k block size (192*4096=786432 KB).
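
The two figures can be cross-checked with plain shell arithmetic. This is only a minimal sketch using the numbers quoted above and the 4096 KB extent size; the variable names are made up for illustration.

```shell
#!/bin/sh
# Cross-check where the two dd runs stopped, using the figures quoted above.
pe_kb=4096                      # extent size in KB (as stated in the post)

big_kb=$((192 * 4096))          # KB read cleanly with bs=4096k
small_kb=$((12292 * 64))        # KB read cleanly with bs=64k

# Extent (counting from 0) that holds the first unreadable 64 KB block,
# and how far into that extent the block starts.
bad_le=$((small_kb / pe_kb))
bad_off_kb=$((small_kb % pe_kb))

echo "bs=4096k run read ${big_kb} KB cleanly"
echo "bs=64k run read ${small_kb} KB cleanly"
echo "first failing 64 KB block: extent ${bad_le}, ${bad_off_kb} KB into it"
```

Both runs agree: the read fails inside extent 192 (counting from 0), roughly 768 MB into the LV.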

Thanks,
David
Simon Hargrave
Honored Contributor

Re: IO errors during dd of LV, but no PV IO errors

Ah, makes sense now!

"-r N" is fine, and should be used on arrays like EMC/XP etc., since the array has its own bad block relocation.

Does EMS show anything? (Check syslog for EMS errors.) Have you tried exercising the disk in stm? Presumably you've checked dmesg etc. for errors.

It seems odd that it works from the disk but not from the LV, though. Perhaps there is an LVM patch somewhere that you don't have installed?
A. Clay Stephenson
Acclaimed Contributor

Re: IO errors during dd of LV, but no PV IO errors

From the perspective of HP-UX, this is a raw LVOL that has bad PEs (e.g. bad disk blocks). This really should never happen on a high-end array and is the reason for the -rN lvcreate flag. Bad blocks should be handled and corrected by the array long before the OS even has a whiff of the problem.

At this point, we can't quite distinguish between 2 conditions: 1) a problem with the array itself, and 2) a problem with the LVM data structures. It is just possible that the LVM data structures responsible for translating logical to physical extents are corrupt, so that at this roughly 800MB offset into the LVOL the disk request is directed to a bogus offset or device.

Your next task is to do a "vgdisplay -v /dev/vgxx" to get the physical disks which comprise the VG and do a dd (read-only!) on them. There is no point in exceeding bs=1024k because any larger read requests get broken into smaller physical reads. If the dd's of /dev/rdsk/cXtYdZ are okay, then the problem is with the LVM structures; if these dd's are bad, then you have an array hardware/software problem.
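
The read-only check described above could be scripted roughly as follows. This is a sketch only: check_pvs is a made-up helper name, and the raw device paths must be supplied by hand from the vgdisplay output.

```shell
#!/bin/sh
# Read-only check of each raw device named on the command line.
# Nothing is written to the devices: all output goes to /dev/null.
check_pvs() {
    rc=0
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=1024k 2>/dev/null; then
            echo "OK   $dev"
        else
            echo "FAIL $dev"    # dd exited non-zero, e.g. on an I/O error
            rc=1
        fi
    done
    return $rc
}

# Hypothetical usage; substitute the raw devices of the PVs in the VG:
# check_pvs /dev/rdsk/cXtYdZ /dev/rdsk/cXtYdZ
```

The function returns non-zero if any device fails to read, so it can be used directly in an if statement.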
If it ain't broke, I can fix that.
David Child_1
Honored Contributor

Re: IO errors during dd of LV, but no PV IO errors

Thanks for the replies.

As I had stated in my original post, I was able to run full 'dd' on the PVs that this LV exists on. I went back and ran another 'dd' on each drive in the volume group.

Example of one drive:
# dd if=/dev/rdsk/c3t2d7 of=/dev/null bs=1024k
34526+1 records in
34526+1 records out

# echo $?
0

Note: These are EMC meta devices and 'diskinfo' shows them to be 35354880 KB (34526.25 MB) in size.

So it appears that there may be a problem with the LVM structures. Any suggestions on recovering from this?
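
One possible follow-up is to see which PV and physical extent LVM maps logical extent 192 to. The sketch below assumes the usual "LE PV1 PE1 Status 1" table that 'lvdisplay -v' prints under "--- Logical extents ---"; le_to_pe is a made-up helper name, and the block device path in the usage line is inferred from the raw device name, not confirmed.

```shell
#!/bin/sh
# Print the PV and PE that a given LE maps to, reading lvdisplay -v
# style text on stdin. Assumes the "LE PV1 PE1 Status 1" table layout.
le_to_pe() {
    awk -v le="$1" '
        /Logical extents/ { in_map = 1; next }   # start of the LE table
        in_map && $1 ~ /^[0-9]+$/ && $1 + 0 == le + 0 {
            print $2, $3 + 0                     # PV path, PE number
        }
    '
}

# Hypothetical usage on the affected system (LV path is an assumption):
# lvdisplay -v /dev/vgora_data/lvora_data05 | le_to_pe 192
```

If the PV or PE reported for extent 192 looks out of line with its neighbours, that would point toward the mapping rather than the array.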

Thanks,
David