Operating System - HP-UX

Re: logical volume write errors

 
SOLVED
Shaamil
Frequent Advisor

logical volume write errors

Hi Gurus

I was wondering if anyone can help me.

I keep getting inode errors in my syslog file, and the application keeps failing with I/O errors.

Jan 31 11:17:06 pnp100 vmunix: vxfs: mesg 056: vx_dataioerr - /dev/vg04/acct file system file data write error
Jan 31 11:17:06 pnp100 vmunix: vxfs: mesg 018: vx_idelxwri_done - /usr/acct file system inode 36143 had a write error at offset 2179072
Jan 31 11:17:08 pnp100 vmunix: vxfs: mesg 033: vx_check_badblock - /dev/vg04/acct file system had an I/O error, setting VX_FULLFSCK
Jan 31 11:18:24 pnp100 vmunix: vxfs: mesg 018: vx_idelxwri_done - /usr/acct file system inode 36143 had a write error at offset 2220032

The above output is from syslog.
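
For anyone who wants to pull the same details out of a longer syslog, here is a minimal Python sketch. It only uses the four lines quoted above as sample data; the regular expressions and the function name `summarize` are my own assumptions about the message layout, not anything from the VxFS documentation.

```python
import re

# Sample lines copied from the syslog excerpt above.
SYSLOG = """\
Jan 31 11:17:06 pnp100 vmunix: vxfs: mesg 056: vx_dataioerr - /dev/vg04/acct file system file data write error
Jan 31 11:17:06 pnp100 vmunix: vxfs: mesg 018: vx_idelxwri_done - /usr/acct file system inode 36143 had a write error at offset 2179072
Jan 31 11:17:08 pnp100 vmunix: vxfs: mesg 033: vx_check_badblock - /dev/vg04/acct file system had an I/O error, setting VX_FULLFSCK
Jan 31 11:18:24 pnp100 vmunix: vxfs: mesg 018: vx_idelxwri_done - /usr/acct file system inode 36143 had a write error at offset 2220032
"""

# Assumed layout: "vxfs: mesg NNN: function - target file system text"
VXFS_RE = re.compile(
    r"vmunix: vxfs: mesg (?P<num>\d+): (?P<func>\w+) - "
    r"(?P<target>\S+) file system (?P<text>.*)$"
)
INODE_RE = re.compile(r"inode (\d+) had a write error at offset (\d+)")


def summarize(log_text):
    """Return (count per message number, {inode: [offsets]}, VX_FULLFSCK seen)."""
    counts = {}
    inode_errors = {}
    full_fsck = False
    for line in log_text.splitlines():
        m = VXFS_RE.search(line)
        if not m:
            continue  # not a vxfs kernel message
        counts[m.group("num")] = counts.get(m.group("num"), 0) + 1
        if "VX_FULLFSCK" in m.group("text"):
            full_fsck = True
        im = INODE_RE.search(m.group("text"))
        if im:
            inode_errors.setdefault(int(im.group(1)), []).append(int(im.group(2)))
    return counts, inode_errors, full_fsck


counts, inode_errors, full_fsck = summarize(SYSLOG)
print(counts)        # how often each vxfs message number appears
print(inode_errors)  # write-error offsets grouped by inode
print(full_fsck)     # whether the kernel flagged the file system for a full fsck
```

On the excerpt above this shows that every inode write error is on the same inode (36143), and that the kernel set VX_FULLFSCK.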
I then stop my application, umount the filesystem and then fsck it.
# fsck -F vxfs -m -o full /dev/vg04/racct
vxfs fsck: sanity check: /dev/vg04/racct needs checking

I then fsck it without the -m....
# fsck -F vxfs -o full /dev/vg04/racct
log replay in progress
pass0 - checking structural files
pass1 - checking inode sanity and blocks
pass2 - checking directory linkage
pass3 - checking reference counts
pass4 - checking resource maps
OK to clear log? (ynq)y
set state to CLEAN? (ynq)y

I then do the fsck again with the -m and ....
# fsck -F vxfs -m -o full /dev/vg04/racct
vxfs fsck: sanity check: /dev/vg04/racct needs checking

Does anyone know why this is?
lvdisplay -v /dev/vg04/acct shows the following:
HP is the greatest
15 REPLIES
Jannik
Honored Contributor
Solution

Re: logical volume write errors

From this link:
http://docs.hp.com/en/B3929-90011/apas03.html

WARNING: msgcnt x: vxfs: mesg 033: vx_check_badblock - mount_point filesystem had an I/O error, setting VX_FULLFSCK

Level: Message 033

Explanation
When the disk driver encounters an I/O error, it sets a flag in the super-block structure. If the flag is set, the kernel will set the VX_FULLFSCK flag as a precautionary measure. Since no other error has set the VX_FULLFSCK flag, the failure probably occurred on a data block.

Action
Unmount the file system and use fsck to run a full structural check. Check the console log for I/O errors. If the problem is a disk failure, replace the disk before the file system is mounted for write access.

----

So instead of running fsck, try lvdisplay to see if you have stale extents. It seems that your disk is failing, but not completely.
Another thing: it could be a patch issue. What patch level and OS are you running?
jaton
Denver Osborn
Honored Contributor

Re: logical volume write errors

bad or missing disk?

Please post the output of vgdisplay for vg04.

-denver
Shaamil
Frequent Advisor

Re: logical volume write errors

Hi

The output of lvdisplay is attached to the first post. I am running HP-UX 11.00 on a K570. HP did arrange some patches for me, but this was about a year ago.

I want to know what the ??? is in my lvdisplay. Does this mean that it cannot find the disk or what?

Thanks in advance
Shaamil
HP is the greatest
Shaamil
Frequent Advisor

Re: logical volume write errors

It seems as though there is a disk missing.
vgdisplay shows me a disk which it couldn't query, and so does lvdisplay.
The disk is
/dev/dsk/c6t13d0
I cannot even see this disk with ioscan.

My problem now is that I do not have this filesystem mirrored. What do I do to correct the issue?
Tks,
Shaamil
HP is the greatest
Henk Geurts
Esteemed Contributor

Re: logical volume write errors

hi Shaamil

LVM thinks there must be another disk ..

check the output of
strings /etc/lvmtab
for another disk in vg04.

use
ioscan -fnC disk
to check for NO_HW (no hardware).
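
If the ioscan output is long, a small script can pick out the disk in question. This is only a rough Python sketch: the sample text follows the usual `ioscan -fnC disk` column layout, but the instance numbers, hardware paths, and descriptions in it are invented for illustration, and the helper names `parse_ioscan` and `device_state` are hypothetical.

```python
# Hypothetical sample in the usual `ioscan -fnC disk` layout. The hardware
# paths, instance numbers, and descriptions are invented for illustration.
SAMPLE = """\
Class     I  H/W Path     Driver   S/W State   H/W Type     Description
=======================================================================
disk      4  8/4.10.0     sdisk    CLAIMED     DEVICE       EXAMPLE DISK A
                         /dev/dsk/c6t10d0   /dev/rdsk/c6t10d0
disk      5  8/4.13.0     sdisk    NO_HW       DEVICE       EXAMPLE DISK B
                         /dev/dsk/c6t13d0   /dev/rdsk/c6t13d0
"""


def parse_ioscan(text):
    """Map each device file to the S/W State of the disk entry above it."""
    states = {}
    current_state = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "disk" and len(fields) >= 5:
            # Columns: Class, I, H/W Path, Driver, S/W State, ...
            current_state = fields[4]
        elif fields[0].startswith("/dev/") and current_state is not None:
            # Device-file line printed by -n under the entry it belongs to.
            for dev in fields:
                states[dev] = current_state
    return states


def device_state(text, device):
    """Return the S/W State for a device file, or 'not listed' if absent."""
    return parse_ioscan(text).get(device, "not listed")


print(device_state(SAMPLE, "/dev/dsk/c6t13d0"))
print(device_state(SAMPLE, "/dev/dsk/c6t10d0"))
print(device_state(SAMPLE, "/dev/dsk/c9t9d0"))
```

A state of NO_HW means the system remembers the disk but can no longer reach it; "not listed" means ioscan does not know about the device file at all.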

post the output please .

regards
Henk
bhavin asokan
Honored Contributor

Re: logical volume write errors

hi,

In ioscan, what is the status of that disk? Is it CLAIMED?
If it is claimed,

try
vgcfgrestore -f /etc/lvmconf/vg04.conf /dev/rdsk/c6t13d0


regds,
Shaamil
Frequent Advisor

Re: logical volume write errors

Hi

vgdisplay is attached.
pvdisplay shows the following..
pvdisplay -v /dev/dsk/c6t13d0
pvdisplay: Warning: couldn't query physical volume "/dev/dsk/c6t13d0":
The specified path does not correspond to physical volume attached to
this volume group
pvdisplay: Warning: couldn't query all of the physical volumes.
pvdisplay: Couldn't retrieve the names of the physical volumes
belonging to volume group "/dev/vg04".
pvdisplay: Cannot display physical volume "/dev/dsk/c6t13d0".

The lvdisplay output is further up. The disk error is also there, but only on standard out.

ioscan -fnCdisk does not show the disk, nor does it show NO_HW. Remember, I have already rebooted the server.
Tks,
Shaamil
HP is the greatest
Denver Osborn
Honored Contributor

Re: logical volume write errors

If part of the filesystem was on /dev/dsk/c6t13d0 and this disk has failed, and there isn't a mirror, then you've lost data.

Once you replace the failed disk, you'll have to newfs the filesystem and restore from your last known good backup. There's no way around it, if the disk is gone and no mirror... data is lost.

-denver
bhavin asokan
Honored Contributor

Re: logical volume write errors

hi,

When you rebooted the server, it recreated the device files. If the disk is not showing there, the only way now is to find and replace the disk.

Once the disk is replaced, do a vgcfgrestore:

vgcfgrestore -n /dev/vg04 /dev/rdsk/c6t13d0

Then restore the data from backup.

regds,
Denver Osborn
Honored Contributor

Re: logical volume write errors

'lssf /dev/dsk/c6t13d0' should show the "???" for the hardware path, meaning the disk is no longer seen at that path. Rebooting the box did nothing more than remove it from ioscan, because the box didn't see the SCSI disk when it booted up.

I'm assuming this failed disk is in a HASS (Jamaica) enclosure? Can you give more detail about where the disk is located? Do you have a replacement for it? If you have HP h/w support for the box/disk, call in for assistance and get the failed drive replaced.

-denver
Shaamil
Frequent Advisor

Re: logical volume write errors

Hi all
I have reseated my disk, rebooted the server and it came up....Hoooraaa

Thanks for all your valuable input. It is greatly appreciated. I must say the response on these forums is the best on the planet.

Cheers all.
HP is the greatest
Gerhard Roets
Esteemed Contributor

Re: logical volume write errors

Hi Shaamil

Just so you know, it might be a degenerative disk failure, which implies that the disk might fail again.

I would run mstm against that physical disk if I were you.

HTH
Gerhard
D Block 2
Respected Contributor

Re: logical volume write errors

I notice bad-block relocation is ON, and also notice there is no mirror of "acct" in vg04.

In the disk h/w there is a pool of available blocks that are reserved for bad block re-vectoring. If this pool runs out, you are in trouble.

I would start thinking of disk replacement ASAP. Ignite the system to tape, if you have not done so yet. Try to set up disk mirroring, if not done yet. Do you have an alternate boot device?
Golf is a Good Walk Spoiled, Mark Twain.
Shaamil
Frequent Advisor

Re: logical volume write errors

Hi Guys
I did the following:

I removed the disk. I do not have a replacement. I exported the VG and imported it without specifying that disk.
Now I know I will lose data, but that is fine.
I can restore what I need, but I want to try something. My LV /dev/vg04/acct looks like this....
pnp100:#/> lvdisplay -v /dev/vg04/acct | more
--- Logical volumes ---
LV Name                     /dev/vg04/acct
VG Name                     /dev/vg04
LV Permission               read/write
LV Status                   available/syncd
Mirror copies               0
Consistency Recovery        MWC
Schedule                    parallel
LV Size (Mbytes)            40000
Current LE                  5000
Allocated PE                5000
Stripes                     0
Stripe Size (Kbytes)        0
Bad block                   on
Allocation                  strict
IO Timeout (Seconds)        default

--- Distribution of logical volume ---
PV Name             LE on PV  PE on PV
/dev/dsk/c4t1d0     1084      1084
/dev/dsk/c4t3d0     1084      1084
/dev/dsk/c4t4d0     392       392
/dev/dsk/c6t10d0    692       692
/dev/dsk/c6t11d0    967       967
/dev/dsk/c5t2d0     3         3

--- Logical extents ---
LE    PV1                 PE1   Status 1
0000  /dev/dsk/c6t10d0    0392  current
0001  /dev/dsk/c6t10d0    0393  current

Further down it looks like this....

4215  /dev/dsk/c4t4d0     0388  current
4216  /dev/dsk/c4t4d0     0389  current
4217  /dev/dsk/c4t4d0     0390  current
4218  /dev/dsk/c4t4d0     0391  current
4219  /dev/dsk/c5t2d0     0122  current
4220  /dev/dsk/c5t2d0     0123  current
4221  /dev/dsk/c5t2d0     0124  current
4222  ???                 0000  current
4223  ???                 0001  current
4224  ???                 0002  current
4225  ???                 0003  current
4226  ???                 0004  current
4227  ???                 0005  current
4228  ???                 0006  current

Is there a way to get rid of all the LEs from 4222 down?
Please help, suggest, anything....
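
For what it's worth, the numbers in that output can be cross-checked with a short Python sketch. All values are copied from the lvdisplay output in this post. The one assumption is that every extent not listed under "Distribution of logical volume" sits on the missing disk, since the excerpt only shows the map as far as LE 4228. The function name `unmapped_extents` is just a label for this sketch.

```python
# Values copied from the lvdisplay output in this post.
LV_SIZE_MB = 40000
CURRENT_LE = 5000
DISTRIBUTION = {
    "/dev/dsk/c4t1d0": 1084,
    "/dev/dsk/c4t3d0": 1084,
    "/dev/dsk/c4t4d0": 392,
    "/dev/dsk/c6t10d0": 692,
    "/dev/dsk/c6t11d0": 967,
    "/dev/dsk/c5t2d0": 3,
}

# Excerpt of the "Logical extents" map from the same output.
EXTENT_MAP = """\
4219 /dev/dsk/c5t2d0 0122 current
4220 /dev/dsk/c5t2d0 0123 current
4221 /dev/dsk/c5t2d0 0124 current
4222 ??? 0000 current
4223 ??? 0001 current
4224 ??? 0002 current
"""


def unmapped_extents(extent_map):
    """Return the LE numbers whose physical volume is shown as '???'."""
    les = []
    for line in extent_map.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "???":
            les.append(int(fields[0]))
    return les


pe_size_mb = LV_SIZE_MB // CURRENT_LE   # extent size: 8 MB
mapped = sum(DISTRIBUTION.values())     # extents on disks that are present: 4222
missing = CURRENT_LE - mapped           # assumed to be on the missing disk: 778

print(pe_size_mb, mapped, missing, missing * pe_size_mb)
print(unmapped_extents(EXTENT_MAP))
```

So the six remaining disks hold exactly LEs 0000 to 4221, and under the assumption above the last 778 extents, about 6224 MB of the 40000 MB volume, have no disk behind them.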

HP is the greatest
Gerhard Roets
Esteemed Contributor

Re: logical volume write errors

Hi Shaamil

Since you do not have a replacement disk, I would personally recreate the whole VG.

Fixing it would be a "big hack", and you have already affected a file system negatively. Rebuilding it would be the safer route to go; you do not know what other problems might crop up down the road if you "hack" it a bit too much.

But just so that you know ... the symptoms you are seeing there are typical of vxfs metadata corruption. There can be a few causes.

1. Disk gone over to the dark side for some reason or other (your case).
2. JFS version mismatch ... moving disks from a JFS 3.5 based system down to a JFS 3.3 based system.
3. Bad fsck strategies; fsck can break a disk's metadata if you do not follow the rules.
4. ... some others I have personally not seen yet. I am sure a lot of people would be able to add extra reasons.

HTH
Gerhard