Operating System - HP-UX

vxfs mount: is corrupted. needs checking

 
SOLVED
Hpmyworld123
Occasional Contributor

vxfs mount: is corrupted. needs checking

 
Hi Gurus,

Please help me resolve the issue below.

 


HP-UX B.11.00 U 9000/800 628309303

mount -a

vxfs mount: /dev/datavg1/lvusr40 is corrupted. needs checking
vxfs mount: /dev/datavg1/lvusr38 is corrupted. needs checking
vxfs mount: /dev/datavg1/lvusr37 is corrupted. needs checking
vxfs mount: /dev/datavg1/lvusr36 is corrupted. needs checking
vxfs mount: /dev/datavg1/lvusr35 is corrupted. needs checking
vxfs mount: /dev/datavg1/lvusr34 is corrupted. needs checking
vxfs mount: /dev/datavg1/lvusr33 is corrupted. needs checking
vxfs mount: /dev/datavg1/lvusr32 is corrupted. needs checking
vxfs mount: /dev/datavg1/lvusr31 is corrupted. needs checking

 

The same error appears for all of the logical volumes listed above.

 

# fsck -F vxfs -o full /dev/datavg1/lvusr40
vxfs fsck: cannot read OLT extent 1
read of OLT copy failed
missing or invalid OLT copies, rewrite? (ynq)y
vxfs fsck: cannot write secondary OLT data
file system check failure, aborting ...


 

Matti_Kurkela
Honored Contributor

Re: vxfs mount: is corrupted. needs checking

> HP-UX B.11.00

 

Looks like it's going to be old hardware. Possibly hardware that is about to fail because of age, or already failed.

 

What is the output of "model" and "ioscan -fnk"?

 

What is the output of "strings /etc/lvmtab"?

 

Does "dmesg" output contain anything that looks like SCSI errors? Any "lbolt" messages?
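If it helps, that dmesg check can be scripted. A minimal sketch (the patterns matched are assumptions about typical HP-UX SCSI message text, not an exhaustive list):

```shell
# Sketch: filter dmesg output for the SCSI/"lbolt" error patterns mentioned above.
# The regular expression is an assumption covering common message forms.
scan_dmesg() {
  grep -iE 'lbolt|scsi.*(error|timeout|reset|offline)'
}

# On the affected HP-UX box you would run:
#   dmesg | scan_dmesg
```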

 

The most important question will most likely be: do you have backups of this data?

 

If not, roughly estimate how much the data is worth to you, and/or how much time and effort would be needed to re-create the data from original sources.

If it's worth tens of thousands of dollars or more, STOP doing things on your own and contact data recovery professionals. (Trying things on your own may make their job harder or even impossible.)

 

 

What type of disk(s)? Internal SCSI, external SCSI, FibreChannel, or something else? If external, what is the type of the enclosure/storage system?

 

Check for cable/connector issues.
If there are any other disks using the same controller/HBA that are working OK, then the controller/HBA is probably fine. If all the disks on this controller are suddenly failing, the controller/HBA might be faulty: in that case, stop trying to access the disks with a possibly faulty controller, as it might damage your data still further.

MK
Hpmyworld123
Occasional Contributor

Re: vxfs mount: is corrupted. needs checking

Hi MK,

Thanks for your reply.

Below are the details:

 

9000/800/N4000-55

 

ioscan -fnk (attached)

 

Currently we are facing problem with volume group “datavg1”

 

“datavg1” volume group is created using 10 Physical volumes and contains 9 logical volumes.

 

Logical volumes of “datavg1” :-

 

/dev/datavg1/lvusr40

/dev/datavg1/lvusr38

/dev/datavg1/lvusr37

/dev/datavg1/lvusr36

/dev/datavg1/lvusr35

/dev/datavg1/lvusr34

/dev/datavg1/lvusr33

/dev/datavg1/lvusr32

/dev/datavg1/lvusr31

 

Physical Volumes Details  :-

   --- Physical volume groups ---

   PVG Name                    PV1                       

   PV Name                     /dev/dsk/c10t0d0          

   PV Name                     /dev/dsk/c10t1d0          

   PV Name                     /dev/dsk/c10t2d0          

   PV Name                     /dev/dsk/c10t3d0          

   PV Name                     /dev/dsk/c10t4d0          

     

   PVG Name                    PV2                       

   PV Name                     /dev/dsk/c10t5d0          

   PV Name                     /dev/dsk/c10t6d0          

   PV Name                     /dev/dsk/c10t7d0          

   PV Name                     /dev/dsk/c10t8d0          

   PV Name                     /dev/dsk/c10t9d0 

 

 

 

Among the 10 physical volumes, one (/dev/dsk/c10t1d0) is missing:

 

# ioscan -fnC disk | grep /dev/dsk/c10t1d0

#

 

 

Is there any way to set this right? Please help!

 

 

Matti_Kurkela
Honored Contributor
Solution

Re: vxfs mount: is corrupted. needs checking

Hmm, your ioscan shows that the c10tXdY disks all have FibreChannel-style disk paths, and they are attached to a Tachyon XL2 HBA. Yet, the ioscan shows the model information of each individual disk, and the FC paths are all Private Loop style (.../0.8.0.255...). It seems these disks are probably in a "dumb" external FC disk shelf.

 

When I Googled the disk models to check their type, I found a recent thread in another forum that seems to have exactly the same disk set-up as yours:

http://www.unix.com/hp-ux/222303-vxfs-mount-corrupted-needs-checking.html

 

That thread includes a "vgdisplay -v datavg1" output, which is very useful in understanding the problem. I'm going to assume that thread is yours; if not, please post the output of "vgdisplay -v datavg1" here.

 

 

Your c10tXdY disks all have an alternate path, c11tXdY respectively. That's a good practice with FibreChannel, but unfortunately it does not help here.

 

All your LVs have their Allocated PE as (2 * Current LE). That means they are mirrored, and that explains why there are Physical Volume Groups set up: each LV will have one copy of each extent in the first PVG, and another copy on the second PVG.
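That arithmetic can be checked mechanically. A quick sketch (the field positions assume the standard "vgdisplay -v" layout):

```shell
# Sketch: derive the number of mirror copies per LV from vgdisplay -v output.
# Allocated PE / Current LE - 1 = number of extra (mirror) copies.
mirror_copies() {
  awk '/Current LE/   { le = $3 }
       /Allocated PE/ { print $3 / le - 1 }'
}

# Live use: vgdisplay -v /dev/datavg1 | mirror_copies
# A mirrored LV prints 1; an unmirrored LV prints 0.
```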

 

 

The "vgdisplay -v" output in the other forum contains error messages on four disk paths:

  • c10t1d0, which seems to be completely gone;
  • c11t1d0, the alternate path of c10t1d0, so it's no surprise that it is gone too;
  • c10t6d0, which still appears in the ioscan listing;
  • and c11t6d0, the alternate path of c10t6d0.

Given your PVG configuration, and assuming your VG is laid out in the most straightforward way, then the disk c10t6d0 is likely to contain a mirror copy of the data on c10t1d0. The fact that c10t6d0 is also producing error messages is disturbing...

 

And to make matters worse, all your LVs seem to be using all the PVs: the Used PV value for each LV is 8, and each LV's segment of the "vgdisplay -v" output contains 4 "couldn't query physical volume" messages (2 PVs * 2 paths per PV = 4), which accounts for all 10 PVs. This looks like extent-based striping. Each LV mirror half is striped across all the PVs in a PVG, so the failure of one PV makes the mirror stale on every LV... and if a pair of disks fails, every LV is going to have holes in its data.

 

The disk c10t1d0 has obviously failed completely. As it does not appear in the ioscan listing at all, not even as a NO_HW entry, this disk was probably already dead when the system was last rebooted.

 

Please run "diskinfo -v /dev/rdsk/c10t6d0". If it indicates the disk size as 0, this disk has also definitely failed, although not in the same way as c10t1d0. Even if it reports the correct size, it may still be failing.
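If you want to script that check, here is a minimal sketch (the "size:" line format is an assumption about typical "diskinfo -v" output):

```shell
# Sketch: flag a disk whose diskinfo-reported size is 0 (a sign of total failure).
# Assumes a line of the form "size: <number> Kbytes" in the diskinfo output.
size_check() {
  awk '/size:/ { if ($2 + 0 == 0) print "FAILED"; else print "OK"; exit }'
}

# Live use: diskinfo -v /dev/rdsk/c10t6d0 | size_check
```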

 

Both c10t1d0 and c10t6d0 will need to be replaced, although there is one thing you can try first.

 

You can try unplugging the c10t1d0 and/or c10t6d0 disks from the disk shelf and plugging them back in. Do not move them to different slots.

 

If you are very lucky, one or both of them might restart and successfully complete their internal diagnostics this time. Run "ioscan -fnC disk" and "diskinfo -v /dev/rdsk/c10t6d0" (and "diskinfo -v /dev/rdsk/c10t1d0" if it reappears) to see if the disks restarted successfully. If one or both did, run "vgchange -a y datavg1", mount the LVs read-only, and try to take a backup ASAP: those disks may fail again at any moment, and you may not be as lucky twice.

 

If you have older backups of this VG, don't overwrite them: the backup you take now may include some corrupted files, and you may need to recover those from the older backups later.

 

All in all, this looks like a typical "unmonitored mirror set" catastrophe: the failure of the first disk in a mirror set (probably c10t1d0 here) has gone unnoticed, and the problem was noticed only when its mirror pair (c10t6d0) started to fail. The fact that your mirror set seems to include striping will actually make the damage worse: instead of only losing the LVs located on the failed pair of disks, each LV striped across the set of PVs will have parts missing. If those missing parts happen to include critical filesystem metadata, the LVs may be unrecoverable.

 

Lessons for the future:

  • Whenever you have a mirrored disk (either software or hardware mirror) on a server, make sure it has some way to alert someone if one half of the mirror fails, and make sure the person or people receiving the alert will know what to do when an alert is received. If such an alert mechanism is not available and you cannot script one, make it a part of daily routine to check the health of the mirror. Otherwise the failure of one mirror component may go unnoticed, and when the second half fails, it may be too late.
  • A stripe set may act as a damage amplifier in disk failures. Make extra sure stripe sets are regularly backed up.
  • Always make sure you have proper backups.
  • A mirrored disk is not a backup: it is just protection against the failure of an individual disk, and that protection is only good if the failed disk is replaced promptly, so that the mirror set can be recovered before the other disk of the mirror pair fails.
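As a starting point for the daily-check idea above, here is a minimal health-check sketch (the strings matched are assumptions about vgdisplay output; adapt and test it on your own system before trusting it):

```shell
# Sketch: scan vgdisplay -v output for signs of mirror trouble and print warnings.
# Intended for a daily cron job; the matched strings are assumptions.
check_vg_health() {
  grep -iE 'stale|couldn.t query|unavailable' | while read -r line; do
    echo "WARN: $line"
  done
}

# Cron-style live use (hypothetical recipient address):
#   vgdisplay -v /dev/datavg1 2>&1 | check_vg_health | mailx -s "mirror alert" root
```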
MK
Hpmyworld123
Occasional Contributor

Re: vxfs mount: is corrupted. needs checking

Hi MK,

Thanks a ton for the wonderful knowledge share.
Yes, you're right, the other post was by me!
We are planning to replace the disk c10t1d0 using the following steps.

Please correct me if I am wrong:


#vgdisplay /dev/datavg1
#ioscan -fC disk
#vgcfgrestore -n /dev/datavg1 /dev/rdsk/c10t1d0
#vgchange -a y /dev/datavg1
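We also believe a resync step is needed after activation; this is just our sketch (unverified, please correct):

```shell
# Proposed follow-up after vgchange (unverified, please confirm):
# vgsync /dev/datavg1           # resync all stale mirrored extents in the VG
# or, per logical volume:
# lvsync /dev/datavg1/lvusr40
```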

It would be great if you could also advise on the mirroring part.

Once again, thanks a ton, MK.