Operating System - HP-UX

Early indication of PV failure?

 
Carl Houseman
Super Advisor

Early indication of PV failure?

I've got a total of 16 occurrences (5 actual lines, plus 11 more from 'above message repeats x times' entries) of the following message over a 30-minute period, during a backup:

Aug 31 20:27:14 hpn vmunix: LVM: Failed to automatically resync PV 1f29b000 error: 5

These are the only errors. I checked (lvdisplay -v) all of the filesystems on the volume group containing c41t11d0 and all PE statuses were "current".

May I assume if there was a total failure to sync there would be other messages, and that this is an indication that retries are happening, and eventually, succeeding?

thanks...
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Early indication of PV failure?

Error 5 indicates an I/O error and that translates to a hardware failure of some kind -- most likely a failing disk. You have correctly decoded the device number so I would get ready to replace this disk. Surely all your LVOLs are mirrored so this will be no big deal and you can do it online with your users not knowing anything has happened. You are mirrored, aren't you?
Since all the LEs are current you are ok for now, but I would definitely get the drive replaced because the system has given you fair warning.
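
For readers wondering how 1f29b000 maps to c41t11d0, here is a minimal sketch of the decoding. It assumes the legacy HP-UX device-number layout (8-bit major, then 8-bit card instance, 4-bit target, 4-bit LUN and 8 option bits) and a bash-like shell that accepts 0x constants in arithmetic; decode_dev is a made-up helper name, not a system command.

```shell
#!/bin/sh
# Decode the hex dev_t printed in an LVM message into a cXtYdZ name.
# ASSUMPTION: legacy layout = major(8) | instance(8) | target(4) | lun(4) | options(8).
decode_dev() {
    dev=$((0x$1))                      # hex string -> integer
    major=$(( (dev >> 24) & 0xff ))    # driver major number
    instance=$(( (dev >> 16) & 0xff )) # card instance -> the "c" number
    target=$(( (dev >> 12) & 0xf ))    # SCSI target   -> the "t" number
    lun=$(( (dev >> 8) & 0xf ))        # LUN           -> the "d" number
    echo "major=${major} c${instance}t${target}d${lun}"
}

# The PV number from the syslog line quoted in this thread
decode_dev 1f29b000
```

Cross-check the result against the PV names listed by vgdisplay -v before acting on it.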

If it ain't broke, I can fix that.
Kevin Wright
Honored Contributor

Re: Early indication of PV failure?

Yep.. better order a disk, probably only a matter of time before it completely fails.
Carl Houseman
Super Advisor

Re: Early indication of PV failure?

OK, the disk finally did go bad. This is a data disk, one of a mirrored pair in a VG. I would like to remove it from the VG and add an online spare to the VG to mirror the good drive without taking the system down.

Assuming that removing the disk online is feasible, with 4 LV's in the VG, would the following be correct?

lvreduce -m 0 /dev/vg111/lv01 /dev/dsk/c41t11d0
lvreduce -m 0 /dev/vg111/lv02 /dev/dsk/c41t11d0
lvreduce -m 0 /dev/vg111/lv03 /dev/dsk/c41t11d0
lvreduce -m 0 /dev/vg111/lv04 /dev/dsk/c41t11d0
vgreduce /dev/vg111 /dev/dsk/c41t11d0

thanks...
James R. Ferguson
Acclaimed Contributor

Re: Early indication of PV failure?

Hi Carl:

Yes, that is the correct first step -- reduce the mirrors and eliminate the failing disk from the volume group.

Regards!

...JRF...
A. Clay Stephenson
Acclaimed Contributor

Re: Early indication of PV failure?

No, you do not want to vgreduce.

1) Slide the failed disk out of its hot-swap slot a few centimeters and let it spin down. Wait about 45-60 seconds -- now your disk is truly failed.

2) Remove the failed disk completely from the slot and insert the new disk. Wait 45-60 seconds.

3) vgcfgrestore -n /dev/vg111 /dev/rdsk/c41t11d0

4) vgchange -a y /dev/vg111

5) vgsync /dev/vg111 # this could take a few tens of minutes

6) vgdisplay -v /dev/vg111 and make certain each LVOL is available/syncd.

7) Declare victory.
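
The software half of the list above (steps 3 through 6) could be wrapped in a small script. This is only a sketch: restore_mirror and the RUN variable are hypothetical names, and the disk is assumed to be c41t11d0, the device named earlier in the thread. With RUN left at its default of echo the commands are only printed, so nothing touches LVM until RUN is set to an empty string on the real system.

```shell
#!/bin/sh
# Dry run by default: every command is printed instead of executed.
RUN=${RUN-echo}

# restore_mirror VG DISK : hypothetical wrapper for steps 3-6.
restore_mirror() {
    vg=$1
    disk=$2
    $RUN vgcfgrestore -n "$vg" "/dev/rdsk/$disk"  # put the LVM header back on the new disk
    $RUN vgchange -a y "$vg"                      # reattach the PV to the active VG
    $RUN vgsync "$vg"                             # resync stale extents (can take a while)
    $RUN vgdisplay -v "$vg"                       # check each LVOL is available/syncd
}

restore_mirror /dev/vg111 c41t11d0
```

The physical steps (sliding the old disk out, waiting, inserting the new one) still have to happen before any of this is run.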

-----------------------------------------

Now get a baseball bat and whack yourself -- hard -- because you should have replaced the drive before this. If I were in charge of the universe, I would have caused your other drive to go bad while you were twiddling your thumbs.


If it ain't broke, I can fix that.
Carl Houseman
Super Advisor

Re: Early indication of PV failure?

My thumb twiddling was calculated. I really don't need to replace the drive that badly at all - the system was discontinued from regular production use as of Friday Sep 1. There's just a handful of users on it now.

So my part II question is more of a training exercise for a hypothetical situation. For example, suppose I don't have physical access, but I have a spare drive online: can I effect the mirror replacement with the spare from another slot? I would think so, but I've never needed to in the whole 8 months I've been doing HP-UX.
A. Clay Stephenson
Acclaimed Contributor

Re: Early indication of PV failure?

That's a bit more difficult to answer; depending upon the OS version and LVM patches installed, lvreduce (to reduce mirroring) and vgreduce will sometimes hang. For that reason, I always convert a failing drive to a failed drive by the simple expedient of sliding it out a bit and waiting a few tens of seconds. I would be more inclined to vgextend and lvextend to mirror the new drive; however, I would much rather go onsite (or have Mr. HP Goodwrench go onsite) and physically swap the drive. The concept of an online spare is rather foreign to LVM. That belongs to true arrays, which are able to bring the online spare "online" automatically. Your scenario is more akin to a JBOD (just a bunch of disks). For your kind of scenario, I instead keep on-hand spares in stock and go upstairs and swap the drives -- and then call HP to replace my spare drive.

If it ain't broke, I can fix that.
Carl Houseman
Super Advisor

Re: Early indication of PV failure?

Correct, these are JBODs mirrored with LVM.

My lvreduce / vgreduce took a long time, and I needed to vgreduce the alternate path device for the same PV as well, but when all was said and done I got it to completely forget that c41t11d0 and c43t11d0 were ever part of that VG.

And then adding c41t14d0 to the VG and lvextending -m 1 was no sweat.
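
For anyone repeating this, the whole sequence looks roughly like the sketch below. swap_to_spare and RUN are made-up names, the order of the two vgreduce calls is a guess rather than a record of what I typed, and the spare is assumed to be already initialized for LVM. RUN defaults to echo, so the script only prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
RUN=${RUN-echo}

# swap_to_spare VG BAD_DISK ALT_PATH SPARE LV... : hypothetical helper
swap_to_spare() {
    vg=$1; bad=$2; alt=$3; spare=$4
    shift 4
    for lv in "$@"; do
        # drop the mirror copy that lives on the failed disk
        $RUN lvreduce -m 0 "$vg/$lv" "/dev/dsk/$bad"
    done
    # remove both paths to the failed PV (order here is an assumption)
    $RUN vgreduce "$vg" "/dev/dsk/$bad"
    $RUN vgreduce "$vg" "/dev/dsk/$alt"
    # bring in the spare and mirror every LV onto it
    $RUN vgextend "$vg" "/dev/dsk/$spare"
    for lv in "$@"; do
        $RUN lvextend -m 1 "$vg/$lv" "/dev/dsk/$spare"
    done
}

swap_to_spare /dev/vg111 c41t11d0 c43t11d0 c41t14d0 lv01 lv02 lv03 lv04
```

As noted above, lvreduce and vgreduce against a failing disk can be slow or hang depending on patch level, so budget time for the first half.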

Much appreciated, everyone... points assigned, signing off.