Operating System - HP-UX

Re: Replace root mirror disk,are these steps all right?

 
hailerer
Advisor

Replace root mirror disk,are these steps all right?

The engineer performed the following steps online. Afterwards we found that many system files were damaged! If the steps are wrong, please point out the mistakes so that they can be corrected.
1) First, replace the bad disk online by hot-pulling it and plugging in the new one.
2) ioscan -fnCdisk
3) mv /etc/lvmtab /etc/lvmtab.bak
4) pvcreate /dev/rdsk/c2t#d0
5) cp /etc/lvmtab.bak /etc/lvmtab
6) strings /etc/lvmtab
7) mkboot /dev/rdsk/c2t#d0
8) mkboot -a "hpux -lq (;0) /stand/vmunix" /dev/rdsk/c2t10d0
9) vgcfgrestore -n /dev/vg00 /dev/rdsk/c2t10d0
10) vgsync /dev/vg00
11) lvlnboot -r /dev/vg00/lvol1
12)
hailerer
Advisor

Re: Replace root mirror disk,are these steps all right?

The engineer performed the following steps online. Afterwards we found that many system files were damaged! If the steps are wrong, please point out the mistakes so that they can be corrected.
1) First, replace the bad disk online by hot-pulling it and plugging in the new one.
2) ioscan -fnCdisk
3) mv /etc/lvmtab /etc/lvmtab.bak
4) pvcreate /dev/rdsk/c2t#d0
5) cp /etc/lvmtab.bak /etc/lvmtab
6) strings /etc/lvmtab
7) mkboot /dev/rdsk/c2t#d0
8) mkboot -a "hpux -lq (;0) /stand/vmunix" /dev/rdsk/c2t#d0
9) vgcfgrestore -n /dev/vg00 /dev/rdsk/c2t10d0
10) vgsync /dev/vg00
11) lvlnboot -r /dev/vg00/lvol1
12) lvlnboot -s /dev/vg00/lvol2

Denver Osborn
Honored Contributor

Re: Replace root mirror disk,are these steps all right?

Here's my 2 cents;

>1)first replace the bad disk online with hot pulling and plugging action.

Before pulling out the "bad" disk, make sure it's dead and no IO could be going to it. Easy way is to test with dd, or diskinfo. On a completely dead disk, diskinfo would show the size as 0 bytes.
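A minimal sketch of that check, for what it's worth. The helper names are made up for illustration, and the parsing assumes diskinfo prints a line of the form "size: N Kbytes"; confirm against the output on your own release before relying on it.

```shell
# Hypothetical helpers (not from this thread). They read `diskinfo` output
# on stdin and assume it contains a line like "size: 8891556 Kbytes".

# Print the size in Kbytes; exit non-zero if no size line is found.
disk_size_kb() {
    awk '$1 == "size:" { print $2; found = 1; exit } END { if (!found) exit 1 }'
}

# Return 0 if the reported size is 0 (disk looks dead), 1 if it is
# non-zero (disk may still accept I/O), 2 if the output could not be parsed.
disk_looks_dead() {
    size=$(disk_size_kb) || return 2
    [ "$size" -eq 0 ]
}

# Example use on the HP-UX box (device name taken from the original post):
#   diskinfo /dev/rdsk/c2t10d0 | disk_looks_dead && echo "size 0: disk looks dead"
# A read test with dd is the other option mentioned above:
#   dd if=/dev/rdsk/c2t10d0 of=/dev/null bs=1024k count=10
```

If either test still gets data back from the disk, treat it as live and reduce the mirrors before pulling it.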


>2)ioscan -fnCdisk
>3)mv /etc/lvmtab /etc/lvmtab.bak
>4)pvcreate /dev/rdsk/c2t#d0
>5)cp /etc/lvmtab.bak /etc/lvmtab
>6)strings /etc/lvmtab

Not sure why you'd need to do all of this. After replacing the disk I'd probably look at ioscan to confirm that the new disk is visible, but moving lvmtab and running pvcreate don't make any sense.

>7)mkboot /dev/rdsk/c2t#d0
>8) mkboot -a "hpux -lq(;0) /stand/vmunix" /dev/rdsk/c2t#d0

Fine, if this was a bootable mirror for root vg.

>9)vgcfgrestore -n /dev/vg00 /dev/rdsk/c2t10d0
>10)vgsync /dev/vg00
>11)lvlnboot -r /dev/vg00/lvol1
>12)lvlnboot -s /dev/vg00/lvol2

This is ok. After the disk is replaced you don't need to pvcreate it. Simply run vgcfgrestore to put the LVM data back on the disk, then vgchange -a y, and then run vgsync if the mirror shows stale extents.

The only thing that would worry me is pulling out a live disk that could possibly have IO. If the disk was not completely dead, you'd reduce the mirrors then replace disk.

There's also a good doc on the ITRC about the LVM process for replacing a failed LVM disk. I'm too lazy to look up the doc right now :) but search for "replace failed lvm mirror" and you should come across it.

Hope this helps,
-denver
Michael Tully
Honored Contributor

Re: Replace root mirror disk,are these steps all right?

Steven E. Protter
Exalted Contributor

Re: Replace root mirror disk,are these steps all right?

I have a slightly more complete procedure for manually rebuilding the mirror.

You should not need it, but it's good to have.

pvcreate -B /dev/rdsk/c1t0d0 #use real disk

mkboot -l /dev/rdsk/c1t0d0
mkboot -a "hpux -lq (;0)/stand/vmunix" /dev/rdsk/c1t0d0 # use real disk


# mkboot -b /usr/sbin/diag/lif/updatediaglif -p ISL -p AUTO -p HPUX -p PAD -p LABEL /dev/rdsk/c?t?d?

If you are running 64-bit OS:

# mkboot -b /usr/sbin/diag/lif/updatediaglif2 -p ISL -p AUTO -p HPUX -p PAD -p LABEL /dev/rdsk/c?t?d?


vgextend /dev/vg00 /dev/dsk/c1t0d0 # same thing
lvextend -m 1 /dev/vg00/lvol1 /dev/dsk/c1t0d0

# real disk. repeat for other lvols

lvlnboot -r /dev/vg00/lvol3 # root fs /
lvlnboot -s /dev/vg00/lvol2 #swap
lvlnboot -d /dev/vg00/lvol2 #swap/dump
lvlnboot -b /dev/vg00/lvol1
lvlnboot -R
lvlnboot -v
setboot
setboot -a 52.1.0 # second disk

Thanks to all that made this doc right.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Phil Smith
Occasional Advisor

Re: Replace root mirror disk,are these steps all right?

Something I think a lot of people overlook: if you lose one of your mirrored OS disks and still want the server to be able to boot unattended from the remaining disk, you need to specify -lq (quorum override), because with one disk gone you don't have a quorum.

Note this should be set for BOTH the primary and secondary disk.
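One way to verify this is to read the AUTO file back from each boot disk's LIF area with lifcp and look for -lq. A rough sketch; the helper name is hypothetical and the device names are only the ones used elsewhere in this thread.

```shell
# Hypothetical helper (not from this thread): succeed if the AUTO string
# read from stdin contains -lq as a separate option.
auto_has_quorum_override() {
    grep -Eq -e '(^|[[:space:]])-lq([[:space:]]|$)'
}

# Example use on the HP-UX box, once per boot disk:
#   lifcp /dev/rdsk/c2t10d0:AUTO - | auto_has_quorum_override \
#       || echo "c2t10d0: AUTO string has no -lq"
```

If a disk is missing the option, the mkboot -a command shown in the other replies rewrites the AUTO string.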
Sridhar Bhaskarla
Honored Contributor

Re: Replace root mirror disk,are these steps all right?

Hi,

You got the mirroring steps. Make sure you follow those links/steps to mirror your disks correctly. Do not reboot your system until "lvlnboot" is fixed correctly.

lvlnboot -r /dev/vg00/lvol1 is a blunder unless you really made lvol1 your root logical volume. Usually lvol1 is /stand, so he should be running 'lvlnboot -b /dev/vg00/lvol1', 'lvlnboot -s /dev/vg00/lvol2', 'lvlnboot -r /dev/vg00/lvol3' and 'lvlnboot -d /dev/vg00/lvol2'.
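Before rebooting, the current definitions can be read back with lvlnboot -v. A small sketch for pulling out which lvol is assigned to each role; the helper name is made up, and it assumes the output has lines beginning with "Boot:", "Root:", "Swap:" and "Dump:" followed by the lvol name, which you should confirm on your own system.

```shell
# Hypothetical helper (not from this thread): print the lvol assigned to a
# role (Boot, Root, Swap or Dump) in `lvlnboot -v` output read from stdin.
# Assumes lines of the form "Root: lvol3   on:   /dev/dsk/...".
lvlnboot_role() {
    awk -v role="$1:" '$1 == role { print $2; exit }'
}

# Example use on the HP-UX box:
#   lvlnboot -v /dev/vg00 | lvlnboot_role Root
# On the usual layout described above this should print lvol3, not lvol1.
```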

You didn't have to move lvmtab file.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Kyri Pilavakis
Frequent Advisor

Re: Replace root mirror disk,are these steps all right?

I have been looking after HP-UX systems and I have put together a document that has been very useful to me when I had various disk failures.

I'm sending it to you and hope it will be of some use. It also contains other material that might help. Unfortunately it is no longer of use to me, as our company scrapped all its Unix systems and moved to NT and VMS !!

Please don't take it as the bible...I am only human !!

kyris

Bosses don't understand..HP does
James Lynch
Valued Contributor

Re: Replace root mirror disk,are these steps all right?

I agree with Denver, you should never hot-plug a disk that is not completely dead. If there is any possibility that the suspected failed disk can handle an I/O request, then Murphy will strike and you will end up with filesystem corruption.

Hot-plug does not necessarily equate with hot-swap. Hot-plug means that the hardware can support removal and insertion of a device without having to remove power from it, in this case a disk and its enclosure. Hot-swap refers more to the OS's ability to support removal and insertion of a device while I/Os are still active. HP's LVM does not support this hot-swap capability and was never designed with it in mind.

If you are going to attempt to hot-plug a disk, then you must make sure that all I/Os to the lvols on that disk have stopped. There are three ways to do that, none of which were the method your engineer used:

1) Shut down the system and replace the disk.
or
2) Unmount all filesystems that are contained in the lvols on the affected disk. Also stop any application that is using the lvols in RAW mode, i.e. Database.
or
3) lvreduce the failed disk's mirrors and then vgreduce the failed disk from the VG. Depending upon the version of the OS and how the disk has failed, this may or may not be possible.
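For option 3, a dry-run sketch that only prints the commands so they can be reviewed before anything is run. The function name and the lvol list are hypothetical; the device name is the one from the original post. As noted above, depending on the OS version and how the disk failed, these commands may not succeed as written.

```shell
# Hypothetical dry-run helper (not from this thread): print, but do not run,
# the commands that drop the mirror copies on a failed disk and then remove
# the disk from the volume group.
# Usage: print_reduce_plan VG PV LVOL...
print_reduce_plan() {
    vg=$1
    pv=$2
    shift 2
    for lv in "$@"; do
        echo "lvreduce -m 0 /dev/$vg/$lv $pv"
    done
    echo "vgreduce /dev/$vg $pv"
}

# Example:
#   print_reduce_plan vg00 /dev/dsk/c2t10d0 lvol1 lvol2 lvol3
```

Printing the plan first makes it easy to check that every mirrored lvol on the disk is covered before the vgreduce.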

I have successfully used all three of the methods above. I have also tested replacing a disk with the method that you used, and I was able to introduce corruption in my mirrored filesystems.

Thinking that you can replace an LVM mirrored disk while the filesystems are mounted and the failed disk is still active in the volume group seems to be a common misconception. I have run into several people who have tried to do this and ended up with corrupted filesystems.

JL
Wild turkey surprise? I love wild turkey surprise!
A. Clay Stephenson
Acclaimed Contributor

Re: Replace root mirror disk,are these steps all right?

I think the fundamental problem was that no vgchange was done. The pvcreate was unnecessary. I have replaced at least a hundred "flakey" boot drives without a problem by yanking them out with a measure of caution. Those who advocate shutting down to replace hot-plug drives should consider this. When is a drive most likely to fail? I know of at least two occasions when the remaining drive failed to become active after a shutdown --- leaving a dead box. I also fail to see how a box can distinguish between a yanked-out drive and a truly dead one. I trust tri-state electronics and have no fear of pulling a drive whether completely dead or not BUT one must take a few precautions. It is possible to be in a state where stale extents are present on both mirrors --- that is where the corruption ultimately comes from.

1) Do a series of lvdisplay -v commands for each lvol on the physical disk and make sure that all the extents on the remaining "good" drive are current. (I actually have a script for this --- checkextents.sh)

2) "Yank" the bad or flakey drive out. Pull it out just 2 cm or so and leave it resting in the slot. This lets the drive spin down gradually and who knows you might need it again.

3) Wait 30 seconds or so and then run checkextents.sh again, making sure that the remaining drive has all "current" extents.

4) Now remove the bad drive completely from the slot and insert the replacement drive. Allow it to spin up. Think happy thoughts.

Let's assume the replacement drive is c0t6d0:
5) vgcfgrestore -n /dev/vg00 /dev/rdsk/c0t6d0
6) vgchange -a y /dev/vg00
7) mkboot /dev/rdsk/c0t6d0
8) mkboot -a "hpux -lq (;0)/stand/vmunix" /dev/rdsk/c0t6d0
9) lvlnboot -R
10) vgsync /dev/vg00 --- this could take a few tens of minutes

11) Do lvdisplays to make certain that all extents are current.

I have yet to have the above procedure fail but maybe I've just been lucky.
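The checkextents.sh script itself is not shown in this thread. As a rough stand-in for the extent checks in steps 1, 3 and 11, something along these lines could be used. It assumes the logical extent rows in lvdisplay -v output start with the numeric LE index and carry the word "stale" or "current" per mirror copy; the function name is made up.

```shell
# Hypothetical stand-in for the extent check (not the actual checkextents.sh).
# Reads `lvdisplay -v` output on stdin and prints the number of logical
# extent rows that have at least one stale copy. Only rows whose first field
# is a number (the LE index) are counted, so the "LV Status" header line is
# ignored.
count_stale_extents() {
    awk '$1 ~ /^[0-9]+$/ && /stale/ { n++ } END { print n + 0 }'
}

# Example use on the HP-UX box:
#   for lv in /dev/vg00/lvol*; do
#       echo "$lv: $(lvdisplay -v "$lv" | count_stale_extents) stale"
#   done
```

A count of 0 for every lvol only says that no row is marked stale. It does not show which disk holds the current copies, so it is still worth looking at the PV columns for the remaining good drive.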

If it ain't broke, I can fix that.