
Fixing a changed (bad) volume group

Brian Bergstrand
Honored Contributor

We had a disk go bad in a server. The disk was replaced, but now the kernel panics at boot due to an invalid (or changed) volume group.

The bad drive had just been added to the volume group (that's how we found out it was bad), so there was no data on it, but it went bad right in the middle of an extendfs. The boot lvol is in the bad vol group, but is not the lvol that was being extended.

I'm assuming the kernel is panicking because the new drive is a different size than the old drive and therefore the vg is invalid.

What I want to know is whether there is a way to fix the volume group definition file from an alternate boot device. Right now, I can boot from the install CD and get a restore shell, but I'm not sure what to do from there.

Pertinent info:

HP9000/G50 running HPUX 10.20

Primary boot 52.4.0.0

vg00
Disk 1 52.4.0.0
Disk 2 52.5.0.0

Disk2 is the one that went bad. It was a 2GB SE. We replaced it with a 4.2 GB SE.

Thanks for any help.

PS We do have a make_recovery tape and nightly backups, but restoring from them will take a long time. I was hoping there was a way to fix this without re-installing.
James R. Ferguson
Acclaimed Contributor

Re: Fixing a changed (bad) volume group

Hi Brian:

If you replaced the drive according to the requirements and procedures specified in the Software Recovery Handbook (chapter 16, LVM), then the disk size shouldn't matter (although you may not be able to use the full extent of the larger disk, because of the 'max_pe' value that was established for the 2GB disk):

http://www1.itrc.hp.com/service/iv/docDisplay.do?docId=/DE_SW_UX_swrec_EN_01_E/LVM.pdf

Regards!

...JRF...
Sridhar Bhaskarla
Honored Contributor

Re: Fixing a changed (bad) volume group

Hi Brian,

Was this a mirror or a standalone disk?

If it is a mirror, you can get your system recovered. If it is not, and it has other logical volumes on it, then you are better off reinstalling the OS. That's why it is always handy to make regular make_tape_recovery tapes.

If it is a mirror, then you will need to use the quorum override option to boot the server before you can restore the mirror. Reboot the server, interact with the boot admin prompt, and go to the ISL prompt.

ISL> hpux -lq

The system will boot. It will also complain that disk 2 does not match the VG information. Once it comes up, you will need to do a 'vgcfgrestore' on disk 2. Search the forums for restoring a mirror disk; there are many posts detailing the procedure.

Note that although you can add a 4GB disk, only 2GB of it will be used, because the max PE value of the volume group was already set based on the first PV added.
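As a rough sketch of the arithmetic behind that limit: the space LVM can address on one disk is capped at (max PE per PV) x (PE size). The numbers below are illustrative assumptions, not values from this system; the real ones would come from the "Max PE per PV" and "PE Size (Mbytes)" lines of `vgdisplay vg00`. The helper name `usable_mb` is hypothetical, and LVM metadata overhead is ignored.

```shell
# usable_mb DISK_MB MAX_PE PE_SIZE_MB
# Prints how many MB of a disk the volume group can address.
# All inputs are assumptions supplied by the caller (see vgdisplay).
usable_mb() {
    disk_mb=$1
    max_pe=$2
    pe_mb=$3

    # Whole physical extents that fit on the disk.
    extents=$((disk_mb / pe_mb))

    # The volume group never uses more than max_pe extents per disk.
    if [ "$extents" -gt "$max_pe" ]; then
        extents=$max_pe
    fi

    echo $((extents * pe_mb))
}

# Hypothetical: a 4200 MB disk, 4 MB extents, max PE per PV of 512.
usable_mb 4200 512 4

# Hypothetical: same disk, but max PE per PV of 1016.
usable_mb 4200 1016 4
```

So how much of the 4.2GB replacement is actually usable depends on the max PE value that vg00 was created with, which is worth checking before assuming either figure.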

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Brian Bergstrand
Honored Contributor

Re: Fixing a changed (bad) volume group

James:

I'm not sure what the "correct" way is. We had HP come out and replace the drive. It is installed on the same controller and ID as the old drive (52.5.0). All the SE did was pull the bad drive and replace it with the new one.

Sri:

No, the vg is not mirrored. So it looks like I will have to do a full restore. I was afraid of that.

Thanks to you both for the quick responses.
James R. Ferguson
Acclaimed Contributor

Re: Fixing a changed (bad) volume group

Hi (again):

Well, if the boot disk isn't mirrored, then it looks like you need to re-Ignite with your Ignite recovery tape. Sorry. I would urge you to acquire two larger disks for vg00 and use one as a mirror.

Regards!

...JRF...
S.K. Chan
Honored Contributor

Re: Fixing a changed (bad) volume group

Well, the correct procedure for a non-mirrored vg00 disk replacement depends on which logical volumes are on that "bad" disk. If the bad disk contains non-root data (sometimes people do that) like /home, or any mount points that are not part of the root volume, then all you would do is the following (boot the system up in LVM maintenance mode):
ISL> hpux -lm (;0)/stand/vmunix
# pvcreate -f /dev/rdsk/cXtXd0
# vgcfgrestore -n /dev/vg00 /dev/rdsk/cXtXd0
# vgchange -a y /dev/vg00
# newfs -F vxfs /dev/vg00/rlvolX
==> Do this for all affected lvols.
Restore the data from backup.
# shutdown -r 0
If you have root file systems (e.g. /opt, /usr) on the bad disk, then you have to reinstall.
Brian Bergstrand
Honored Contributor

Re: Fixing a changed (bad) volume group

SK.

The lvol that I was extending is /usr, but the disk went bad while I was running extendfs on /usr, so there was no actual data on the bad disk. All of /usr is currently on the primary disk (52.4.0). There may have been some vg/fs info on the bad disk, but the superblock and all of the data for the fs are still on the primary disk.

Would it be possible to take the steps you stated, but use fsck instead of newfs to repair the volume?

Thanks.
S.K. Chan
Honored Contributor

Re: Fixing a changed (bad) volume group

Given your scenario you may want to try this instead. By the way, you said extendfs failed on you; what happened next? Did you try to undo your LVM config, or did the whole thing just hang so that you had to reboot?
ISL> hpux -lm (;0)/stand/vmunix
==> Boot in LVM maintenance mode so that vg00 will not be activated. You may need "-lq" option also if this does not work.
# vgchange -a y -q n vg00
==> Try to activate vg00.
# vgreduce -f vg00
==> Remove missing PV (hopefully this should remove the bad disk from vg00 config)
# mv /etc/lvmtab /etc/lvmtab.org
# vgscan -v
==> Rebuilding the lvmtab file.
If the above does not work, you may need to get a copy of your original /etc/lvmconf/vg00.conf file from backup and use it to restore the LVM header to the good disk. The vg00.conf file MUST reflect the configuration as it was BEFORE /usr was extended. Before you activate vg00, you would run:
# vgcfgrestore -f /dev/rdsk/
Next you should proceed with activating the VG..
# vgchange -a y vg00
.. and rebuild lvmtab file.
Update the post on your status ..
S.K. Chan
Honored Contributor
Solution

Re: Fixing a changed (bad) volume group

The more I think about it, the more I feel that I must have missed something. Let's try this again; I think this should be closer to the solution.
ISL> hpux -lm (;0)/stand/vmunix
==> Boot in LVM maintenance mode so that vg00 will not be activated. You do not need "-lq" because LVM maintenance mode takes care of that.
# /sbin/vgchange -a y -q n vg00
==> Activate vg00 (escape quorum) and this should work .. it will complain that the bad disk is missing. No problem..
# /sbin/lvdisplay -v /dev/vg00/lvolX
==> The bad disk will show "???" in the extent distribution table. From here you should be able to determine the original size of this LV (ie exclude the bad extents).
# /sbin/lvreduce -L /dev/vg00/lvolX
==> This is the crucial part .. you have to lvreduce the LV to its original size, otherwise the vgreduce operation (later) will complain.
# /sbin/vgreduce -f vg00
==> Now this should work.
# /sbin/vgcfgbackup vg00
==> Back it up (the lvm config).
# mv /etc/lvmtab /etc/lvmtab.org
# vgscan -v
==> Rebuilding the lvmtab file.
Finally, just reboot your machine and it should come up in its previous state. The bad disk can then be dealt with later.
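For the lvreduce step above, the target size has to be worked out from the lvdisplay output. A minimal sketch of that arithmetic, assuming `lvreduce -L` is given a size in megabytes: take the LV's total logical extents, subtract the extents that show "???" (those on the missing disk), and multiply by the PE size. The helper name `original_lv_mb` and the counts below are hypothetical; the real numbers come from `lvdisplay -v` and `vgdisplay` on the affected system.

```shell
# original_lv_mb TOTAL_LE MISSING_LE PE_SIZE_MB
# Prints the LV size in MB once the extents on the missing disk are excluded.
original_lv_mb() {
    total_le=$1
    missing_le=$2
    pe_mb=$3

    echo $(( (total_le - missing_le) * pe_mb ))
}

# Hypothetical: 200 logical extents in total, 50 of them on the
# missing disk, 4 MB per extent.
original_lv_mb 200 50 4
```

The result is the value to double-check against the "???" entries before reducing, since reducing below the original size would cut into real data on the good disk.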