Failed root disk...

 
Jonathan Caplette_1
Super Advisor

Hi guys!!

I've got an EMS event telling me that I have a failed internal disk, and it is a root disk.

Here is my situation:
The system is an rp7410 with two mirrored 73 GB disks;
The system is up and running;
The disk that is giving me errors is the primary boot disk;
When I do a pvdisplay -v, I see some PEs that are stale and others that are current;
When I do a vgdisplay -v vg00, I see that the disk is unavailable;
With vgdisplay I see there are 2 PVs for each LV;

I want to know how I can replace the failed disk without rebooting the system???
KapilRaj
Honored Contributor

Re: Failed root disk...

If you have a hot-swappable disk, you can do it online.

You will need a new disk. Restore the VG configuration onto it with vgcfgrestore, run a vgsync, and you are done.

Regds,

Kaps
Nothing is impossible
KapilRaj
Honored Contributor

Re: Failed root disk...

Don't go ahead on my word alone, as I have never done it before. You will also need to prepare the new disk as bootable, creating the LIF area and so on.

Kaps
Steven E. Protter
Exalted Contributor

Re: Failed root disk...

Use cstm, mstm or xstm to determine for certain which disk has actually failed.

dd if=/dev/dsk/c0t1d0 of=/dev/null bs=200k

See which disk lights up. The one that doesn't is probably dead.

Be sure because if you pull out the wrong disk, you crash.
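
A bounded variant of the dd read above can be sketched as a small shell function. The function name `blink_disk` and the `count` limit are additions for illustration, not part of the original advice; the limit only keeps the read from running across the whole 73 GB disk.

```shell
# Hypothetical helper: read a bounded amount from one disk so that its
# activity LED blinks. Data goes to /dev/null; nothing is written to disk.
blink_disk() {
    dev=$1            # device file to read, e.g. /dev/dsk/c0t1d0
    blocks=${2:-500}  # number of 200k blocks to read (assumed default)
    dd if="$dev" of=/dev/null bs=200k count="$blocks" 2>/dev/null
}

# Usage (as root), once per mirror disk:
#   blink_disk /dev/dsk/c0t1d0
```

A non-zero exit status means dd could not read the device, which is itself a hint about which disk is in trouble.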

These servers have hot-swappable disks, so they can be pulled while the system is running.

If you can, remove the stale mirror copies of the logical volumes from the dead disk (lvreduce), then vgreduce the disk out of the volume group.

Switch the replacement disk with the bad disk.

Then re-mirror.

It should also be possible to skip the removal step.

To do that after the switch, you can do as follows:

cd /etc

mv lvmtab lvmtab.old

vgscan -a

pvcreate the new disk

vgextend the volume group.

re-mirror the logical volumes.
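
The last three steps can be spelled out as a dry run that only prints commands. This is a sketch under assumptions: `pvcreate -B`, `mkboot`, and `lvextend -m 1` are the usual HP-UX commands for building a bootable mirror on PA-RISC, but they are not named in the post above, and the function name and the disk name in the usage line are hypothetical.

```shell
# Dry-run sketch: print (do not run) the commands for adding a replacement
# boot disk to a root volume group and re-mirroring onto it.
print_remirror_steps() {
    vg=$1      # volume group, e.g. /dev/vg00
    disk=$2    # device name without directory (hypothetical example: c2t2d0)
    shift 2    # remaining arguments: the logical volumes to mirror
    echo "pvcreate -B /dev/rdsk/$disk"
    echo "vgextend $vg /dev/dsk/$disk"
    echo "mkboot /dev/rdsk/$disk"
    echo "mkboot -a \"hpux\" /dev/rdsk/$disk"
    for lv in "$@"; do
        echo "lvextend -m 1 $lv /dev/dsk/$disk"
    done
    echo "lvlnboot -R"
}

# Usage: print_remirror_steps /dev/vg00 c2t2d0 /dev/vg00/lvol1 /dev/vg00/lvol2
```

Printing first and running by hand keeps a typo in the device name from reaching pvcreate.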

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Jeff_Traigle
Honored Contributor

Re: Failed root disk...

According to the specs in the following link, the drives are hot-pluggable bays, not hot-swappable... therefore, you're not replacing the bad disk without bringing the system down.

http://www.hp.com/products1/servers/rackoptimized/rp7410/specifications.html
--
Jeff Traigle
Geoff Wild
Honored Contributor

Re: Failed root disk...

Yes - they are hot swap - have done it a couple of times on 2 of my 7410's.

1) If the disk is completely "dead", such as if you run ioscan and status is "no_hw" then you can hot swap the disk online.

2) However in circumstances where the disk has not fully failed please do one of the following to avoid data corruption :

a) reduce mirror before replacing the disk
b) deactivate VG before replacing the disk
c) shutdown system to replace the disk
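
The check in point 1 can be sketched as a filter over ioscan output. It assumes the usual `ioscan -f` column layout, where the third field is the hardware path and the state is printed in upper case as NO_HW; the function name is hypothetical.

```shell
# Hypothetical filter: print the hardware path of every disk that ioscan
# reports as NO_HW. Reads `ioscan -fnC disk` output on stdin.
no_hw_disks() {
    awk '$1 == "disk" && /NO_HW/ { print $3 }'
}

# Usage: ioscan -fnC disk | no_hw_disks
```

Empty output means no disk is flagged NO_HW, in which case options a) to c) above apply.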

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Sundar_7
Honored Contributor

Re: Failed root disk...

Hi,

This is how I would do it

1) If the disk is hot swappable, remove the disk

2) # vgreduce -f /dev/vg00

This will remove the failed disk from vg00

3) # lvlnboot -R

4) Insert the new disk

5) # ioscan -fnC disk

6) # vgcfgrestore -n /dev/vg00 /dev/rdsk/

7) # mkboot /dev/rdsk/devicefile

8) # mkboot -a "hpux" /dev/rdsk/devicefile

9) # vgchange -a y /dev/vg00

10) # vgsync /dev/vg00

11) # lvlnboot -R

12) confirm with vgdisplay, lvdisplay, pvdisplay and lvlnboot.

Learn What to do ,How to do and more importantly When to do ?
Sundar_7
Honored Contributor

Re: Failed root disk...

vg00 cannot be deactivated while the system is running, except in maintenance mode.

If the disks are not hot-swappable:

1) Shutdown and get the disk replaced

2) boot in single-user mode

3) ioscan, find the device file

4) vgcfgrestore -n /dev/vg00 /dev/rdsk/

5) mkboot /dev/rdsk/

mkboot -a "hpux" /dev/rdsk/

vgchange -a y /dev/vg00

vgsync /dev/vg00

6) Reboot in multi-user mode

generic_1
Respected Contributor
Solution

Re: Failed root disk...

Yes, you should be able to replace the disk without a reboot.
Some would advise you to shut down the system if the disk is not completely dead, because there is a chance you could corrupt your data.
Make sure you have a good make_tape_recovery or make_net_recovery image on hand, just in case things go wrong.

It would not hurt to make sure you do not have stale extents on both volumes with lvdisplay -v

Once you have determined the correct disk, replace it with the new one.

Then do vgcfgrestore -n /dev/vg00 /dev/rdsk/yourdisk (e.g. c6t6d0)
vgchange -a y /dev/vg00
mkboot /dev/rdsk/yourdisk
mkboot -a "hpux -lq (;0)/stand/vmunix" /dev/rdsk/yourdisk
lvlnboot -R
vgsync /dev/vg00
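
To avoid retyping the device file six times, the sequence above can be wrapped in a function that only prints the commands for review. The function name is hypothetical and nothing is executed.

```shell
# Dry-run sketch: print the restore sequence for one replacement disk.
print_restore_steps() {
    vg=$1     # volume group, e.g. /dev/vg00
    raw=$2    # raw device file of the new disk, e.g. /dev/rdsk/c6t6d0
    echo "vgcfgrestore -n $vg $raw"
    echo "vgchange -a y $vg"
    echo "mkboot $raw"
    echo "mkboot -a \"hpux -lq (;0)/stand/vmunix\" $raw"
    echo "lvlnboot -R"
    echo "vgsync $vg"
}

# Usage: print_restore_steps /dev/vg00 /dev/rdsk/c6t6d0
```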

Do an lvdisplay -v to confirm that you have no stale extents.
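
Scanning a long extent map by eye is error-prone, so the stale check can be sketched as a counter. It assumes the extent status appears as a standalone word "stale" in the `lvdisplay -v` output; the function name and the lvol in the usage line are hypothetical.

```shell
# Hypothetical helper: count extents marked "stale" in `lvdisplay -v` output
# read from stdin. Only whole fields equal to "stale" are counted, so a
# status line such as "available/stale" does not inflate the number.
count_stale() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "stale") n++ }
         END { print n + 0 }'
}

# Usage: lvdisplay -v /dev/vg00/lvol3 | count_stale
```

A result of 0 for every logical volume in vg00 means the sync has finished.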

Cross your fingers :)
generic_1
Respected Contributor

Re: Failed root disk...

Also, after things are back up, it is a good idea to run lvlnboot -v to make sure the mirror looks bootable, and to take an updated backup. Keep the old backup around for a little while, though, in case you need to roll things back. When I have plenty of downtime I like to get my firmware updated; since you are replacing a drive, it would be nice to keep your disks at the same firmware level if they are the same model, but this is only a nice-to-have.
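
The lvlnboot check can be sketched as a one-line counter. It assumes `lvlnboot -v` tags each bootable physical volume with the text "Boot Disk"; the function name is hypothetical.

```shell
# Hypothetical helper: count physical volumes tagged "Boot Disk" in
# `lvlnboot -v` output read from stdin.
boot_disk_count() {
    grep -c 'Boot Disk'
}

# Usage: lvlnboot -v /dev/vg00 | boot_disk_count
```

On a two-way mirrored root volume group the expected count is 2; anything lower suggests the new disk was not made bootable.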