replacing failed system disk

 
SOLVED
Aaron_4
Advisor

replacing failed system disk

I have a failed system disk. ioscan says no hardware and diskinfo doesn't see it either. Is there a way to replace this disk without bringing the system down? I come from the old Compaq world, where we replaced disks all the time without bringing the system down. Oh, I have MirrorDisk and the system is an L2000 running 11.0. I think I know the answer, but maybe you guys know something tricky.
Thanks
Aaron
awmorris
21 REPLIES
John Poff
Honored Contributor

Re: replacing failed system disk

Hi Aaron,

It really depends on the hardware. Some of the disks are hot-swappable, and if you have it mirrored you can do it. I've done it before when I've had root disks mirrored to Jamaica drives, and one of the Jamaica disks went belly up. If one of the internal disks has died and it isn't hot swappable, you'll have to shut down.

JP
Sridhar Bhaskarla
Honored Contributor

Re: replacing failed system disk

Hi Aaron,

It depends on where the disk is located. If it is internal, there is no way you can do it online, as you have to bring down the system to physically replace the disk.

If it is in a hot-swappable enclosure like Jamaica, yes, you can. Replace the disk and then run vgcfgrestore -n vg00 /dev/rdsk/c?t?d? (vgcfgrestore takes the raw device file), followed by vgchange -a y vg00.

For a detailed step-by-step procedure, search for "mirror disk" or "disk replacement" in these forums.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
BFA6
Respected Contributor

Re: replacing failed system disk

Hi,

If the disks are mirrored and hot-swappable, then you should be able to pull out the failed disk and replace it, then run a vgcfgrestore and a vgsync.

Regards,

Hilary
MANOJ SRIVASTAVA
Honored Contributor

Re: replacing failed system disk

Since you have a mirror disk, there should be no problem; the whole point of configuring a mirror is redundancy. If the disk is hot swappable, the steps to follow are:

1. Break the existing mirror (lvreduce, not lvextend, removes a mirror copy; repeat for each mirrored logical volume)
lvreduce -m 0 /dev/vgxx/lvolN /dev/dsk/cxtydz

2. Remove the failed disk

3. Install the new one

4. Extend the mirror again

5. Do a vgsync

The man pages for these commands would be really helpful.
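
As a rough illustration of the break-and-re-mirror sequence above, here is a dry-run sketch that only prints commands and executes nothing. It uses lvreduce -m 0 to drop the mirror copy and lvextend -m 1 to add it back. The volume group, disk path, and logical volume names are hypothetical placeholders, and a disk that has already stopped responding may need additional handling that is not shown here.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands for breaking and
# re-establishing a mirror. VG, PV and the logical volume names passed
# in are hypothetical placeholders; substitute your own.

VG=vg00
PV=/dev/dsk/c2t2d0

# Print the commands that drop the mirror copy held on $PV.
break_mirror_plan() {
    for lv in "$@"; do
        echo "lvreduce -m 0 /dev/$VG/$lv $PV"
    done
}

# Print the commands that re-add one mirror copy on $PV, then resync.
remirror_plan() {
    for lv in "$@"; do
        echo "lvextend -m 1 /dev/$VG/$lv $PV"
    done
    echo "vgsync $VG"
}
```

Calling break_mirror_plan lvol1 lvol2 prints one lvreduce line per logical volume, which can be reviewed before anything is typed on the real system.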


Manoj Srivastava
Aaron_4
Advisor

Re: replacing failed system disk

All,
They are internal Seagate drives. No enclosures. How would I know they were hot-swappable?

Aaron
awmorris
S.K. Chan
Honored Contributor

Re: replacing failed system disk

You did not quite say whether this is a mirrored root disk. Regardless of whether the disk is hot-swappable, these are the steps you must run after the disk is replaced.
(root disk)
# pvcreate -B /dev/rdsk/c2t2d0
# mkboot -l /dev/rdsk/c2t2d0
# mkboot -a "hpux -lq" /dev/rdsk/c2t2d0
# vgcfgrestore -n vg00 /dev/rdsk/c2t2d0
# vgchange -a y vg00
# vgsync vg00
(non-root disk)
# pvcreate /dev/rdsk/c2t13d0
# vgcfgrestore -n vg02 /dev/rdsk/c2t13d0
# vgchange -a y vg02
# vgsync vg02
Take note: if you have to shut down the system to replace the disk, then for the root-disk replacement the system must be brought up in LVM maintenance mode (hpux -lm), and for the non-root-disk replacement you can bring it up in single-user mode (hpux -is).
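
The two command lists in this post differ only in their first lines, so they can be captured in one small helper. The sketch below is a dry run: it prints the commands from the post for a given volume group and raw disk, and runs nothing. The function name and its root/data argument are made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the post-replacement commands listed in the post
# above. Nothing is executed; review the output before running by hand.

# usage: replacement_plan root|data VG RAW_DISK   (hypothetical helper)
replacement_plan() {
    kind=$1
    vg=$2
    disk=$3
    case $kind in
        root)
            # Boot disk: bootable PV plus boot area and AUTO file.
            echo "pvcreate -B $disk"
            echo "mkboot -l $disk"
            echo "mkboot -a \"hpux -lq\" $disk"
            ;;
        data)
            echo "pvcreate $disk"
            ;;
        *)
            echo "usage: replacement_plan root|data VG RAW_DISK" >&2
            return 1
            ;;
    esac
    # Common tail: restore LVM config, activate the VG, resync mirrors.
    echo "vgcfgrestore -n $vg $disk"
    echo "vgchange -a y $vg"
    echo "vgsync $vg"
}
```

For example, replacement_plan root vg00 /dev/rdsk/c2t2d0 prints the six root-disk commands in order.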

John Poff
Honored Contributor
Solution

Re: replacing failed system disk

Hi Aaron,

If they are internal disks on an L2000, they aren't hot swappable. Sorry.

JP
John Palmer
Honored Contributor

Re: replacing failed system disk

The internal drives in the L2000 are hot swappable.

Regards,
John
Aaron_4
Advisor

Re: replacing failed system disk

Wait a minute! Who's right???
awmorris
John Poff
Honored Contributor

Re: replacing failed system disk

Aaron,

Oops! John Palmer is right. Sorry. We've had to replace an internal disk on one of our L2000s before, and I thought HP said it wasn't hot swappable. I checked with one of my teammates, and he said that HP told him it was hot swappable, but they greatly preferred to do it with the box down. So, if you really need to keep the system up, you should be able to do it.

JP
MANOJ SRIVASTAVA
Honored Contributor

Re: replacing failed system disk

Aaron


The disks are hot swappable. Here is the link for the L-Class server, which will help you:


http://docs.hp.com/hpux/hw/index.html#rp5400%20Series%20Server%20(L-Class)


Manoj Srivastava
BFA6
Respected Contributor

Re: replacing failed system disk

Hi,

We are often told that they are hot pluggable, not hot swappable, and in a lot of cases, if it's a root disk, HP prefers the server to be down.

In general we replace them with the system up.

Hilary
John Poff
Honored Contributor

Re: replacing failed system disk

Hi Aaron,

Here is a quote from the L class manual (from the link that Manoj provided):

"CAUTION Disk Drives can be removed or installed with the server still powered on. This is referred to as a "manual HotPlug".

However, DO NOT replace a HotPlug disk drive until a controlled shutdown of the operating system has been performed."

So, what good is a hot pluggable drive if you can't swap out a bad one on the fly? :)

JP
Sridhar Bhaskarla
Honored Contributor

Re: replacing failed system disk

There is a clear difference between hot-pluggability and hot-swappability, though we use the terms interchangeably. Hot-swap is a superset of hot-plug: hot-plug guarantees "only" the hardware that is being plugged.

On L-class, internal disks are hot-pluggable, provided that we take care of the OS-related configuration so that the replacement does not result in a system crash. For example, if the disk is good and you are using swap from it, you wouldn't pull it even though the disk is hot-pluggable, because the OS does not support it.

In this case, you can safely replace the disk, as it is protected by mirroring.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
BFA6
Respected Contributor

Re: replacing failed system disk

Hi,

I have had an engineer warn that pulling a disk with the server up could hang the bus. As we were willing to take the risk, they pulled the disk.
I have only seen a bus hang happen once.

Regards,

Hilary
Aaron_4
Advisor

Re: replacing failed system disk

Well, I'm not feeling too comfortable with this. Since this is a production box, I would rather ask for downtime than hope that it won't go down with the swap. Thanks for all the input.

Aaron
awmorris
Xavier Gutierrez_1
Frequent Advisor

Re: replacing failed system disk

Hi, Aaron.

First of all, the internal disks in an L-xy00 are hot-swappable. They are intended to be replaced that way, and I have never seen any problems replacing one of those disks hot.

The only problem you could find is when the actual problem is not the disk but the disk backplane (but it does not seem to be the case).

As the disk appears as NO_HW at ioscan, the disk's firmware is not replying to the controller's queries, so you can hot swap it without risk.
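
To spot such a disk quickly, the ioscan listing can be filtered. This is a minimal sketch that reads ioscan -fnC disk output on standard input and prints the hardware path of every disk whose S/W State is NO_HW. It assumes the usual -f column order (Class, I, H/W Path, Driver, S/W State, H/W Type, Description); the function name is made up.

```shell
#!/bin/sh
# Sketch: list the hardware paths of disks that ioscan reports as NO_HW.
# Reads `ioscan -fnC disk` output on stdin. Assumes the -f column order:
# Class, I, H/W Path, Driver, S/W State, H/W Type, Description.

no_hw_disks() {
    # Header, separator and device-file lines do not start with "disk",
    # so only the per-device summary lines are examined.
    awk '$1 == "disk" && $5 == "NO_HW" { print $3 }'
}
```

Typical use would be ioscan -fnC disk | no_hw_disks, with no output meaning every disk answered the scan.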

Once you have swapped the disks, you must do these steps:

1.- ioscan -fnC disk (just to see it appears as a claimed device)

2.- mkboot /dev/rdsk/cxtydz (restores boot area to the device)

3.- mkboot -a "hpux -lq" /dev/rdsk/cxtydz (restores AUTO file to the headers so that it can boot without the mirror pair available)

4.- vgcfgrestore -n vg00 /dev/rdsk/cxtydz (restores LVM config to the device)

5.- vgchange -a y vg00 (to activate the VG)

6.- lvlnboot -R /dev/vg00 (restores boot, root, swap & dump lvol information)

7.- lvlnboot -v /dev/vg00 (just to check it did last step right)

8.- vgsync vg00 (to synchronize the mirrored LVs)

9.- vgdisplay -v vg00 (just to check it finished OK)
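
The final vgdisplay check can be narrowed to just the logical volumes that still need attention. The sketch below reads vgdisplay -v output on standard input and prints any logical volume whose status is not available/syncd. It assumes the output contains "LV Name" and "LV Status" lines in that order for each volume; the function name is made up.

```shell
#!/bin/sh
# Sketch: report logical volumes that are not yet fully synchronized.
# Reads `vgdisplay -v` output on stdin and assumes it contains
# "LV Name <path>" followed by "LV Status <state>" for each volume.

unsynced_lvs() {
    awk '
        $1 == "LV" && $2 == "Name" { name = $3 }
        $1 == "LV" && $2 == "Status" && $3 != "available/syncd" {
            print name, $3
        }
    '
}
```

Run as vgdisplay -v vg00 | unsynced_lvs; an empty result suggests the sync has completed for every logical volume.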


Best regards.

Xavier.

Ian Hillier
Frequent Advisor

Re: replacing failed system disk

Welcome to the club! I have a failing internal hard drive on my L2000 also (vg00) and I'll be changing it out this weekend. My understanding is you can swap the drive while the system is online and re-mirror everything EXCEPT the root partition. For this the system must be in LVM maintenance mode (down). So, if you're comfortable without / being mirrored, you should be able to do this all online.
Aaron_4
Advisor

Re: replacing failed system disk

I had it replaced last night. Spoke to the engineer about it and he said sometimes it works and sometimes not. He didn't know the reason why but he had first hand experience with both. He said some of his customers will have it hot swapped and if ioscan didn't find it they would schedule downtime for reboot and it would show up.

Thanks
aaron
awmorris
Sridhar Bhaskarla
Honored Contributor

Re: replacing failed system disk

Aaron,

I think I was the one who confused you and my apologies. I didn't read your message well (hence did not note L-class) and I wrote it in a generic cautious sense first time keeping in view K-class servers.


The internal disks on L-class servers are hot-pluggable. You can safely replace them if the OS allows it. In this case, replacing a mirror disk is very safe and you could do it without having to schedule downtime. In fact, I use this method to build systems on L-class and N-class servers and RPs: replace the disk with a new one online, restore the mirrors, use the removed disk to boot another server, then boot and change the IPs and hostnames. I never encountered issues with it.


-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Brian M Rawlings
Honored Contributor

Re: replacing failed system disk

Sorry to beat a dead horse (or live disk, in this case), but somebody might look through this string for advice some day, and a point of clarification is needed.

Hot-swappable and Hot-pluggable both imply that the housing for the drive is designed to be replaced with power on. Various engineering groups leave varying levels of warnings in the manuals about halting the OS. This is like the note in my pager manual warning me not to eat the battery cover because it is a choking hazard. It is a silly CYA statement.

As to the difference between "Hot-Pluggable" and "Hot-Swappable", it boils down to:

"Can I yank a drive, put in another one, and walk away, no further actions, and the drive returns to normal operation by itself?" (HOT-SWAPPABLE, found only in disk arrays with controllers that take care of the rebuild automatically), or...

"If I yank a failed drive and install a new one, do I have to run a script or take actions on my own to get the drive back into operation?" (HOT-PLUGGABLE, essentially any JBOD hot-plug unit like a HASS/Jamaica, or internal drives in A- D- L- N-class units, the RP-series, and all modern servers).

As to whether or not swapping out a drive in an L2000 will cause problems: the probability is low, but there is still a small chance. We do it all the time in our lab, with nary a problem. Likewise, with the older HASS/Jamaica chassis, we've yanked drives and moved them around, live, hundreds of times, without incident.

If hot replacement of a drive could hang a SCSI bus, would EMC or HP or HDS allow it in one of their big arrays? All the SCSI buses in those have multiple drives on them, and they are as likely to hang as any other SCSI bus, yet it is (of course) commonplace to hot replace a drive in an array (all arrays allow this). SCSI is SCSI, and it works for them, I have no problem doing it in my L2000s, or N4000, or whatever.

Incidentally, all of these L-, N-, RP-series units have separate SCSI buses for each drive, or each pair of drives. You mirror across SCSI buses for a reason, so, I say, you're covered to do it hot with the OS up.
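
A quick sanity check that two mirror halves do not share a controller can be done from the device file names alone. This sketch compares the cX part of two legacy cXtYdZ device files. Treating a distinct cX as a distinct SCSI bus is an assumption that should be confirmed against the hardware paths in ioscan; the function names are made up.

```shell
#!/bin/sh
# Sketch: check whether two legacy device files (cXtYdZ) share the same
# controller instance, i.e. the cX part of the name. Whether a distinct
# cX really is a distinct SCSI bus should be confirmed with ioscan.

controller_of() {
    # /dev/dsk/c1t2d0 -> c1
    basename "$1" | sed 's/^\(c[0-9][0-9]*\)t.*/\1/'
}

# Succeeds (exit status 0) when both disks sit on the same controller.
same_controller() {
    [ "$(controller_of "$1")" = "$(controller_of "$2")" ]
}
```

For a mirrored pair, same_controller returning success would be a hint that both copies depend on one bus.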

Lastly, it IS possible to mirror the root partition with the OS up. Every part, including the ODE/diags, can be mirrored (some things just get installed on both drives, not actively mirrored, of course). There are scores of solutions in this archive which detail the steps to mirror your root drives, and the only concern is doing the root, swap, and others in the right order.

It can all be done hot, folks. It is useful to prove this to yourself in a lab, or on a test box, but if you don't have that luxury, hopefully this endorsement helps.

Regards,
--bmr
We must indeed all hang together, or, most assuredly, we shall all hang separately. (Benjamin Franklin)