
Karthik S S
Honored Contributor

It seems to be working - Changing the clusterlock PV path without halting the cluster

Hi All,

You would have noticed threads with similar titles hitting the forums very frequently. In fact, even I have started a few threads like the ones below:

a. Change in hardware path of the clusterlock disk
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=407506

b. Edit Binary Files
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=488489

------

In a huge SAN setup the device file names can change even while all the cluster nodes are online (the target disk is the same; only the device instance changes). However, if the device path of the cluster lock disk changes, there is no way to update the configuration without halting the cluster.

I can think of a couple of solutions for resolving this problem (some of them have already been discussed in a few of the earlier threads).

1. Create soft links from the old PV names to the new device files for the cluster lock PV (both /dev/dsk and /dev/rdsk).

2. Change the instance number of ext_bus for the shared disks to its old value and run ioinit to recreate ioconfig.

3. Using mksf, create device special files with the old names pointing to the hardware path of the cluster lock PV.

4. Remove the node (on which the device instance numbers have changed) from the cluster and re-add it without halting the cluster.

All four of the above methods are working fine in my test setup (a rough sketch of methods 1 and 3 follows below), though I am not sure what problems these workarounds may introduce in a production environment.
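
For illustration, a minimal sketch of methods 1 and 3 (the device file names and the hardware path below are only examples - check your own ioscan output and the mksf(1M) manpage before touching the lock disk):

# Find where the lock disk ended up after the instance change
ioscan -fnC disk

# Method 1: point the old PV names at the new device files
# (old name c4t0d0 and new name c10t0d0 are placeholders)
ln -s /dev/dsk/c10t0d0  /dev/dsk/c4t0d0
ln -s /dev/rdsk/c10t0d0 /dev/rdsk/c4t0d0

# Method 3: recreate special files with the old names against the
# lock PV's hardware path (verify the driver options on your release)
mksf -d sdisk -H 0/4/0/0.1.0 /dev/dsk/c4t0d0
mksf -d sdisk -H 0/4/0/0.1.0 -r /dev/rdsk/c4t0d0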

Your valuable inputs/suggestions/workarounds on this are welcome.

Thanks a lot,
Karthik S S
For a list of all the ways technology has failed to improve the quality of life, please press three. - Alice Kahn
G. Vrijhoeven
Honored Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster

Hi Karthik,


1. .....OK
2. For ioinit you need to reboot, so I do not consider this an online option.
3. .....OK
4. OK, but you need to switch the packages currently running on that host, so it is a partial solution and you need a lot of spare computing power on the alternate node. If you have a 3-node cluster or bigger you do not need a lock disk, so this really concerns 2-node clusters, where you need 50% extra capacity on the spare node (you need it anyway in case of a crash, but OK).
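
For what it is worth, the rough flow for option 4 on a running cluster would be something like this (cluster and node names are placeholders - check the ServiceGuard manuals for your version):

# fail the packages over and halt only the affected node
cmhaltnode -f nodeB

# dump the running configuration, adjust the node entries, re-apply
cmgetconf -c clusterA /tmp/clusterA.ascii
# ... edit /tmp/clusterA.ascii as required ...
cmcheckconf -C /tmp/clusterA.ascii
cmapplyconf -C /tmp/clusterA.ascii

# once the device files on the node are sorted out, bring it back in
cmrunnode nodeB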

Altering /etc/lvmtab online would create a difference between lvmtab and the active kernel. This can also cause strange messages when you need to alter VG configurations.

HTH,

Gideon
Stephen Doud
Honored Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster

Those methods that can be performed without interrupting cluster and package services are very useful workarounds.

I'd think most admins would want all servers to look alike, so they'd probably use a workaround, but want to correct the problem when a maintenance window occurs.

-Stephen Doud
Jeff Schussele
Honored Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster

Uhmmm...Would it not be useful to enforce a *No Change* policy with your storage team?!?!
What would happen if they shrink a LUN while you're online? I can tell you that it wouldn't be pretty when LVM reached that last block.

This really needs to be a management-level directive, or they run the risk of bringing production down - period. Then change control would probably become a higher priority, I'd bet ;~)

My 2 cents,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Sundar_7
Honored Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster


Since you are on the HP network, you can download cminitlock from the following link:

http://wtec.cup.hp.com/~hpuxha/tools/index.html

I believe with cminitlock you can edit the lock disk configuration.

Just curious, are you saying you managed to change the instance number using ioinit without rebooting the system? I am under the impression that to change the instance number using ioinit, you need to reboot the machine?!?

Learn what to do, how to do it, and more importantly, when to do it.
Karthik S S
Honored Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster

Hi Sundar,

cminitlock is used to initialize a disk as a lock disk when the original disk fails. And that is it ...

You are right, ioinit requires a reboot. What I really meant was that you need not halt the entire cluster; the cluster can keep running on the remaining node while you do an ioinit on the second node.
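
In practice that means something like this, one node at a time (the node name and file name are placeholders, and the infile format should be verified against ioinit(1M)):

# halt only this node; packages fail over and the cluster stays up
cmhaltnode -f nodeB

# reassign the instance numbers from a mapping file and reboot
# (each line maps a hardware path to a driver and an instance number)
ioinit -f /tmp/instance.map -r

# after the reboot, rejoin the cluster
cmrunnode nodeB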

-Karthik S S
For a list of all the ways technology has failed to improve the quality of life, please press three. - Alice Kahn
Karthik S S
Honored Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster

Stephen Doud:
I agree with you. This is not meant to be a permanent fix and should be corrected during a maintenance window.

Jeff Schussele:
I am assuming the worst case here. If something of that sort happens I want to have something ready in my hand.

Thanks,
Karthik S S
For a list of all the ways technology has failed to improve the quality of life, please press three. - Alice Kahn
Karthik S S
Honored Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster

Melvyn ... are you there? :-)

-Karthik S S
For a list of all the ways technology has failed to improve the quality of life, please press three. - Alice Kahn
Jeff Schussele
Honored Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster

Hi Karthik S S,

Well, if something of the sort I mentioned happens - shrinking an online LUN that then fills up - you had better have a backup tape in hand, because it will be "game over". There's nothing you OR the storage team can do - the FS will be corrupted and will need to be recreated and restored from tape - period. That is, of course, AFTER the storage goofballs put the LUN size back to where it was to begin with.

The point is they should NOT be doing ANYTHING that changes the zoning or LUNs in any fashion while a production system is online without FIRST notifying you and the application owners and getting *everybody* to sign off on it. Change control is a necessary - and I'd dare say mandatory - tool in a production environment today.

If they object, put it to them this way: would they like it if you accessed one of the switches and rezoned the switch -> array connection? Do they think it would definitely cause trouble and probably harm? Well then, why do they *not* think the opposite is true?

IMHO I really think your management had better get its head out of the sand, or they WILL experience production downtime if practices like this are allowed to continue.

Dismounting soap box,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Patrick Wallek
Honored Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster

Karthik,

I've got to agree with Jeff on this. Even the remote possibility that your storage team might make a change that could affect your UNIX device files is unacceptable.

If they were to make a change that happened to affect your cluster lock disk, I think that would be the least of your worries. I would think you would be more concerned with data becoming unavailable because the disks are no longer where the VGs think they are.

Change control is an absolute must! Anything your storage folks do that MIGHT affect you, you MUST know about, so you can prepare for the worst.
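
When the worst does happen, a quick way to see whether the disks still line up with what LVM expects is to compare the PVs recorded in lvmtab with what the node can actually see (the VG name below is just an example):

# what LVM thinks the PVs are
strings /etc/lvmtab

# what the node can actually see right now
ioscan -fnC disk

# stale or missing PVs show up in the per-VG view
vgdisplay -v /dev/vg01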
Colin Topliss
Esteemed Contributor

Re: It seems to be working - Changing the clusterlock PV path without halting the cluster

Don't trust it!

I migrated data from one XP array to another, not remembering that one of the devices I moved was acting as a cluster lock volume. I dutifully tested that failover still worked (blissfully unaware of the problem). It wasn't until several weeks later, when restarting the cluster from scratch, that it all fell in a nasty heap because it could no longer access the cluster lock disk. The cluster had run with no problems for ages, despite the cluster lock disk having been removed.

I can only surmise that the cluster lock volume is accessed only very rarely (certainly on starting the cluster). Although it's great to try to do SG modifications online, I've come to learn (the hard way) that there are no guarantees when it comes to modifying a cluster configuration. It all looks great until the day you do a clean restart and you're left with a configuration that does not work and (potentially) a multitude of modifications to trawl through.
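
Short of a full restart, one sanity check is to regenerate the ASCII configuration from the running binary and make sure the lock PV it records still exists on every node (the cluster name and device files below are only examples):

# dump the ASCII config from the running binary configuration
cmgetconf -c clusterA /tmp/clusterA.ascii

# see which PV each node believes is the cluster lock...
grep CLUSTER_LOCK /tmp/clusterA.ascii

# ...and confirm those device files still exist on each node
ls -l /dev/dsk/c4t0d0 /dev/rdsk/c4t0d0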

I personally think that a planned outage to fully test the cluster (or at the very least plan cluster shutdown/restarts between changes) is the way to go.