
Changing cluster lock without stopping cluster...

 

There's a question that's been coming up a fair bit recently in the ServiceGuard forum - that of changing disk device IDs in a cluster. So far the accepted wisdom on this is that:

1) If you have alternate paths you can change all the volume groups 'online' (using pvchange/vgreduce etc. etc.). Even if you don't you can switch all your packages to another node and do everything effectively offline...
2) What you *can't* do however is change the device file used for the cluster lock. This requires the entire cluster to be halted before a 'cmapplyconf' can be run to point to the new device file of the lock disk.

...And this is the cause of some frustration, as in the 'land of the SAN' device files can change more often than you would wish.

So I have been doing some thinking and came up with an option that *may* get around this, but I am interested in others' opinions (specifically our SG gurus Melvyn Burnard & Stephen Doud)...

You can't change the cluster lock while the cluster is up, but you *can* add and remove nodes to the cluster... so what's to stop me doing the following:
a) cmhaltnode on node I want to change
b) edit cmclconfig.ascii & comment out all references to the node I want to change (including the cluster lock reference)
c) edit the pkg.conf file for all packages that can run on the node in question and comment out that node.
d) run the necessary cmcheckconf/cmapplyconfs
e) make all the VG / device file changes I want to make on the node in question.
f) Remove the /etc/cmcluster/cmclconfig binary on the node in question
g) reset all the text config files to include the node again, critically with the new path to the cluster lock disk...
h) run the cmcheckconf/cmapplyconfs
i) cmrunnode
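
The steps above can be sketched as an ordered dry-run plan. This is only an illustration: it prints what would be done and runs nothing. The file locations are assumptions (only the names cmclconfig.ascii and pkg.conf appear in the post), the `Step` and `rejoin_plan` names are made up for the sketch, and it assumes packages have already been moved off the node before step a.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    label: str    # step letter from the list above
    action: str   # command line, or a description of a manual edit
    manual: bool  # True when the step is an edit rather than a single command


def rejoin_plan(node: str,
                cluster_ascii: str = "/etc/cmcluster/cmclconfig.ascii",  # assumed path
                pkg_conf: str = "/etc/cmcluster/pkg/pkg.conf"            # assumed path
                ) -> List[Step]:
    """Return steps a-i for taking `node` out of the cluster and adding it back."""
    check = f"cmcheckconf -C {cluster_ascii} -P {pkg_conf}"
    apply_conf = f"cmapplyconf -C {cluster_ascii} -P {pkg_conf}"
    return [
        Step("a", f"cmhaltnode {node}", False),
        Step("b", f"edit {cluster_ascii}: comment out {node} and its lock PV entry", True),
        Step("c", f"edit {pkg_conf}: comment out {node}", True),
        Step("d", f"{check} && {apply_conf}", False),
        Step("e", f"make the VG / device file changes on {node}", True),
        Step("f", f"rm /etc/cmcluster/cmclconfig   # on {node} only", False),
        Step("g", f"edit both files: restore {node} with the new lock PV path", True),
        Step("h", f"{check} && {apply_conf}", False),
        Step("i", f"cmrunnode {node}", False),
    ]


if __name__ == "__main__":
    for step in rejoin_plan("nodeB"):
        kind = "manual" if step.manual else "command"
        print(f"{step.label}) [{kind}] {step.action}")
```

With more than one package, the `-P` argument would be repeated once per package configuration file.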

So why wouldn't this work? Have I missed something?

Cheers

Duncan

I am an HPE Employee
Kent Ostby
Honored Contributor

Re: Changing cluster lock without stopping cluster...

The problem I would guess would be with "g" or possibly "e".

According to the document, UXSGKBAN00000035, which documents the changes you can make to a running cluster, you can't change the FIRST CLUSTER LOCK VG or FIRST CLUSTER LOCK PV.

I suspect your process would fail for one of those two reasons.

Best regards,

Kent Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"

Re: Changing cluster lock without stopping cluster...

Kent,

Yes, I'm aware of what the manual states on changing these values, however:

1) I won't be changing the FIRST_CLUSTER_LOCK_VG

2) I won't actually be *changing* the FIRST_CLUSTER_LOCK_PV for the node in question... I'll actually be taking that node out of the cluster altogether, and then adding it back in with a new configuration. Adding and removing nodes in a running cluster *is* supported.

...keep 'em coming!

Duncan

Jeff Schussele
Honored Contributor

Re: Changing cluster lock without stopping cluster...

Hi Duncan,

Problem as I see it is that you won't be able to get the new binary into play on the other nodes w/o halting the cluster. So you'd end up with an "unbalanced" cluster i.e. one node running a diff binary than the others. I even doubt that the node that was changed would be able to join the cluster.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Re: Changing cluster lock without stopping cluster...

Jeff, if that were the case how would you ever be able to add a new node to a running cluster (which *is* supported)?

Cheers

Duncan

Jeff Schussele
Honored Contributor

Re: Changing cluster lock without stopping cluster...

Hi Duncan,

Not 100% certain - maybe Melvyn or Stephen can enlighten us - but I suspect it has to do with the contents that are altered in the cmclconfig file by these different mods.
I suspect that node changes are strictly text changes whereas LOCK* changes are binary changes. The text changes can be re-read whereas the binary portions are loaded to memory & therefore changes cannot be detected. Same rules apply for heartbeats or monitored subnets. So you couldn't change these either w/o cluster down, but VGs can be changed on the fly - again possibly text changes for VGs but binary for heartbeats & subnets.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Re: Changing cluster lock without stopping cluster...

Jeff,

Have a read of Stephen Doud's post in this thread:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=211995

This is what got me thinking - hopefully Melvyn or Stephen will pop along here shortly and clear this up!

Cheers

Duncan

John Poff
Honored Contributor

Re: Changing cluster lock without stopping cluster...

Hi Duncan,

I've done some wild and crazy stuff with MC/SG before, but I don't see how you can change cluster lock disks on the fly. Your problem will come in step 'h', when you run 'cmcheckconf' and SG complains about the cluster lock disk being different between the existing nodes and the node you are adding back.

The trick is that in a running cluster, there needs to be an agreement between all the nodes about which disk will be the cluster lock disk so that when they need to have that election, they will be certain that they have a clear winner.

Have you thought about moving away from the cluster lock disk and using the new Quorum Server software? That way the nodes all point to a server instead of a disk.

JP

Re: Changing cluster lock without stopping cluster...

Perhaps I need to be a bit more precise about what I'm talking about...

*I don't mean that I want to physically change which disk the cluster lock is on, rather that following on from SAN changes on a host I need to point ServiceGuard at another device file.

*I'm not proposing to just halt the node in the cluster do the change and then restart the node, rather to halt the node, and then completely de-cluster the node, then make the changes, re-cluster the node, and then start the node.

Here's an example:

-nodeA and nodeB are in a cluster and both see /dev/dsk/c3t0d0 as the cluster lock.

-SAN changes need to take place that affect nodeB (e.g. moving off a FC hub and onto a FC switch).

-Following my steps a-d in the post above I take nodeB out of the cluster *completely*

- I then make the necessary changes to nodeB (vgexports/vgimports etc.) - step e

- Then I bring the node back into the cluster following steps f-i - at no stage are the two nodes 'out of sync' on which device is the cluster lock, cos physically it never changes, and when the name changes logically nodeB isn't part of the cluster anyway.
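
The nodeA/nodeB example can be modelled in a few lines to show why the two nodes never disagree. This is a toy model, not ServiceGuard logic: the PVID string, the new device file /dev/dsk/c5t0d0 and the `consistent` helper are all made up for illustration.

```python
from typing import Dict

DeviceMap = Dict[str, str]  # device file -> PVID, as seen from one node


def consistent(members: Dict[str, str], devices: Dict[str, DeviceMap]) -> bool:
    """True if every member's configured lock device resolves to the same PVID."""
    pvids = set()
    for node, path in members.items():
        pvid = devices[node].get(path)
        if pvid is None:  # the configured device file no longer exists on that node
            return False
        pvids.add(pvid)
    return len(pvids) <= 1


LOCK = "PVID-LOCK"           # hypothetical PVID of the one physical lock disk
OLD = "/dev/dsk/c3t0d0"      # from the example above
NEW = "/dev/dsk/c5t0d0"      # hypothetical path on nodeB after the SAN change

devices = {"nodeA": {OLD: LOCK}, "nodeB": {OLD: LOCK}}
members = {"nodeA": OLD, "nodeB": OLD}   # node -> lock PV in the cluster binary

stages = [("start", consistent(members, devices))]

del members["nodeB"]                      # steps a-d: nodeB leaves the cluster
stages.append(("nodeB removed", consistent(members, devices)))

devices["nodeB"] = {NEW: LOCK}            # step e: same disk, new device file
stages.append(("SAN change done", consistent(members, devices)))

members["nodeB"] = NEW                    # steps f-i: nodeB rejoins with new path
stages.append(("nodeB re-added", consistent(members, devices)))

# For contrast: nodeB kept in the cluster with its old path after the SAN change.
stale = consistent({"nodeA": OLD, "nodeB": OLD}, devices)

if __name__ == "__main__":
    for name, ok in stages:
        print(name, ok)
    print("nodeB left in cluster with old path:", stale)
```

Every stage of the remove and re-add sequence stays consistent in this model, whereas leaving nodeB configured with the old device file does not.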

Stephen Doud
Honored Contributor

Re: Changing cluster lock without stopping cluster...

Hi Duncan,

A running cluster checks its cluster lock disks every 60 minutes. The cluster binary tells SG where to look for the lock disk:

# cmviewconf | grep -e "Node" -e "lock" | grep name
first lock vg name: /dev/vg01
second lock vg name: (not configured)
Node name: eon
first lock pv name: /dev/dsk/c0t4d0
Node name: ion
first lock pv name: /dev/dsk/c0t4d0

If I were to remove a node from the cluster in order to change the lock PV reference (to some other /dev/dsk/c-t-d-), the PVID on the new path MUST match that of the lock disk still listed in the binary file for the other nodes remaining in the cluster. Else it's not the SAME disk. SG's cmcheck/applyconf verifies at this level of granularity. It has to, in order to ensure each node can talk to the very same disk as a lock disk.
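
As a rough sketch, the filtered cmviewconf output above can be turned into a per-node table of lock PV device files. The parser assumes the "label: value" layout exactly as pasted here; real output may be indented or laid out differently, and `lock_pvs_by_node` is just an illustrative name. Note that it only compares device file names; it does not read PVIDs from disk.

```python
from typing import Dict, Optional

# Sample text copied from the filtered cmviewconf output in this post.
SAMPLE = """\
first lock vg name: /dev/vg01
second lock vg name: (not configured)
Node name: eon
first lock pv name: /dev/dsk/c0t4d0
Node name: ion
first lock pv name: /dev/dsk/c0t4d0
"""


def lock_pvs_by_node(text: str) -> Dict[str, str]:
    """Map each node name to the first lock PV device file listed under it."""
    result: Dict[str, str] = {}
    node: Optional[str] = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "label: value" line
        key, value = key.strip().lower(), value.strip()
        if key == "node name":
            node = value
        elif key == "first lock pv name" and node is not None:
            result[node] = value
    return result


if __name__ == "__main__":
    for node, pv in lock_pvs_by_node(SAMPLE).items():
        print(f"{node}: {pv}")
```

After a node has been removed and re-added with a new path, the device files in such a table may legitimately differ between nodes; what has to match is the PVID behind them.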

-sd

John Poff
Honored Contributor

Re: Changing cluster lock without stopping cluster...

Duncan,

Thanks for the explanation. If I understand you now, the PV that your cluster uses for the cluster lock disk doesn't really change; you are just changing the addressing of how you refer to that PV. In that case, I would think that it should work as MC/SG should recognize that the PV used for the cluster lock is the same for both nodes, even if it is referred to differently.

You would normally just halt the cluster and make those kinds of changes, but what you describe sounds like it would work and allow you to keep one node up and running. Sounds like fun!

JP

Re: Changing cluster lock without stopping cluster...

Stephen,

Thanks for that - assuming what I am doing meets your requirement (the PVID remains unchanged; only the device file that references that PVID changes), would what I am proposing actually work?

Thanks

Duncan


Re: Changing cluster lock without stopping cluster...

Ping!

Still looking for a definitive 'it works' or 'it doesn't work' statement...

Cheers

Duncan

Stephen Doud
Honored Contributor
Solution

Re: Changing cluster lock without stopping cluster...

Sorry - I only get forum notification email once a day :)

Changing the cluster lock disk is not possible while the cluster is running.
However, removing a node and re-adding it with a different c-t-d- path to the SAME cluster lock disk is acceptable to cmapplyconf.

I use this principle (removing and re-adding a node) in a document or two, when other features of a node in the cluster need to be changed while the cluster is running.

-sd

Re: Changing cluster lock without stopping cluster...

Thanks Stephen,

Good to know that this is another option that will work...

Duncan
