Operating System - OpenVMS

remove a quorum disk when adding a 3rd node to cluster

 
Jan van den Ende
Honored Contributor

Re: remove a quorum disk when adding a 3rd node to cluster

Cass,


Do you really want to remove the quorum disk? With three nodes you have to have two nodes up to maintain a quorum.


That is NOT true.
Well... it IS true if you have two nodes crashing on you at the same time.....

But if you do a normal shutdown of one node first (and DO NOT forget the REMOVE_NODE option), then as soon as that node is down you can do the same to another node, and continue happily with a one-node cluster.

This same scheme only gets a little trickier if you have four (or more) nodes which you bring down to one. THEN, to get back, you will have to do a conversational boot of your first returning node and set a lower EXPECTED_VOTES (see above) to prevent it from hanging until the next one boots and restores quorum.
Actually, to be exact, read this as VOTES, not nodes, in case not all nodes have equal votes...
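
The vote arithmetic behind this can be sketched in Python. The quorum formula (EXPECTED_VOTES + 2) // 2 is the standard OpenVMS one; the script itself is only an illustration of the bookkeeping, not anything VMS actually runs:

```python
def quorum(expected_votes):
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) // 2 (integer division)."""
    return (expected_votes + 2) // 2

# Three-node cluster, one vote per node, no quorum disk:
assert quorum(3) == 2   # normally two votes must be present

# An orderly SHUTDOWN with the REMOVE_NODE option lowers the expected
# votes each time a node leaves, so quorum shrinks with the cluster:
assert quorum(2) == 2   # after the first node leaves, the 2 remaining votes suffice
assert quorum(1) == 1   # after the second leaves, a one-node cluster runs on
```

Without REMOVE_NODE, the expected votes would stay at 3 and the last node (1 vote against a quorum of 2) would hang.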

hth

Cheers.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Paul Coviello
Frequent Advisor

Re: remove a quorum disk when adding a 3rd node to cluster

Jan, so in your scenario of 4 nodes is that with or without the qdisk?

thanks
Paul
who real soon is going to have many drinks!
Jan van den Ende
Honored Contributor

Re: remove a quorum disk when adding a 3rd node to cluster

Paul,

that is definitely WITHOUT.

In my view (not necessarily everybody's), a quorum disk is just a tiny, stupid trick, regrettably necessary if you are unfortunate enough to really need a cluster but are painfully restricted to two, both potentially lonely, nodes...

I am gonna join a 100 KM traffic jam now, and when I have conquered that I will join you in spirit. I have some good beers cold.

Cheers.

Have one on me.

Jan
Uwe Zessin
Honored Contributor

Re: remove a quorum disk when adding a 3rd node to cluster

The rules about votes, expected_votes and quorum are the same whether you have a quorum disk or not. Just consider the quorum disk as a node that is almost always up - except that it enlarges your cluster transition times :-(

Jan, I hope you're doing well... I really HATE traffic jams!
Keith Parris
Trusted Contributor

Re: remove a quorum disk when adding a 3rd node to cluster

Cass's option of using a quorum disk can make the system management easier, as by giving the quorum disk one less vote than the total of the votes of all the systems, you can shut systems down one by one to a single node without having to use the REMOVE_NODE option, and yet you can survive failure of the quorum disk and continue to run, provided all the VMS nodes are up at the time.

The downside of this is that after a node leaves unexpectedly (e.g. power supply failure, cluster interconnect hardware failure, or Control-P/Halt), after the RECNXINTERVAL period has elapsed, it will take additional time (up to 4 times QDSKINTERVAL seconds) to re-validate the quorum disk's votes, and that can cause a delay in the cluster regaining quorum. With the old default value of 10 seconds for QDSKINTERVAL, that could take up to 40 seconds, which was a long time. With the new default value of 3 seconds for QDSKINTERVAL, that could now take up to 12 seconds.

And keep in mind that anytime you use the REMOVE_NODE option on SHUTDOWN to take the cluster below a majority of the potential votes, you have voluntarily created a situation where the quorum scheme cannot totally protect you against a partitioned cluster. For example, if you take the cluster down to a single node, and it continues to run, but its LAN adapter or whatever you're using for a cluster interconnect fails, then there's nothing to prevent the other 2 nodes from booting, forming a separate cluster (as 2 votes are enough to achieve quorum in this case), and trashing the shared SAN disks. (That is, there's nothing to prevent this happening other than your human intervention as a system manager to prevent someone from trying to boot the other 2 nodes at once).
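
Keith's partition scenario falls straight out of the same arithmetic. A small Python sketch (illustrative only; the quorum formula is the standard OpenVMS one) shows why both halves believe they have quorum:

```python
def quorum(expected_votes):
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) // 2."""
    return (expected_votes + 2) // 2

# Three nodes, one vote each.  The cluster was shrunk to one node using
# REMOVE_NODE, so the survivor runs with an effective EXPECTED_VOTES of 1:
assert 1 >= quorum(1)    # the lone survivor keeps quorum and keeps running

# Now its cluster interconnect fails.  The other two nodes boot with the
# configured EXPECTED_VOTES of 3 and cannot see the survivor:
assert 2 >= quorum(3)    # 2 votes meet a quorum of 2 -> a second cluster forms
# Two independent clusters now mount the same SAN disks: a partitioned cluster.
```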
Paul Coviello
Frequent Advisor

Re: remove a quorum disk when adding a 3rd node to cluster

OK, did I ever think I would get into this deep a discussion about this? Not!

OK, let me step back one step and describe the environment...

We have 3 NIC cards in each machine: one we use as a private LAN to a vendor, the second is the main cluster interconnect, and the third is for the general LAN and as a failover for cluster communications... we actually have a COM file that changes the PEDRIVER paths so that we come up this way. We do have the shared SAN disks, and local disks for page and swap.

So if that changes anyone's thinking, let me know...

thanks
Paul
Keith Parris
Trusted Contributor

Re: remove a quorum disk when adding a 3rd node to cluster

Having 2 redundant LAN connections as your cluster interconnect reduces the risk of a single node being isolated from the other 2, and thus reduces your risk of a partitioned cluster when you don't have a quorum disk and you use REMOVE_NODE to reduce the cluster down to a single node.

Enabling cluster communications on your 1st LAN as well (but lowering its priority under SCACP, so that while PEDRIVER tracks its status and availability using periodic Hello packets, it doesn't actually get used unless and until both of the other 2 links fail) could give you 3X redundancy instead of 2X, reducing the risk even further. But you might consider that overkill, especially if you're tracking failures on the existing two paths using LAVC$FAILURE_ANALYSIS.
Jan van den Ende
Honored Contributor

Re: remove a quorum disk when adding a 3rd node to cluster

Keith,

I already knew that, where I am pessimistic about Murphy's Law (yes, he was an optimist!), I have to admit that you are paranoid about it. (You must have discussed this a lot with Tom Speake, I guess. He was equally paranoid on the issue.)

And yes, if your disk communication path cannot also function as an SCS path, like a shared SCSI bus, you ARE fundamentally right.
SAN interconnect is a class of its own. It was introduced to VMS NOT supporting SCS, but if my memory serves me well, it has carried cluster traffic since V7.3-2 (or was it some ECO, or was it intended but not yet there? I cannot check that right now).

If, however, the path to the disks CAN also function as an interconnect (like the good old CI, DSSI, or SAN-if-supporting-SCS...), then I cannot see a way to a partitioned cluster. Teach me if I am not yet pessimistic enough.

Uwe,
Not too bad today, made it within 2 hours; that's over an hour better than yesterday.
The way to fight it really is to "lean back and enjoy the music".
Getting stressed will not win you 5 seconds, and it is very bad for your health!

Cheers.

Join me in a good beer.

Jan

Keith Parris
Trusted Contributor

Re: remove a quorum disk when adding a 3rd node to cluster

Yes, if the storage interconnect also passes SCS traffic, that would help.

Support for Fibre Channel as a LAN (including cluster interconnect support) is presently slated for VMS version 8.3.
Wim Van den Wyngaert
Honored Contributor

Re: remove a quorum disk when adding a 3rd node to cluster

Alternative

Give each node 1 vote and keep the quorum disk with 2 votes. Set EXPECTED_VOTES to 5.

Thus one node can start the cluster on its own (if it sees the quorum disk), no split clusters are possible, and the complete cluster can survive a quorum disk loss.
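
Wim's numbers check out under the standard quorum formula; a small illustrative Python sketch (nothing VMS-specific in the code itself):

```python
def quorum(expected_votes):
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) // 2."""
    return (expected_votes + 2) // 2

Q = quorum(5)        # 3 nodes x 1 vote + quorum disk with 2 votes
assert Q == 3
assert 1 + 2 >= Q    # one node that sees the quorum disk can boot alone
assert 3 >= Q        # all three nodes together survive losing the quorum disk
assert 2 < Q         # a group without the quorum disk and missing a node
                     # cannot reach quorum, so two groups can never both win
```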

It all depends on what you want.

Wim