Operating System - OpenVMS

Re: Managing a cluster with 2 nodes without quorum disk

 
Yves Kinnaer
New Member

Managing a cluster with 2 nodes without quorum disk

Hi,

I have this issue with a cluster consisting of two nodes, no quorum disk (OpenVMS 7.1). Some of my former colleagues made some system adjustments and these are the actual values:
Node 1 (primary): 2 votes ; Node 2 ( secondary) : 1 vote. Sysparam expected_votes on both systems: 2. Cl_exp = 3 ; Cl_votes = 3. The application is running on node 1 (shadowcopying using a virtual DSA1...). What do I have to do to run the application on Node 2 ? If I stop the application on Node 1 & shutdown using remove_node , I'll expect the second one to hang (until I reboot Node 1) because expected_votes = 2. In some old documentation, I do find a description of the expected_votes with value 1 (!). (Although it is generaly recommended to set this parameter to the total of all possible votes in order to avoid cluster partitioning...). Can I change the expected_votes to 1 & do I use "set write sysparams 0" or not ?
Wim Van den Wyngaert
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Using REMOVE_NODE will NOT leave the cluster hanging. Not using it will.
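Roughly, the shutdown dialogue looks like this (prompts paraphrased from memory; check the SHUTDOWN.COM on your version):

$ @SYS$SYSTEM:SHUTDOWN
How many minutes until final shutdown [0]: 0
...
Shutdown options (comma-separated list) [NONE]: REMOVE_NODE

With REMOVE_NODE, quorum is recalculated during the state transition and the remaining node keeps running.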

What you have to do with the application to run it on the other node depends on the application. Some might need reconfig or extra installations.

Just use a quorum disk if you can.

Wim
Jan van den Ende
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Yves.

to begin with:

WELCOME to the VMS forum!

- with a total of 3 votes, it is HIGHLY ADVISABLE to set expected_votes to 4 (equal on all systems)
- the WHOLE PURPOSE of the shutdown option REMOVE_NODE is to lower ("recalculate") the value of quorum directly after the removal of a node (as part of the "state transition"), so, using that option, THE REMAINING CLUSTER WILL NOT HANG (even when down to one node).

Please, PLEASE do NOT fiddle with EXPECTED_VOTES, because THAT is what allows a partitioned cluster (and this is one of the few points where I really prefer HP-UX terminology: they call it a "split brain" cluster).

As an aside: 7.1 went out of support quite some time ago. Any specific reason for not upgrading?

Second aside: "failover" is not the preferred way to run an app on VMS. Any reason for not running it clusterwide? (Although good reasons DO exist: a real-time app, a memory-resident app, a ported *UX database engine...) Just curious. :-)

Hth.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

I must disagree with Jan.

Expected votes must be the sum of all votes of all voting members. In your case that is 3.

With expected_votes = 3, node 1 can boot alone, because 2 votes is a majority when 3 votes are in the game.
Node 2 cannot boot on its own, because 1 vote is a minority out of 3.

When you set expected_votes to 4, even node 1 will not have the majority to boot.
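To put numbers on it: VMS computes quorum as (EXPECTED_VOTES + 2) / 2, rounded down. So:

expected_votes = 3 : quorum = (3 + 2) / 2 = 2 -> node 1 (2 votes) reaches quorum alone; node 2 (1 vote) does not
expected_votes = 4 : quorum = (4 + 2) / 2 = 3 -> neither node reaches quorum on its own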

Again, use a quorum disk, e.g. one holding the page files.
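For reference, a minimal sketch of the parameters involved (the disk name is only an example; the disk must be directly visible to both nodes, and AUTOGEN should be run afterwards):

In SYS$SYSTEM:MODPARAMS.DAT on both nodes:

DISK_QUORUM = "$1$DUA10"  ! the quorum disk (example name)
QDSKVOTES = 1             ! votes the quorum disk contributes
EXPECTED_VOTES = 4        ! 2 + 1 node votes + 1 quorum disk vote

$ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS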

Wim
Jan van den Ende
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Wim,

thanks for your correction. I wondered what you meant, until I noticed my TYPO. OF COURSE EXPECTED_VOTES should be the sum of all votes in the normal, full configuration! That should teach me once again to proof-read more carefully before posting!

Yves, sorry for trying to misguide you!

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Yves Kinnaer
New Member

Re: Managing a cluster with 2 nodes without quorum disk

Hi,

Thanks for the replies...
- Beats me why the application is not running clusterwide. I'm not really an OpenVMS specialist (just supporting the application running on it...). In case of OpenVMS or hardware related issues, we always contact an external 2nd-line support firm. And maybe that's still the right thing to do :-)
Yes, it is a real-time app (a WCS (Warehouse Control System) runs on it), and e.g. all operational data is mirrored (application disks are kept in sync using shadow copying). So we should be able to switch between the nodes at any time.
- There is no shared non-shadowed disk directly accessible by both nodes, so we're not able to use a quorum disk.
- Still wondering why I found the "expected_votes=1" value in some old documentation. Is there really a risk of a "split brain" cluster when the application is only running on one node?
- Still 7.1: nobody (management, ...) really cares about it. The cluster & application have been running steadily for more than 10 years. Even our support contractor is not making any fuss about it... And everybody is migrating to Windows-based apps, so budget-wise it is rather difficult to start talking about an upgrade.
- Besides this 2-node cluster, there are also two 3-node clusters to deal with. There is a proposal to build a 5-node cluster to get rid of the risks related to the 2-node cluster (4 nodes & 1 quorum node). What do you think about that?
- We're planning to make some system & application disk backups during the weekend. And we also want to test the switch-over procedure (stopping the application on node 1 & starting it on the backup node). So again: is it really out of the question to set EXPECTED_VOTES to 1?

Yves
Wim Van den Wyngaert
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

If you set expected_votes to 1, systems can always boot on their own, causing a split cluster. You never do that in a cluster.

A 3- or 5-node cluster may solve your quorum problem. But isn't it simpler to add a disk, or to reorganize so that 2 disks are freed?

Wim
Richard Brodie_1
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

'Still wondering why I did find the "expected_votes=1"-value in some old documentation. '

If your primary node failed hard, and your backup node then went down, it would be reasonable to set expected votes to 1 on the backup node; you probably want to get your system running before the first node gets repaired. Once you have done that, you need to ensure you don't boot the primary node outside the cluster. One of the risks is that, if you aren't extremely careful, you can propagate shadow copies the wrong way.
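On Alpha hardware, that emergency boot would look something like this (boot flags and root vary per system):

>>> BOOT -FLAGS 0,1
SYSBOOT> SET EXPECTED_VOTES 1
SYSBOOT> SET WRITESYSPARAMS 0
SYSBOOT> CONTINUE

SET WRITESYSPARAMS 0 is the "set write sysparams 0" you asked about: it stops SYSBOOT from writing the changed value back to the parameter file, so the next normal boot comes up with the original settings.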

As for replacing the cluster with a 5-node one: it seems a bit like overkill, or at least you are going from a system with no redundancy to one with double redundancy. Three nodes with equal votes protect you from a single node failure. Sure, 3-out-of-5 is better, but if you are doing it for redundancy rather than performance, the budget might be better spent elsewhere.

As for EXPECTED_VOTES set to 2 or 3: both give a quorum of 2, so it makes no practical difference. Setting it to 3 would be good practice, though.

Jan van den Ende
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Yves,

First: Indeed, expected_votes = 1 is A VERY BAD IDEA!

Second: consolidating into a single cluster is, in my view, the one step that
- improves your availability
- HUGELY simplifies system management
- is very cost-effective.

But, if you are going to consolidate multiple systems/clusters into one cluster, be sure to do the thinking AHEAD of the implementing! Changing the chosen config afterwards is much harder than changing the planned setup. But only you (= somebody who KNOWS the apps and the various constraints) can do that (or, of course, an expert on the apps together with one on VMS).

An aside on the quorum disk: it is an ugly trick to work around the fact that 2 cannot be decremented by 1 and still leave more than half of the starting value.
Three (nodes) or more is much superior, and a quorum disk in essence is just a (very passive) node supplying the 3rd vote. But: it IS the only solution if you have two active systems and want to be able to continue if either fails.

If you go down the road to an integrated cluster, I would suggest opening a new thread on that topic.

hth

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
comarow
Trusted Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Without a quorum disk or a quorum node, there is no simple solution that allows either node to stay up and boot by itself. It would be nice, but it simply can't be done.

You can give each node one vote and boot both nodes. If one node is not working or is unavailable, you can do a conversational boot and, for that boot only, set expected votes to 1 on the surviving node.
Once the other node boots, the cluster will re-adjust quorum.

A node crashing will hang your cluster.

Another option applies if one node is more important than the other: give it one vote and expected votes of 1, and give the other node 0 votes. Once again, you can do a conversational boot.
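Sketched in MODPARAMS.DAT terms (a sketch only; rerun AUTOGEN on both nodes afterwards):

! Important node, SYS$SYSTEM:MODPARAMS.DAT:
VOTES = 1
EXPECTED_VOTES = 1

! Other node:
VOTES = 0
EXPECTED_VOTES = 1

The zero-vote node can then join and leave without affecting quorum, while the voting node can boot and run on its own.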

If you lose a node, one option is to install the Availability Manager on a local PC. If you get a hung node, you can add a vote, BUT be careful: the other node must be dead; if it is not, you must kill it.

Is there another node somewhere that you can give a deciding vote to?