Operating System - OpenVMS

Yves Kinnaer
New Member

Managing a cluster with 2 nodes without quorum disk

Hi,

I have an issue with a cluster consisting of two nodes and no quorum disk (OpenVMS 7.1). Some of my former colleagues made system adjustments, and these are the actual values:

Node 1 (primary): 2 votes; Node 2 (secondary): 1 vote. SYSGEN parameter EXPECTED_VOTES on both systems: 2. CL_EXP = 3; CL_VOTES = 3.

The application is running on node 1 (shadow copying using a virtual DSA1...). What do I have to do to run the application on Node 2? If I stop the application on Node 1 and shut down using REMOVE_NODE, I expect the second node to hang (until I reboot Node 1) because EXPECTED_VOTES = 2. In some old documentation I found a description of EXPECTED_VOTES with the value 1 (!), although it is generally recommended to set this parameter to the total of all possible votes in order to avoid cluster partitioning... Can I change EXPECTED_VOTES to 1, and do I use "SET WRITESYSPARAMS 0" or not?
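For reference, this is roughly how I read those numbers off the systems (standard tools only, nothing exotic):

$ MC SYSGEN
SYSGEN> SHOW VOTES
SYSGEN> SHOW EXPECTED_VOTES
SYSGEN> EXIT
$ SHOW CLUSTER/CONTINUOUS    ! then ADD CLUSTER to display the CL_EXP, CL_VOTES and CL_QUORUM fields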
Wim Van den Wyngaert
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Using REMOVE_NODE will NOT hang the cluster. Not using it will.
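For example (the exact prompts vary a little between versions, so take this as a sketch):

$ @SYS$SYSTEM:SHUTDOWN
...
Shutdown options (enter as a comma-separated list) [NONE]: REMOVE_NODE

Answering REMOVE_NODE is what makes the remaining members recalculate quorum during the state transition.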

What you have to do to run the application on the other node depends on the application. Some need reconfiguration or extra installations.

Just use a quorum disk if you can.

Wim
Jan van den Ende
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Yves.

to begin with:

WELCOME to the VMS forum!

- with a total of 3 votes, it is HIGHLY ADVISABLE to set expected_votes to 4 (equal on all systems)
- the WHOLE PURPOSE of the shutdown option REMOVE_NODE is to lower ("recalculate") the value of quorum directly after the removal of a node (as part of the "state transition"), so, using that option, THE REMAINING CLUSTER WILL NOT HANG (even when down to one node) - see the sketch just below.
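(And should a node already have gone away without that option while the rest still has quorum, the running cluster can be told to recalculate from DCL. A sketch, issued on any remaining member:)

$ SET CLUSTER/EXPECTED_VOTES    ! no value given: recalculate quorum from the votes currently present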

Please, PLEASE do NOT fiddle with EXPECTED_VOTES, because THAT would be what allows a partitioned cluster (and this is one of the few points where I really prefer HPUX terminology: they call it a "split brain" cluster).

As an aside: 7.1 went out of support quite some time ago. Any specific reason for not upgrading?

Second aside: "failover" is not the preferred way to run an app on VMS. Any reason for not running it clusterwide? (Although good reasons DO exist: real-time app, memory-resident app, ported *UX database engine...) Just curious. :-)

Hth.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

I must disagree with Jan.

Expected votes must be the sum of all votes of all voting members. In your case that is 3.

As for having a majority during booting: node 1 can boot alone, because 2 votes is a majority when 3 votes are in the game.
Node 2 cannot boot on its own, because 1 vote is a minority when 3 votes are in the game.

When you set expected_votes to 4, even node 1 will not have the majority to boot.
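To put numbers on it (as far as I remember, quorum is computed as (EXPECTED_VOTES + 2) / 2, fractions dropped):

EXPECTED_VOTES = 3  ->  quorum = (3+2)/2 = 2 : node 1 (2 votes) can form the cluster alone, node 2 (1 vote) cannot
EXPECTED_VOTES = 4  ->  quorum = (4+2)/2 = 3 : neither node can form the cluster on its own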

Again, use a quorum disk, e.g. with the page files on it.
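A sketch of what that could look like in MODPARAMS.DAT on both nodes, shown here with one vote per node (the device name is only a placeholder; use a disk both nodes can access directly, then run AUTOGEN and reboot both nodes):

VOTES = 1                  ! one vote per node
DISK_QUORUM = "$1$DUA12"   ! quorum disk - placeholder name, must be directly accessible to both nodes
QDSKVOTES = 1              ! the quorum disk contributes one vote
EXPECTED_VOTES = 3         ! 1 + 1 + 1: either node plus the disk keeps quorum (2 out of 3)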

Wim
Jan van den Ende
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Wim,

thanks for your correction. I wondered what you meant, until I noticed my TYPO. OF COURSE EXPECTED_VOTES should be the sum of all votes in the normal, full configuration! That should teach me once again to be more rigorous in proof-reading before posting!

Yves, sorry for trying to mis-guide you!

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Yves Kinnaer
New Member

Re: Managing a cluster with 2 nodes without quorum disk

Hi,

Thnx for the replies...
- Beats me why the application is not running clusterwide. I'm not really an OpenVMS specialist (just supporting the application running on it...). In case of OpenVMS or hardware-related issues we always get in contact with an external 2nd-line support firm. And that's maybe still the right thing to do :-)
Yes, it is a real-time app (a WCS, Warehouse Control System, is running on it) and e.g. all operational data is mirrored (application disks are kept in sync using shadow copying). So we should be able to switch between the nodes at any time.
- There is no shared non-shadowed disk directly accessible by both nodes, so we're not able to use a quorum disk.
- Still wondering why I found the "expected_votes=1" value in some old documentation. Is there really a risk of a "split brain" cluster when the application is only running on one node?
- Still 7.1: nobody (management, ...) is really bothered about it. The cluster & application have been running steadily for more than 10 years. Even our support contractor is not making any fuss about it... And since everybody is migrating to Windows-based apps, it's budget-wise rather difficult to start talking about an upgrade.
- Besides this 2-node cluster, there are also two 3-node clusters to deal with. There is a proposal to build a 5-node cluster to get rid of the risks related to the 2-node cluster (4 nodes & 1 quorum node). What do you think about that?
- We're planning to make some system & application disk backups during the weekend. And we also want to test the switch-over procedure (stopping the application on node 1 & starting it up on the backup node). So again: is it really out of the question to set EXPECTED_VOTES to 1?

Yves
Wim Van den Wyngaert
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

If you set expected_votes to 1, systems can always boot on their own, causing a split cluster. You never do that in a cluster.

A 3- or 5-node cluster may solve your quorum problem. But isn't it simpler to add a disk, or to reorganize so that 2 disks are freed?

Wim
Richard Brodie_1
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

'Still wondering why I did find the "expected_votes=1"-value in some old documentation. '

If your primary node failed hard and your backup node went down, then it would be reasonable to set expected votes to 1 on the backup node; you probably want to get your system running before the first node gets repaired. Once you have done that, you then need to ensure you do not boot the primary node outside the cluster. One of the risks is that, if you aren't extremely careful, you can propagate shadow copies the wrong way.
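For the record, that would be a conversational boot on the backup node; something along these lines, where the boot flags depend on your root (0,1 here is only an example):

>>> boot -fl 0,1
SYSBOOT> SET EXPECTED_VOTES 1
SYSBOOT> CONTINUE

And remember to put the parameter back to its normal value once the repaired node is allowed to rejoin.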

As for replacing the cluster with a 5-node one: it seems a bit like overkill, or at least you are going from a system with no redundancy to one with double redundancy. Three nodes with equal votes protect you from a single node failure. Sure, 3 out of 5 is better, but if you are doing it for redundancy rather than performance, the budget might be better spent elsewhere.

As for the expected_votes set to 2 or 3, both give a quorum of 2, so it makes no practical difference. Setting it to 3 would be good practice, though.

Jan van den Ende
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Yves,

First: Indeed, expected_votes = 1 is A VERY BAD IDEA!

Second: consolidating into a single cluster in my view is the one step that
- improves your availability
- HUGELY simplifies system management
- is very cost-effective.

But, if you are going to consolidate multiple systems/clusters into one cluster, be sure to do the thinking AHEAD of the implementing! Changing the chosen config afterwards is much harder than changing the planned setup. But only you ( = somebody that KNOWS the apps, and the various constraints) can do that (or, of course, one expert on the apps together with one on VMS).

An aside on the quorum disk: that is an ugly trick to work around the fact that 2 cannot be decremented by 1 and still leave more than half of the starting value.
Three (nodes) or more is much superior, and a quorum disk in essence is just a (very passive) node supplying the 3rd vote. But: it IS the only solution if you have only two active systems and want to be able to continue if either fails.

I would suggest, if you go down the road to an integrated cluster, that you open a new thread on that topic.

hth

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
comarow
Trusted Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Without a quorum disk or a quorum node, there is no simple solution that allows either node to stay up and boot by itself. It would be nice, but it simply can't be done.

You can give each node one vote and boot both nodes. If one node is not working or is unavailable, you can do a conversational boot and, for that boot only, set expected votes to 1 on the node you are booting.
Once the other node boots, it will re-adjust quorum.

A node crashing will hang your cluster.

Another option, if one node is more important than the other: give it one vote and expected votes of 1, and give the other node 0 votes. Once again, you can do a conversational boot.

If you lose a node, one option is to install the Availability Manager on a local PC. If you get a hung node, you can add a vote, BUT be careful: the other node must be dead; if not, you must kill it.

Is there another node somewhere that you can give a deciding vote to?


Steve-Thompson
Regular Advisor

Re: Managing a cluster with 2 nodes without quorum disk

Hello Yves
It would appear your concern is keeping things running when the primary node fails.
As most people have said so far, there's nothing wrong with the cluster parameters you presented in your opening question.
They just make you do things one way and not another.
Here's what you can do if you don't want to add a quorum disk!
You'll have to judge for yourself which options are open to you!

The most likely scenario is that Node-1 will stop because of a hardware failure.
Node-2 will most likely hang, because Node-1 wasn't correctly removed from the cluster.

Should Node-1 crash, we can do 1 of 3 things:

1.
This I think is really theoretical; there probably won't be enough time to do it!
Quickly change Node-2's "VOTES" to 2 in order to keep going.
($ MC SYSGEN SET VOTES 2)

(You must also be VERY sure that Node-1 really has crashed, otherwise you'll end up with a "split brain" running your app.)

- So, you have to do a halt on the console of Node-2 and restart the latter.

2.
Start Node-2 as Node-2.
After the halt you will see >>> on the console.
Do an INIT to clean up.
>>> init
(wait a minute for it to finish)
Then do a:
>>> boot -fl ,1
(root is typically 0 or 1. If you don't know this, first do:
>>> sho b*
and look for the parameter "boot_osflags". E.g. let's say it's 1,0; then to restart Node-2 you do:
>>> boot -fl 1,1

This will get you to a SYSBOOT> prompt.

Here all you do is:
SYSBOOT> SHO VOTES (always better to see what's going on first)
SYSBOOT> SET VOTES 2
SYSBOOT> C
(It means continue)

VMS will start...
I imagine you will have to start your app manually on this node as this is not the normal case.

Then, at some point, field service has fixed Node-1.
So we do the same startup procedure for Node-1,
EXCEPT...
Check the boot_osflags value, BUT this time we HAVE to enter "SET VOTES 1" at the SYSBOOT> prompt for the cluster to work correctly.

So we've effectively changed the VOTES value between the 2 nodes.

3.
Start Node-2 as Node-1.
This should work correctly, assuming both nodes are SYMMETRICAL!
If your nodes are different, i.e. number of Ethernet cards, usage of local disks (this will cause problems), position of the page files, then you will have to go through the startup procedures of the two nodes and start making adjustments.

This method effectively changes the ROOT of the 2 nodes, and we don't touch votes, expected_votes, etc. ... nothing else!
So when physical Node-2 starts up it will introduce itself as Node-1 and the "app" should be running normally.

OK we've just crashed Node-2
Now the aim is to start Node-2 using Node-1's root
>>> sho b* (Again)
Look for the same parameter, boot_osflags; I assumed 1,0 as its "normal" value.
We also have to know Node-1's root value beforehand - this will most likely be 0,0.
So we do a:
>>> boot -fl 0,0

Here it is not necessary to enter SYSBOOT, as our working hardware will pretend to be the failing node.

Again...
Field service has just fixed the failing hardware...
And from the console prompt of Node-1, we have to:
>>> boot -fl 1,0

Under the circumstances of a hardware failure you obviously don't HAVE to restart the second node; that is for you to decide. I think the aim is to provide service to the users, but again, that is your criterion.

Hope that helps

Regards
Steven
Jan van den Ende
Honored Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Yves (and Steven)

Bob Comarow already mentioned the Availability Manager (or AMDS).
That really would be THE tool for it.

Steven, I am sorry, but your #1 DOES NOT work! If for no other reason, then simply because VOTES is not a dynamic parameter.

OTOH, the effect you are seeking can be done with IPC:
at the console, go to interrupt mode (on older systems Ctrl-P or BREAK; on newer systems check your HW manual)
>>> D SIRR C ! Deposit %XC in SIRR register
>>> C ! Continue (at IPL %XC)
IPC> Q ! Recalculate Quorum from present Votes
IPC> Ctrl-Z ! Leave interrupt mode

The TOTAL time available for this is RECNXINTERVAL
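(You can check how large that window is with SYSGEN; the value is in seconds:)

$ MC SYSGEN
SYSGEN> SHOW RECNXINTERVAL
SYSGEN> EXIT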

#2 just does a full boot for Node_2

#3 _CAN_ work, but only if
a) both nodes are pretty much equal in CPUs and memory
_AND_
b) the bootstrap procedure is such that it takes into account any differences in peripherals. Special attention please for the network devices!

hth,

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Thomas Ritter
Respected Contributor

Re: Managing a cluster with 2 nodes without quorum disk

Yves,
Cluster partitioning, or the problem whereby nodes access shared disks without coordination through the distributed lock manager, is really only a concern when the interconnects are networks. Networks can be easily disturbed, and if you run a wide-area cluster, quorum is critical. If you are running old-type hardware with coax interconnects or CI, then the only way cluster partitioning is going to occur is by mucking around with the controllers or their cables. Even then the chances are remote. A node crash is most likely the result if the interconnects are disturbed. Even if a node tries to mount a disk already mounted by another node, the mount will probably fail with something like a duplicate allocation error. I know of no situation where cluster partitioning has occurred in CI-type configurations where the nodes are located in the same vicinity.

For cluster partitioning to occur:
1) the application has to run on two nodes
2) the data has to be mirrored or available to each node (each node ends up with its own copy).
Yves Kinnaer
New Member

Re: Managing a cluster with 2 nodes without quorum disk

Thnx for all the feedback & input!! We've contacted our support partner and decided to extend the cluster with a quorum node (a spare Alpha 300 4/266 server). Going for a 3-node cluster seems to be the simplest & most reliable solution. Case closed...

Grtz,
Yves