12-06-2006 03:09 AM
Managing a cluster with 2 nodes without quorum disk
I have an issue with a cluster consisting of two nodes and no quorum disk (OpenVMS 7.1). Some of my former colleagues made system adjustments, and these are the current values:
Node 1 (primary): 2 votes; Node 2 (secondary): 1 vote. Sysparam EXPECTED_VOTES on both systems: 2. CL_EXP = 3; CL_VOTES = 3. The application is running on Node 1 (shadow copying using a virtual unit DSA1...). What do I have to do to run the application on Node 2? If I stop the application on Node 1 and shut it down using REMOVE_NODE, I would expect the second node to hang (until I reboot Node 1) because EXPECTED_VOTES = 2. In some old documentation I did find a description of EXPECTED_VOTES with the value 1 (!), although it is generally recommended to set this parameter to the total of all possible votes in order to avoid cluster partitioning. Can I change EXPECTED_VOTES to 1, and do I use "SET WRITESYSPARAMS 0" or not?
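For reference, OpenVMS derives quorum as (EXPECTED_VOTES + 2) / 2 with integer division, so the arithmetic for the values above works out roughly like this (a sketch; the SYSGEN session only shows where to check the values on each node):

$ MCR SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW VOTES
SYSGEN> SHOW EXPECTED_VOTES
SYSGEN> EXIT
!  QUORUM = (EXPECTED_VOTES + 2) / 2   (integer division)
!  EXPECTED_VOTES = 2  ->  quorum = (2 + 2) / 2 = 2
!  EXPECTED_VOTES = 3  ->  quorum = (3 + 2) / 2 = 2
!  With quorum = 2, Node 2 on its own (1 vote) cannot keep the cluster running.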
12-06-2006 03:17 AM
Re: Managing a cluster with 2 nodes without quorum disk
What you have to do to run the application on the other node depends on the application. Some applications might need reconfiguration or extra installations.
Just use a quorum disk if you can.
Wim
12-06-2006 03:27 AM
Re: Managing a cluster with 2 nodes without quorum disk
to begin with:
WELCOME to the VMS forum!
- with a total of 3 votes, it is HIGHLY ADVISABLE to set expected_votes to 4 (equal on all systems)
- the WHOLE PURPOSE of the shutdown option REMOVE_NODE is to lower ("recalculate") the value of quorum directly after the removal of a node (as part of the "state transition"), so, using that option, THE REMAINING CLUSTER WILL NOT HANG (even when down to one node); see the sketch below.
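In practice that looks roughly like this (a sketch; the exact SHUTDOWN.COM prompts vary a bit between versions):

$ @SYS$SYSTEM:SHUTDOWN
!  ... answer the usual prompts (minutes until shutdown, reason, reboot) ...
Shutdown options [NONE]: REMOVE_NODE
!  Quorum is recalculated during the resulting state transition,
!  so the remaining node keeps running.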
Please, PLEASE do NOT fiddle with EXPECTED_VOTES, because THAT is what allows a partitioned cluster (and this is one of the few points where I really prefer HP-UX terminology: they call it a "split-brain" cluster).
As an aside: 7.1 went out of support quite some time ago. Any specific reason for not upgrading?
Second aside: "failover" is not the preferred way to run an app on VMS. Any reason for not running it clusterwide? (Although good reasons DO exist: real-time app, memory-resident app, ported *UX database engine...). Just curious. :-)
Hth.
Proost.
Have one on me.
jpe
12-06-2006 03:55 AM
Re: Managing a cluster with 2 nodes without quorum disk
Expected votes must be the sum of all votes of all voting members. In your case that is 3.
As for having a majority during booting: Node 1 can boot alone, because 2 votes is a majority when 3 votes are in play.
Node 2 cannot boot on its own, because 1 vote is a minority when 3 votes are in play.
If you set expected_votes to 4, even Node 1 will not have the majority to boot.
Again, use a quorum disk, e.g. one with the page files on it.
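If a suitable shared, non-shadowed disk ever becomes available, the setup is only a few entries in MODPARAMS.DAT on each node (a sketch; the device name and the symmetric one-vote-per-node layout are assumptions, not your current values):

!  SYS$SYSTEM:MODPARAMS.DAT on each node
VOTES = 1
EXPECTED_VOTES = 3          ! node + node + quorum disk
DISK_QUORUM = "$1$DUA10"    ! the shared, non-shadowed disk that will hold QUORUM.DAT
QDSKVOTES = 1
!  then: $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK  (and reboot)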
Wim
12-06-2006 04:09 AM
Re: Managing a cluster with 2 nodes without quorum disk
Thanks for your correction. I wondered what you meant until I noticed my TYPO. OF COURSE EXPECTED_VOTES should be the sum of all votes in the normal, full configuration! That should teach me once again to be more rigorous in proofreading before posting!
Yves, sorry for trying to misguide you!
Proost.
Have one on me.
jpe
12-06-2006 07:23 PM
Re: Managing a cluster with 2 nodes without quorum disk
Thanks for the replies...
- Beats me why the application is not running clusterwide. I'm not really an OpenVMS specialist (I just support the application running on it...). In case of OpenVMS- or hardware-related issues we always contact an external second-line support firm. And that's maybe still the right thing to do :-)
Yes, it is a real-time app (a WCS (Warehouse Control System) runs on it), and all operational data is mirrored (the application disks are kept in sync using shadow copying). So we should be able to switch between the nodes at any time.
- There is no shared, non-shadowed disk directly accessible by both nodes, so we're not able to use a quorum disk.
- Still wondering why I found the value expected_votes = 1 in some old documentation. Is there really a risk of a "split-brain" cluster when the application is only running on one node?
- Still on 7.1: nobody (management, ...) is really bothered about it. The cluster and application have been running steadily for more than 10 years. Even our support contractor is not making any fuss about it... And everybody is migrating to Windows-based apps, so budget-wise it's rather difficult to start talking about an upgrade.
- Besides this 2-node cluster, there are also two 3-node clusters to deal with. There is a proposal to build a 5-node cluster to get rid of the risks related to the 2-node cluster (4 nodes and 1 quorum node). What do you think about that?
- We're planning to make some system and application disk backups during the weekend, and we also want to test the switchover procedure (stopping the application on Node 1 and starting it on the backup node). So again: is it really out of the question to set expected_votes to 1?
Yves
12-06-2006 07:48 PM
Re: Managing a cluster with 2 nodes without quorum disk
A 3- or 5-node cluster may solve your quorum problem. But isn't it simpler to add a disk, or to reorganize so that two disks are freed up?
Wim
12-06-2006 10:27 PM
Re: Managing a cluster with 2 nodes without quorum disk
If your primary node failed hard and your backup node then went down, it would be reasonable to set expected_votes to 1 on the backup node; you probably want to get your system running before the first node gets repaired. Once you have done that, you need to make sure you do not boot the primary node outside the cluster. One of the risks is that, if you aren't extremely careful, you can propagate shadow copies the wrong way.
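For that emergency case a conversational boot keeps the change off the system disk (a sketch; the root value 0 and the Alpha-style boot syntax are assumptions):

>>> boot -fl 0,1                 ! root 0, flag 1 = conversational boot
SYSBOOT> SET EXPECTED_VOTES 1
SYSBOOT> SET WRITESYSPARAMS 0    ! do not write the changed value back to disk
SYSBOOT> CONTINUE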
As for replacing the cluster with a 5-node one: it seems a bit like overkill, or at least you are going from a system with no redundancy to one with double redundancy. Three nodes with equal votes protect you from a single node failure. Sure, 3 out of 5 is better, but if you are doing it for redundancy rather than performance, the budget might be better spent elsewhere.
As for expected_votes set to 2 or 3: both give a quorum of 2, so it makes no practical difference. Setting it to 3 would be good practice, though.
12-06-2006 10:40 PM
Re: Managing a cluster with 2 nodes without quorum disk
First: Indeed, expected_votes = 1 is A VERY BAD IDEA!
Second: consolidating into a single cluster in my view is the one step that
- improves your availability
- HUGELY simplifies system management
- is very cost-effective.
But, if you are going to consolidate multiple systems/clusters into one cluster, be sure to do the thinking AHEAD of the implementing! Changing the chosen config afterwards is much harder than changing the planned setup. And only you (= somebody who KNOWS the apps and the various constraints) can do that (or, of course, one expert on the apps together with one on VMS).
An aside on the quorum disk: it is an ugly trick to work around the fact that 2 cannot be decremented by 1 and still leave more than half of the starting value.
Three (nodes) or more is much superior, and a quorum disk in essence is just a (very passive) node supplying the 3rd vote. But: it IS the only solution if you have just two active systems and want to be able to continue if either fails.
If you do go down the road to an integrated cluster, I would suggest opening a new thread on that topic.
hth
Proost.
Have one on me.
jpe
12-06-2006 11:21 PM
Re: Managing a cluster with 2 nodes without quorum disk
One option: give each node one vote and boot both nodes. If one node is not working or is unavailable, you can do a conversational boot and, for that boot only, set expected_votes to 1 on the remaining node and boot it.
Once the other node boots, the cluster will re-adjust quorum.
Be aware that a node crashing will hang your cluster.
Another option, if one node is more important than the other: give it one vote and expected_votes of 1, and give the other node 0 votes. Once again, you can do a conversational boot.
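In MODPARAMS.DAT terms that second option would look roughly like this (a sketch, not a recommendation; apply it with AUTOGEN on each node and reboot):

!  Node 1 (the more important node), SYS$SYSTEM:MODPARAMS.DAT
VOTES = 1
EXPECTED_VOTES = 1
!  Node 2, SYS$SYSTEM:MODPARAMS.DAT
VOTES = 0
EXPECTED_VOTES = 1
!  Node 1 can then boot and run alone; Node 2 contributes no vote.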
If you lose a node, one option is to install the Availability Manager on a local PC. If you get a hung node, you can add a vote, BUT be careful: the other node must be dead, and if it is not, you must kill it first.
Is there another node somewhere that you can give a deciding vote to?
12-10-2006 10:52 PM
Re: Managing a cluster with 2 nodes without quorum disk
It would appear your concern is keeping things running when the primary node fails.
As most people have said so far, there's nothing wrong with the cluster parameters you presented in your opening question.
They just make you do things one way and not another.
Here's what you can do if you don't want to add a quorum disk!
You'll have to judge for yourself the options open to you!
The most likely scenario is that Node-1 will stop because of a hardware failure.
Node-2 will most likely hang, because Node-1 wasn't correctly removed from the cluster.
Should Node-1 crash, we can do one of three things:
1.
This, I think, is really theoretical; there probably won't be enough time to do it!
Quickly change Node-2's "VOTES" to 2 in order to keep going.
($ MC SYSGEN SET VOTES 2)
(You must also be VERY sure that Node-1 really has crashed, otherwise you'll end up with a "split brain" running your app.)
So, in practice, you have to do one of the following instead:
2.
Start Node-2 as Node-2.
After that, do an INIT to clean up.
>>> init
(wait a minute for it to finish)
Then do a:
>>> boot -fl
(The root is typically 0 or 1. If you don't know it, first do
>>> sho b*
and look for the parameter "boot_osflags".) E.g., let's say it is 1,0; then to restart Node-2 you do this:
>>> boot -fl 1,1
This will get you to a SYSBOOT> prompt.
Here all you do is:
SYSBOOT> SHO VOTES (always better to see what's going on first)
SYSBOOT> SET VOTES 2
SYSBOOT> C
(C means CONTINUE)
VMS will start...
I imagine you will have to start your app manually on this node as this is not the normal case.
Then, at some point, field service has fixed Node-1.
So we do the same startup procedure for Node-1,
EXCEPT...
check the boot_osflags value, BUT this time we HAVE to enter "SET VOTES 1" at the SYSBOOT> prompt for the cluster to work correctly.
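Concretely, that would be something like this (a sketch; Node-1's root value of 0 is an assumption):

>>> boot -fl 0,1        ! Node-1's root, flag 1 = conversational boot
SYSBOOT> SET VOTES 1
SYSBOOT> C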
So we have effectively swapped the VOTES values between the two nodes.
3.
Start Node-2 as Node-1.
This should work correctly, assuming both nodes are SYMMETRICAL!
If your nodes are different, i.e. in the number of Ethernet cards, the use of local disks (this will cause problems), or the location of the page files, then you will have to go through the startup procedures of the two nodes and start making adjustments.
This method effectively changes the ROOT of the two nodes; we don't touch VOTES, EXPECTED_VOTES, etc. ... nothing else!
So when physical Node-2 starts up, it will introduce itself as Node-1 and the "app" should be running normally.
OK, we've just crashed Node-2.
Now the aim is to start Node-2 using Node-1's root.
>>> sho b*    (again)
Look for the same parameter, boot_osflags; I assumed 1,0 as its "normal" value.
We also have to know Node-1's root value beforehand; this will most likely be 0,0.
So we do a:
>>> boot -fl 0,0
Here it is not necessary to enter SYSBOOT, as our working hardware will pretend to be the failed node.
Again...
Field service has just fixed the failing hardware...
And from the console prompt of Node-1, we have to:
>>> boot -fl 1,0
Under the circumstances of a hardware failure you obviously don't HAVE to start the second node; that is your decision. I think the aim is to provide service to the users, but again, that is your call.
Hope that helps
Regards
Steven
12-11-2006 08:27 AM
Re: Managing a cluster with 2 nodes without quorum disk
Bob Comarow already mentioned the Availability Manager (or AMDS).
That really would be THE tool for it.
Steven, I am sorry, but your #1 DOES NOT work! If for no other reason, then simply because VOTES is not a dynamic parameter.
OTOH, the effect you are seeking can be achieved with IPC:
At the console, go to interrupt mode (on older systems Ctrl-P or BREAK; on newer systems check your hardware manual):
>>> D SIRR C ! Deposit %XC in SIRR register
>>> C ! Continue (at IPL %XC)
IPC> Q ! Recalculate Quorum from present Votes
IPC> Ctrl-Z ! Leave interrupt mode
The TOTAL time available for this is RECNXINTERVAL seconds.
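Afterwards you can verify the result with the SHOW CLUSTER utility (a sketch; from memory, the CLUSTER class carries the CL_EXP, CL_QUORUM and CL_VOTES fields):

$ SHOW CLUSTER/CONTINUOUS
Command> ADD CLUSTER     ! add the cluster-wide quorum/votes fields to the display
Command> EXIT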
#2 just does a full boot for Node_2
#3 _CAN_ work, but only if
a) both nodes are pretty much equal in CPUs and memory
_AND_
b) the bootstrap procedure is such that it takes into account any differences in peripherals. Special attention, please, for the network devices!
hth,
Proost.
Have one on me.
jpe
12-11-2006 11:23 AM
Re: Managing a cluster with 2 nodes without quorum disk
Cluster partitioning, i.e. the problem whereby nodes access shared disks without coordination through the distributed lock manager, is really only a concern when the interconnects are networks. Networks are easily disturbed, and if you run a wide-area cluster, quorum is critical. If you are running older hardware with coax interconnects or CI, then the only way cluster partitioning is going to occur is by mucking around with the controllers or their cables. Even then the chances are remote. A node crash is the most likely result if the interconnects are disturbed. Even if a node tries to mount a disk already mounted by another node, the mount will probably fail with something like a duplicate allocation error. I know of no situation where cluster partitioning has occurred in CI-type configurations where the nodes are located in the same vicinity.
For cluster partitioning to occur:
1) the application has to run on two nodes, and
2) the data has to be mirrored or available to each node, i.e. each node has its own copy.
01-28-2007 11:10 PM
Re: Managing a cluster with 2 nodes without quorum disk
Grtz,
Yves
01-28-2007 11:13 PM