05-31-2007 02:26 PM
There are 2 nodes (A & B), effectively primary and secondary, in isolated locations. A third node (C), the quorum node, is located in another location altogether.
Suppose the connection between A and B drops, but both A and B keep their connection to node C. What happens?
How do you prevent partitioning in this instance?
05-31-2007 02:33 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
Thanks,
Mark.
05-31-2007 03:08 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
05-31-2007 04:51 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
You cannot, and should not, decide in advance that a particular node will be chosen as the survivor of a comms failure. See Keith Parris's "creeping doom" scenario to understand why.
05-31-2007 05:49 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
As these two nodes are shadowed (A & B), and C is only a quorum node, how does VMS determine that A or B should shut down if the network/link goes down between A & B?
I should have added, my apologies, that users could connect into either A or B (perhaps obvious, perhaps not), and they are NOT affected by the network/link failure.
What expected_votes and vote values determine this?
Is it possible, in a failure like this, that data could be lost in the period it takes to transition to CLUEXIT? That is, with people connected to A and people connected to B. Or is this question too vague? :-)
Thanks, Mark.
05-31-2007 08:12 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
>VMS determine that A or B should shut down
>if the network/link goes down between A & B?
Cluster rules require direct paths from each node to each other node. If the A-B path is lost, A and B will each realise that they can't see the other. Both nodes can still see node C. Since each of the proposed clusters A+C and B+C has the same number of votes, there has to be a tie break, and the victim must CLUEXIT. Thems the rules.
>What expected_votes and vote values determine this?
The only sensible voting scheme for this topology is each node 1 vote. EXPECTED_VOTES is therefore 3.
You could do A=2, B=1, C=1, and therefore EXPECTED_VOTES=4; then if the A-B link breaks, B would always get kicked out because the A+C cluster has more votes. HOWEVER, with that distribution, if you lose A, the B+C cluster is not viable, AND you're exposed to the risk of a creeping doom failure at site A.
>data could be lost
Yes, incomplete transactions in flight on the "losing" node may be lost, BUT the cluster state transition logic will ensure you won't have any data corruption.
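The vote arithmetic in this reply can be checked with a short sketch. This is only an illustration in Python, not anything OpenVMS itself runs: the helper names `quorum` and `viable` are hypothetical, EXPECTED_VOTES is assumed to equal the sum of all votes, and quorum is taken as EXPECTED_VOTES/2 + 1 with integer division.

```python
def quorum(expected_votes: int) -> int:
    """Quorum under the EXPECTED_VOTES/2 + 1 rule, using integer division."""
    return expected_votes // 2 + 1


def viable(votes: dict, members: str) -> bool:
    """True if the listed nodes together hold at least quorum votes."""
    expected = sum(votes.values())  # assumption: EXPECTED_VOTES = total votes
    return sum(votes[n] for n in members) >= quorum(expected)


balanced = {"A": 1, "B": 1, "C": 1}  # EXPECTED_VOTES = 3, quorum = 2
weighted = {"A": 2, "B": 1, "C": 1}  # EXPECTED_VOTES = 4, quorum = 3

for name, votes in (("1/1/1", balanced), ("2/1/1", weighted)):
    for members in ("AC", "BC"):
        print(name, "+".join(members), "viable:", viable(votes, members))
```

With one vote each, both A+C and B+C hold quorum, which is why a tie break is needed. With A=2, only A+C does, so B+C cannot continue if A is lost.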
05-31-2007 08:19 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
You should have a look at Keith Parris page, at http://www.geocities.com/keithparris/
particularly
Understanding Vaxcluster state transitions
Disk partitioning on Openvms
VmsCluster State Transitions in action
Using OpenvmsClusters for disaster tolerance
There is a lot of excellent material, I am sure I have forgotten some other documents.
Have a good reading.
05-31-2007 08:24 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
At http://www2.openvms.org/kparris/
see "Openvms Connection Manager and the quorum scheme"
which has an example of broken Cluster.
05-31-2007 09:35 PM
Solution
>>>
The only sensible voting scheme for this topology is each node 1 vote. EXPECTED_VOTES is therefore 3.
<<<
Note that if you DO want an unbalanced config in which any 2 nodes can continue, then
3 + 2 + 2 with EXPECTED_VOTES=7
_IS_ a valid option.
I do not know, however, whether this would influence the decision of WHICH node should CLUEXIT.
So, if the choice of the remaining cluster has either 5 or 4 votes, is the 5-vote config favored?
I am quite curious myself!
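As a quick check of the 3 + 2 + 2 arithmetic, here is a minimal Python sketch. It is an illustration only: the variable and function names are hypothetical, and quorum is assumed to be EXPECTED_VOTES/2 + 1 with integer division.

```python
from itertools import combinations

votes = {"A": 3, "B": 2, "C": 2}      # the 3 + 2 + 2 scheme
expected_votes = sum(votes.values())  # 7
quorum = expected_votes // 2 + 1      # 4 under the assumed rule


def total(members):
    """Sum of the votes held by the given nodes."""
    return sum(votes[n] for n in members)


# Votes held by every possible two-node cluster.
pairs = {"+".join(p): total(p) for p in combinations(sorted(votes), 2)}

for name, held in pairs.items():
    print(name, held, "votes, quorate:", held >= quorum)

# No single node reaches quorum on its own.
print("any single node quorate:", any(v >= quorum for v in votes.values()))
```

Every pair reaches quorum (5, 5 and 4 votes against a quorum of 4), so any two nodes can continue. The sketch says nothing about whether the 5-vote side is favored over the 4-vote side; that question stays open here.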
fwiw
Proost.
Have one on me.
jpe
05-31-2007 10:30 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
>At http://www2.openvms.org/kparris/
>
>see "Openvms Connection Manager and the quorum scheme"
>which has an example of broken Cluster.
Yes, I read this, and as he states, "systems in the minority voluntarily suspend processing while systems in the majority can continue to process transactions".
The issue is that both A+C and B+C are voting equals.
05-31-2007 10:47 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
Thanks for your continuing help.
Mark,
>Since each of the proposed clusters A+C and B+C have the same number of votes, there has to be a tie break, and the victim must CLUEXIT. Thems the rules.
Ok, I understand that now. I was getting myself worried that there would be some partitioning into 2 clusters. Ack!
>>What expected_votes and vote values determine this?
>The only sensible voting scheme for this topology is each node 1 vote. EXPECTED_VOTES is therefore 3.
So, equal votes for A+C or B+C and then let the "system" decide which one bites the dust? Ok.
>You could do A=2, B=1, C=1, and therefore EXPECTED_VOTES=4, then if the A-B link breaks, B would always get kicked out because the A+C cluster has more votes. HOWEVER, with that distribution, if you lose A, the B+C cluster is not viable, AND you're exposed to the risk of a creeping doom failure at site A.
Forgive me, why is it not viable? Does it not have 2 votes (half of 4 expected)? Or are you saying because forcing the system to make B+C the cluster could cause issues if A is still alive and chatting to C?
05-31-2007 10:59 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
Quorum is EXPECTED_VOTES/2 + 1 (integer division), so 4 -> 3.
regards Kalle
05-31-2007 11:00 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
> You should have a look at Keith Parris page, at http://www.geocities.com/keithparris/
Thank you, it is a very good site for information. Some of it, alas, is a bit over my head.
Merci.
05-31-2007 11:02 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
>Quorum is Expectedvotes/2+1, so 4 -> 3.
Oops, forgive me. I forgot.
05-31-2007 11:07 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
>Note if you DO want an unbalanced config in which any 2 nodes can continue, then
>3 + 2 + 2 with EXPECTED_VOTES=7
>_IS_ a valid option.
Actually, this is probably preferable if it can indeed affect the decision of which node is to CLUEXIT. I would prefer A+C to stay up over B+C, but again, does this not lead to the "creeping doom scenario"?
06-01-2007 01:57 AM
Re: 2 node cluster + quorum node - network breaks, what happens?
Actually, no, unless you set the AutoStart Action to BOOT.
If A-B connectivity breaks, then either A or B CLUEXITs.
If something ugly then comes to pass on the remaining cluster (say, A + C), the cluster loses another node and goes into a quorum hang,
only to be released by human intervention (which HAD BETTER be sufficiently instructed).
Should AUTOSTART be set to BOOT, then (say) B will try to reboot. As long as A and C see one another, while B sees C and not A, B will not be allowed to join.
In the creeping doom situation, maybe C will eventually lose connectivity to A, in which case B + C is valid. And NOW you lose all that was done on A...
Then again, if the entire A site is destroyed, so is the data at A.
_IF_ in this scenario the data is on 3-member shadow sets, however, then first the members at B are disconnected. Then A goes, as do the A shadow members. So, B joins. But the ACTIVE shadow set is still at C (albeit single-member) and the DATA survives.
-- Incidentally, that is why we like to be able to have two members at each site (+ one to be split off for backup purposes), i.e., allow at least 7 members in a set... as requested again during Bootcamp.
hth.
Proost.
Have one on me.
jpe
06-01-2007 04:20 AM
Re: 2 node cluster + quorum node - network breaks, what happens?
Ian.
06-01-2007 05:17 AM
Re: 2 node cluster + quorum node - network breaks, what happens?
No routing.
Purely Personal Opinion
06-01-2007 08:56 AM
Re: 2 node cluster + quorum node - network breaks, what happens?
One of the options some sites use is (for instance) FC over IP, and (in parallel) SCS over bridged LANs.
Setting Votes and Expected_Votes is in the OpenVMS FAQ (available via <>), and a write-up on the topic is also available at <>. If you do not understand the parameters and the quorum scheme based on what you've read here already and what's in the FAQ, you really need to take the time to understand it. Or to ask questions on what you don't understand.
The Quorum scheme is THE basis for clustering. Creative or incorrect settings for the values here can seriously scrozzle your data.
Stephen Hoffman
HoffmanLabs LLC
06-01-2007 12:09 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
You can also add an extra pair of network adaptors and use a crossover cable between nodes A and B to maintain cluster communications. Assuming you have both systems configured for speed/duplex or auto-negotiate, your cluster traffic will use the new path without configuration. This would keep the primary and secondary systems up in the event the WAN connection drops.
"There are nine and sixty ways of constructing . . . and every single one of them is right."
Andy Bustamante
06-01-2007 02:07 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
>Setting Votes and Expected_Votes is in the OpenVMS FAQ (available via <>), and a write-up on the topic is also available at <>. If you do not understand the parameters and the quorum scheme based on what you've read here already and what's in the FAQ, you really need to take the time to understand it. Or to ask questions on what you don't understand.
The understanding was more in the processes VMS uses to determine which of the potentially partitioned clusters is to be hanged. Then, if this is known, to somehow predict the outcome. Some, such as Keith Parris (and Martin), proffer SCSYSTEMID as a possible determining factor but then intimate it's random.
Some jiggling of votes will only seem to guarantee the recovery of, say, A+C over B+C. Remember I was asking the question on the basis of the link between A and B failing, not the nodes themselves. (I think VMS is much more stable than the network it exists on.)
Also of concern was then the timing factors, and how much data loss is involved in A+C becoming the cluster while people are still connected to B+C. This is probably indeterminate, and largely at the behest of the underlying database (RDB)?
06-01-2007 03:13 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
Can you explain what you are trying to achieve? Providing a bit of background information might help you get the answers you really need.
>> I would prefer A+C to up over B+C <<
Why? Is node A more powerful?
How does your application react to node failure? How are users impacted?
06-01-2007 08:18 PM
Re: 2 node cluster + quorum node - network breaks, what happens?
>Can you explain what you are trying to achieve? providing a bit of background information might help you get the answers you really need.
I was specifically looking at the disaster tolerance of a 3 node cluster where one node is nothing but a quorum voter. With A and B being the primary servers, remotely isolated from each other, how would a drop in the network specifically affect whether the cluster reverts to (A and C) or (B and C)?
Also be mindful that I have 3 servers at my disposal, no more, no less. I cannot add 1 at each site for further redundancy and so on. Likewise, the network is as it is and I have to work with it. That is, there is a dedicated link between A and B (and if need be, between A and C and B and C).
I am aware that this 3 node set-up is the "desired" way, but I was not so sure it was the best for stopping data loss.
A primary server (A) shadowing to a secondary (B) where loss of connection happens, is a trivial amount of data lost (if at all). My question revolved around the 3 node cluster and how much data would be lost.
Of course, this setup has its own drawbacks, i.e., one down, all down.
>>> I would prefer A+C to up over B+C <<
>Why? is node A more powerful?.
Much. In the order of 25% better. Just enough should the full load shift to the "secondary" server after a failure in the link.
I would prefer A+C to be running after a link failure, but it seems this cannot be effected without causing other issues.
>How does your application react to node failure? How are users impacted?.
The application is fine, the database is RDB. The impact to the users is unknown, that was part of the reason for the question.
As a solution, a quorum server seems a viable alternative to primary & secondary providing there is not too much data loss.
06-02-2007 01:24 AM
Re: 2 node cluster + quorum node - network breaks, what happens?
What amount of data are we talking?
Disk space is pretty cheap these days, so the setup where you have your DATA triplicated (shadowed) over 3 sites should be affordable, if node C has the slot capacity.
A compromise might be to have your system control files (SYSUAF, RIGHTSLIST, QUEfiles, etc.), the programs, and whatever more, shadowed only between A and B, and have only the important application DATA (or even only the subject-to-change part of that) on all 3 sites. Just decide what data updates you can afford to lose, and what insurance you are prepared to pay to prevent such loss.
Only you(r company) can make that decision, and you are the one that has to do the thinking-ahead, but we CAN give you the ideas to play with, and the arguments that can lead to an informed decision.
hth
Proost.
Have one on me.
jpe
06-02-2007 01:35 AM
Re: 2 node cluster + quorum node - network breaks, what happens?
Not on which node is a "quorum node". For the purposes of the quorum scheme itself, there's no specific concept of a "quorum node", it's just another voting node. The concept is helpful in some cases, but it appears to be confusing things here.
(And "gaming" the expected_votes value is not recommended, as a "gamed" value is detected and automatically corrected once connections are established, and the "gamed" value can permit partitioning and data corruption when connections are only partially established, such as during degraded operations and during bootstraps.)
If Node A can (for instance) contact Node B but not C, and if B and C are connected, then A will halt processing until the timers fire and then drop out. If B and C have enough votes, they'll continue processing after the cluster transition.
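The partial-connectivity case can be explored with a toy model. The Python sketch below is an illustration under stated assumptions, not the connection manager's actual algorithm: the names `links`, `fully_connected` and `candidates` are hypothetical, each node is assumed to hold 1 vote, quorum is assumed to be EXPECTED_VOTES/2 + 1 with integer division, and a group counts as a candidate cluster only if every member has a direct path to every other member, as required earlier in this thread.

```python
from itertools import combinations

votes = {"A": 1, "B": 1, "C": 1}            # assumed: one vote per node
links = {frozenset("AB"), frozenset("BC")}  # the A-C path is down
quorum = sum(votes.values()) // 2 + 1       # 2 under the assumed rule


def fully_connected(group):
    """True if every pair of nodes in the group has a working direct path."""
    return all(frozenset(pair) in links for pair in combinations(group, 2))


def candidates():
    """Fully connected groups of two or more nodes that hold quorum."""
    found = []
    for size in range(len(votes), 1, -1):
        for group in combinations(sorted(votes), size):
            held = sum(votes[n] for n in group)
            if fully_connected(group) and held >= quorum:
                found.append("+".join(group))
    return found


print(candidates())
```

The full three-node cluster is ruled out because A and C cannot see each other, leaving A+B and B+C as quorate candidates. The sketch deliberately stops there; it does not model the timers or how the surviving group is chosen.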
You can choose to hold cluster and application processing on one node through local configuration steps and through the use of parameters including LOCKDIRWT and the more recent LOCKRMWT. This is separate from the quorum scheme, and from which node survives -- that's determined by where the votes are, and where the link failures are.
Another option for uptime with Rdb is RTR. RTR can operate and distribute transactions outside of a cluster configuration.
Another approach (you don't indicate which sorts of systems are involved here) is throwing faster hardware at the problem. Having been retired, new-old-stock AlphaServer boxes and used AlphaServer stock are available at comparatively low cost.
Again, the quorum scheme is simple and very elegant. It's votes and expected_votes (which beget the calculated quorum) and the connections. That's basically it.
Stephen Hoffman
HoffmanLabs.com