Quorum mismatch between cluster members
06-16-2010 07:48 AM
I have a situation I can't explain and welcome your expert advice. I have a 4 node cluster running 7.3-2, all with individual system disks. There is no quorum disk. One of the members is reporting quorum as 3 and expected as 4. The other three are showing quorum as 2 and expected as 3.
View of Cluster from system ID 41162  node: GHA1  16-JUN-2010 10:40:26
+-----------------------------------------------------------+
|        SYSTEMS        |              MEMBERS              |
|-----------------------+-----------------------------------|
|  NODE  |   SOFTWARE   | VOTES | EXPECT | QUORUM | STATUS  |
|--------+--------------+-------+--------+--------+---------|
| GHA1   | VMS V7.3-2   |     1 |      4 |      3 | MEMBER  |
| GHA2   | VMS V7.3-2   |     1 |      3 |      2 | MEMBER  |
| GHA3   | VMS V7.3-2   |     1 |      3 |      2 | MEMBER  |
| GHA4   | VMS V7.3-2   |     1 |      3 |      2 | MEMBER  |
+-----------------------------------------------------------+
View of Cluster from system ID 41164  node: GHA2  16-JUN-2010 10:41:04
+-----------------------------------------------------------+
|        SYSTEMS        |              MEMBERS              |
|-----------------------+-----------------------------------|
|  NODE  |   SOFTWARE   | VOTES | EXPECT | QUORUM | STATUS  |
|--------+--------------+-------+--------+--------+---------|
| GHA2   | VMS V7.3-2   |     1 |      3 |      2 | MEMBER  |
| GHA3   | VMS V7.3-2   |     1 |      3 |      2 | MEMBER  |
| GHA1   | VMS V7.3-2   |     1 |      4 |      3 | MEMBER  |
| GHA4   | VMS V7.3-2   |     1 |      3 |      2 | MEMBER  |
+-----------------------------------------------------------+
Display of parameters from all of the nodes is identical and looks correct.
GHA1 $ mc sysgen show expect
Parameter Name Current Default Min. Max. Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
EXPECTED_VOTES 4 1 1 127 Votes
GHA1 $ mc sysgen show votes
Parameter Name Current Default Min. Max. Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
VOTES 1 1 0 127 Votes
I used the set cluster command to put them all back in sync, but even though the command appears to have been accepted, the show cluster display doesn't change.
GHA1 $ set cluster /expected
%SET-I-EXPTD_VOTES, new value of expected votes is 4, yields quorum of 3
GHA2 $ set cluster /expected=4
%SET-I-EXPTD_VOTES, new value of expected votes is 4, yields quorum of 3
I am taking one of the nodes down this weekend and don't want any surprises.
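For reference, the QUORUM column in the displays above follows from the EXPECT column: OpenVMS derives quorum as (EXPECTED_VOTES + 2) / 2, truncated to an integer. A minimal Python sketch of that arithmetic follows; the function name is illustrative only and is not an OpenVMS interface.

```python
def quorum(expected_votes: int) -> int:
    """Quorum as derived from expected votes: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


# EXPECT values copied from the SHOW CLUSTER displays in the post.
for node, expect in [("GHA1", 4), ("GHA2", 3), ("GHA3", 3), ("GHA4", 3)]:
    print(f"{node}: expected={expect} quorum={quorum(expect)}")
```

This reproduces both the 4/3 pair shown for GHA1 and the 3/2 pair shown for the other nodes, and it agrees with the SET CLUSTER message "expected votes is 4, yields quorum of 3".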
06-16-2010 08:26 AM
In sync with what? If the members have votes
of 4, 3, 3, 3, then how can expected votes be
only 4 (where one single guy (4) or any two
others (3+3) would have the expected votes)?
The whole idea of these votes and a quorum is
to prevent multiple subsets happily forming
multiple clusters. I don't see how a value
of 4 expected votes can do that here.
> [...] don't want any surprises.
What would be unsurprising? How do you want
this stuff to work when not everyone is up?
06-16-2010 08:42 AM
The ones shown against the individual nodes should just reflect their own SYSGEN parameters.
06-16-2010 08:43 AM
>>>
GHA1 $ mc sysgen show expect
Parameter Name Current Default Min. Max. Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
EXPECTED_VOTES 4 1 1 127 Votes
<<<
Well, this at least agrees with
>>>
| GHA1 | VMS V7.3-2 | 1 | 4 | 3 | MEMBER |
<<<
And if you do
$ mc sysgen show expect
at the other 3 nodes, I expect to see
| GHAn | VMS V7.3-2 | 1 | 3 | 2 | MEMBER |
i.e.,
the other 3 nodes have EXPECTED_VOTES = 3
... and that implies the potential for a partitioned cluster!!!!
For your own good: please check, and correct ASAP!
Proost.
Have one on me.
jpe
06-16-2010 08:44 AM
Purely Personal Opinion
06-16-2010 09:22 AM
Ah. I was misled by comparing with my own
(strange) cluster, which has one voting node,
and a bunch with no votes. Thanks for the
info.
Never mind.
06-16-2010 09:30 AM
Here? Just fix the votes and expected_votes settings in MODPARAMS.DAT (if you're not maintaining AUTOGEN, that is itself a problem), and move on. Do grok the quorum scheme, but don't try to fool the quorum scheme. (Upon managing to successfully fool the quorum scheme, Bad Things tend to quickly happen to the disk data.)
If you have a shared interconnect here, you'll likely want to get a quorum disk configured here. This particularly if you have hardware RAID on that shared interconnect. Presuming shared hardware RAID and one-host survival, that'd typically have one vote for each host, three votes for the quorum disk, and expected_votes of (duh) seven. Presuming shared hardware RAID and three-host survival, one vote per host and one vote for the quorum disk, and expected_votes of (duh) five.
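The two vote schemes above can be checked with a little arithmetic. The sketch below assumes the four-host cluster from the original post and the usual quorum rule of (expected votes + 2) / 2, truncated; the function names are hypothetical and only model the vote counting, not real cluster behavior.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from expected votes: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def min_hosts_for_quorum(hosts: int, host_votes: int, qdisk_votes: int,
                         qdisk_available: bool = True):
    """Smallest number of running hosts whose votes (plus the quorum disk's
    votes, if it is available) reach quorum. Returns None if unreachable."""
    expected = hosts * host_votes + qdisk_votes
    needed = quorum(expected)
    extra = qdisk_votes if qdisk_available else 0
    for running in range(1, hosts + 1):
        if running * host_votes + extra >= needed:
            return running
    return None


# Assumed: four hosts, as in the original post.
print(min_hosts_for_quorum(4, 1, 3))                         # 3-vote quorum disk, expected 7
print(min_hosts_for_quorum(4, 1, 1))                         # 1-vote quorum disk, expected 5
print(min_hosts_for_quorum(4, 1, 1, qdisk_available=False))  # same, quorum disk lost
```

Under this model the seven-vote scheme keeps quorum with a single host plus the quorum disk, while the five-vote scheme needs two hosts plus the quorum disk, or three hosts if the quorum disk is unavailable.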
The VMS system management User Interface (UI) here has some serious shortcomings; it leaves folks rather confused about how all this works and (though this area has most definitely seen improvements in this regard) the diagnostics and automatic guidance have been lacking.
http://labs.hoffmanlabs.com/node/153
http://labs.hoffmanlabs.com/node/569
http://labs.hoffmanlabs.com/node/105
06-16-2010 02:14 PM
Isolate common parameters into a CLUSTER_MODPARAMS.DAT file on your cluster common disk and include it in each node-specific MODPARAMS.DAT. This should eliminate inconsistencies like the one reported here.
(of course cluster software SHOULD detect and warn about such inconsistencies, but HP has resisted all efforts to fix the issues surrounding cluster misconfigurations, instead expecting each system manager to personally experience all the common pitfalls and reinvent the wheel fixing them)
06-16-2010 05:08 PM
I suspect that your problem came about when adding node GHA1 to the cluster. That machine has appropriate settings for a 4 node cluster and the other 3 have appropriate settings for a 3 node cluster.
06-16-2010 07:34 PM
>> I am taking one of the nodes down this weekend and don't want any
>> surprises.
You should consider yourself extremely lucky if you are not in for surprises
well before that!
Each node has a Votes value of 1. This is OK, but the expected votes setting is
incorrect. Three of the nodes have expected votes of 3 while one of them has
an expected votes value of 4. This can lead to a partitioned cluster.
As already indicated by Richard, you need to include the values of CL_EXP,
CL_QUORUM and CL_VOTES in the "$SHOW CLUSTER" output to get a clear
picture of the cluster setup.
You need to take another look at the Votes/ExpectedVotes settings of the various
nodes in the cluster. The links provided by Hoff should help you in this regard.
Regards,
Murali
06-17-2010 04:26 AM
I also concur with Hoff and JohnG, particularly John's comments about AGEN$INCLUDE.
The safest way to ensure consistency is to include a single file by reference. Then, all changes are a question of changing the common file and running the build (e.g., AUTOGEN).
This is why programmers use include files and system management is no different. Having two independent copies of something is a recipe for "evolution" (unintended differential changes) to occur.
- Bob Gezelter, http://www.rlgsc.com
06-17-2010 04:51 AM
>>>
Display of parameters from all of the nodes is identical and looks correct.
<<<
SYSMAN> do mc sysgen show votes
%SYSMAN-I-OUTPUT, command execution on node GHA3
VOTES 1 1 0 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA2
VOTES 1 1 0 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA4
VOTES 1 1 0 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA1
VOTES 1 1 0 127 Votes
SYSMAN> do mc sysgen show expected
%SYSMAN-I-OUTPUT, command execution on node GHA3
EXPECTED_VOTES 4 1 1 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA2
EXPECTED_VOTES 4 1 1 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA4
EXPECTED_VOTES 4 1 1 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA1
EXPECTED_VOTES 4 1 1 127 Votes
Richard, thanks for pointing out the CL_ fields on the show cluster display.
CL_EXP = 1
CL_QUORUM = 3
CL_VOTES = 4
06-17-2010 04:55 AM
06-17-2010 05:51 AM
"It is possible for this field to display a number smaller than the EXPECTED_VOTES parameter setting if the REMOVE_NODE option was used to shut down a cluster member or the SET CLUSTER/EXPECTED_VOTES DCL command was used since this node was last rebooted."
Well, I never knew that.
06-17-2010 08:09 AM
-GHA1 had a controlled re-boot in May.
-SHOW CLUSTER may show confusing values under certain circumstances, as noted by Richard. I'll be sure to use the CL_ fields in the future.
06-17-2010 09:30 AM
You've found a cluster configuration error.
That's bad.
Go fix the values for your next reboot.
Your configuration can start processing with two votes and thus two nodes, and it should only start processing with three present.
Two two-node subclusters with shared storage would be Very Bad.
That sole four-vote node is likely going to prevent partitioning, but if somebody manages to get all four nodes set to three votes expected or if somebody has been using SET CLUSTER /EXPECTED to lower the required values, then all bets are off.
06-17-2010 10:18 AM
Which 4 vote node? I have not the slightest idea of what you are trying to tell me to fix.
06-17-2010 11:04 AM
GHA1 | VMS V7.3-2 | 1 | 4 | 3 | MEMBER
Which lists four votes as expected.
If you should happen to get two of your four voting nodes booted, and those two do not get connected to the other hosts (and thus do not get their calculated quorum values corrected), and each of your nodes has expected_votes set to 3, then you can have two partitions within your cluster, and your shared data is corrupt.
Right now, that GHA1 box (which looks to have a correct setting for expected_votes) is the reason your cluster can't get into this partitioning case. That one setting prevents partitioning.
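The partitioning risk described here can be sketched as a small check: given each node's votes and a common expected_votes setting, can the nodes split into two groups that each reach quorum without seeing the other? This is an illustrative Python model only (the function names are hypothetical), assuming the usual rule that quorum is (expected votes + 2) / 2, truncated.

```python
from itertools import combinations


def quorum(expected_votes: int) -> int:
    """Quorum derived from expected votes: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def can_partition(votes: dict, expected_votes: int) -> bool:
    """True if some group of nodes and the remaining nodes could each reach
    quorum independently, i.e. two clusters could form side by side."""
    needed = quorum(expected_votes)
    nodes = list(votes)
    for size in range(1, len(nodes)):
        for group in combinations(nodes, size):
            rest = [n for n in nodes if n not in group]
            if (sum(votes[n] for n in group) >= needed
                    and sum(votes[n] for n in rest) >= needed):
                return True
    return False


votes = {"GHA1": 1, "GHA2": 1, "GHA3": 1, "GHA4": 1}
print(can_partition(votes, 3))  # every node booted with expected_votes = 3
print(can_partition(votes, 4))  # every node booted with expected_votes = 4
```

With expected_votes of 3 the quorum is 2, so two isolated pairs of nodes can each form a cluster; with expected_votes of 4 the quorum is 3, and two disjoint groups drawn from four single-vote nodes cannot both reach it.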
Quorum is the mechanism by which massive corruptions to your disks and other shared data are avoided. Failure to have a correct configuration can lead to data corruptions; your disks can end up completely corrupted long before you get to the login prompt.
I've seen a few of these partitioned configurations arise over the years, such as when somebody rebooted off the wrong system root, such as after a console battery failed. It's a mess.
Again, you do need to grok this stuff, not just look at the CL_ running settings. If the system parameters aren't set right, the corrections might not happen for cases where connectivity isn't fully available (for any of various reasons), and your data goes bye-bye.
Do you need to correct this Right Now? No. You have connectivity, and so long as the SET CLUSTER /EXPECTED isn't (mis)used, your quorum has been corrected to 3. Just get this fixed for your next reboot.
And consider getting that quorum disk going here, too, as that means you can survive the loss of more than one box in the cluster without a cluster quorum hang.
Please read what I linked to earlier.
06-18-2010 12:16 AM
The output of "SYSMAN> do mc sysgen show" that you provided in your post from Jun 17, 2010 12:51:30 GMT looks convincing to me.
I disagree with Hoff; I don't see any evidence that the sysgen cluster parameters are misconfigured. The show cluster display does seem to be misleading, especially since you used set cluster/expected=4. This reporting may have been fixed since 7.3-2: on 8.3, I can't reproduce the node-specific displayed values changing with set cluster/expected (either changing down or back up) or with set cluster/quorum. See more about this below.
Since neither votes nor expected_votes are dynamic parameters, we can assume that all nodes were booted with the correct value of expected votes. This is because, by default, sysgen shows the active parameters. All four of your nodes reported expected_votes 4. As long as you didn't change the CURRENT parameters, you should be fine when you reboot.
The most likely cause of this condition (in my opinion) is that when GHA1 had a controlled re-boot in May, the REMOVE_NODE shutdown option was used, and the remaining nodes adjusted quorum down. When GHA1 rebooted, it used EXPECTED_VOTES = 4 from the current parameter file, and its show cluster display is consistent with the values in effect at the time it joined the cluster. The CL_EXP value will always ratchet up to be at least as high as the current total number of votes in the cluster; the only way it will ever go down is via a recompute quorum event, or when a cluster is formed. What the expected votes setting protects against is the "cluster formed" situation: you don't want a network error that keeps the nodes from seeing each other to allow multiple subsets to successfully form a cluster independently. If they do, and they have direct access to shared storage (shared SCSI, FC), that shared storage will get corrupted.
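The sequence proposed in this paragraph can be walked through numerically. The Python sketch below is a simplified model, not OpenVMS code: it assumes that a REMOVE_NODE shutdown lowers the cluster's expected votes by the departing node's votes, and that a join ratchets the cluster value up to at least the total votes present, as the paragraph describes. The function names are hypothetical.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from expected votes: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def after_remove_node(cluster_expected: int, departing_votes: int):
    """Assumed model of a REMOVE_NODE shutdown: the remaining members drop the
    departing node's votes from expected votes and recompute quorum."""
    new_expected = cluster_expected - departing_votes
    return new_expected, quorum(new_expected)


def after_join(cluster_expected: int, total_votes_present: int):
    """Assumed model of a node joining: the cluster-wide expected votes ratchet
    up to at least the total votes now present."""
    new_expected = max(cluster_expected, total_votes_present)
    return new_expected, quorum(new_expected)


# GHA1 (1 vote) leaves a 4-vote cluster with REMOVE_NODE, then rejoins.
print(after_remove_node(4, 1))  # (3, 2)
print(after_join(3, 4))         # (4, 3)
```

The intermediate (3, 2) matches the stale EXPECT/QUORUM pair still displayed against GHA2, GHA3 and GHA4, and the final quorum of 3 matches the CL_QUORUM value reported earlier in the thread.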
You are running 7.3-2. I just tried set cluster/expected on a test cluster running 8.3 (Alpha), where the expected votes was 3 and the current votes were 2 (one for the single node, and one for the quorum disk). On 8.3 the show cluster display will reflect the changed values in the CLUSTER section, but not in the individual node sections. Perhaps there was a bug in 7.3-2 that has been fixed in 8.3? See attachment for the results of my testing. Note that the cluster's transition time is updated when the quorum changes, but the nodes' is not.
Before you reboot the node this weekend, verify that the EXPECTED_VOTES is set to 4 in the CURRENT parameter on the node you will reboot. I have no reason to believe it has been changed, but you don't want a surprise.
Good luck,
Jon
06-18-2010 12:27 AM
06-18-2010 12:31 AM
Jon
06-18-2010 08:46 AM
I did in fact use autogen.com when I last shut down the GHA1 node. It seems that remove_node is the default value autogen uses in a cluster.