Community Home > Servers and Operating Systems > Operating Systems > Operating System - OpenVMS > Re: Quorum mismatch between cluster members
06-17-2010 04:26 AM
Re: Quorum mismatch between cluster members
I also concur with Hoff and JohnG, particularly John's comments about AGEN$INCLUDE.
The safest way to ensure consistency is to include a single common file by reference. Then all changes are a matter of editing that common file and re-running the build (e.g., AUTOGEN).
This is why programmers use include files and system management is no different. Having two independent copies of something is a recipe for "evolution" (unintended differential changes) to occur.
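As a sketch of the AGEN$INCLUDE approach being discussed (the shared file name CLUSTER_COMMON.DAT and the parameter values are illustrative, not from the thread):

```
! In each node's SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT,
! pull in one shared file by reference:
AGEN$INCLUDE_PARAMS SYS$COMMON:[SYSEXE]CLUSTER_COMMON.DAT

! In the single shared CLUSTER_COMMON.DAT:
EXPECTED_VOTES = 4
VOTES = 1

! Then rebuild parameters on each node, e.g.:
! $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS
```

With this layout a cluster-wide parameter change is one edit plus one AUTOGEN run per node, instead of N hand-edited copies that can drift apart.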
- Bob Gezelter, http://www.rlgsc.com
06-17-2010 04:51 AM
Re: Quorum mismatch between cluster members
>>>
Display of parameters from all of the nodes is identical and looks correct.
<<<
SYSMAN> do mc sysgen show votes
%SYSMAN-I-OUTPUT, command execution on node GHA3
VOTES 1 1 0 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA2
VOTES 1 1 0 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA4
VOTES 1 1 0 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA1
VOTES 1 1 0 127 Votes
SYSMAN> do mc sysgen show expected
%SYSMAN-I-OUTPUT, command execution on node GHA3
EXPECTED_VOTES 4 1 1 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA2
EXPECTED_VOTES 4 1 1 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA4
EXPECTED_VOTES 4 1 1 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA1
EXPECTED_VOTES 4 1 1 127 Votes
Richard, thanks for pointing out the CL_ fields on the SHOW CLUSTER display.
CL_EXP = 1
CL_QUORUM = 3
CL_VOTES = 4
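For reference, OpenVMS derives a node's quorum from EXPECTED_VOTES as (worth double-checking against the cluster documentation for your version):

```
QUORUM = (EXPECTED_VOTES + 2) / 2    ! integer division, fraction truncated
       = (4 + 2) / 2
       = 3
```

That matches the CL_QUORUM = 3 and CL_VOTES = 4 shown here; it is the CL_EXP = 1 display that looks out of step with the SYSGEN values.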
06-17-2010 04:55 AM
Re: Quorum mismatch between cluster members
06-17-2010 05:51 AM
Re: Quorum mismatch between cluster members
"It is possible for this field to display a number smaller than the EXPECTED_VOTES parameter setting if the REMOVE_NODE option was used to shut down a cluster member or the SET CLUSTER/EXPECTED_VOTES DCL command was used since this node was last rebooted."
Well, I never knew that.
06-17-2010 08:09 AM
Re: Quorum mismatch between cluster members
-GHA1 had a controlled re-boot in May.
-SHOW CLUSTER may show confusing values under certain circumstances, as Richard noted. I'll be sure to use the CL_ fields in the future.
06-17-2010 09:30 AM
Re: Quorum mismatch between cluster members
You've found a cluster configuration error.
That's bad.
Go fix the values for your next reboot.
Your configuration can start processing with two votes and thus two nodes, and it should only start processing with three present.
Two two-node subclusters with shared storage would be Very Bad.
That sole four-vote node is likely going to prevent partitioning, but if somebody manages to get all four nodes set to three votes expected or if somebody has been using SET CLUSTER /EXPECTED to lower the required values, then all bets are off.
06-17-2010 10:18 AM
Re: Quorum mismatch between cluster members
Which 4 vote node? I have not the slightest idea of what you are trying to tell me to fix.
06-17-2010 11:04 AM
Re: Quorum mismatch between cluster members
GHA1 | VMS V7.3-2 | 1 | 4 | 3 | MEMBER
Which lists four votes as expected.
If two of your four nodes should boot without connecting to the other hosts (and thus never get their calculated quorum values corrected), and if each node has EXPECTED_VOTES set to 3, then you can have two partitions within your cluster, and your shared data gets corrupted.
Right now, that GHA1 box (which looks to have a correct setting for expected_votes) is the reason your cluster can't get into this partitioning case. That one setting prevents partitioning.
Quorum is the mechanism by which massive corruptions to your disks and other shared data are avoided. Failure to have a correct configuration can lead to data corruptions; your disks can end up completely corrupted long before you get to the login prompt.
I've seen a few of these partitioned configurations arise over the years, such as when somebody rebooted off the wrong system root, such as after a console battery failed. It's a mess.
Again, you do need to grok this stuff, not just look at the CL_ running settings. If the system parameters aren't set right, the corrections might not happen in cases where connectivity isn't fully available (for any of various reasons), and your data goes bye-bye.
Do you need to correct this Right Now? No. You have connectivity, and so long as the SET CLUSTER /EXPECTED isn't (mis)used, your quorum has been corrected to 3. Just get this fixed for your next reboot.
And consider getting that quorum disk going here, too, as that means you can survive the loss of more than one box in the cluster without a cluster quorum hang.
Please read what I linked to earlier.
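If a node's EXPECTED_VOTES does turn out to need correcting, the usual way to stage the fix for the next reboot is roughly this (a sketch; the AUTOGEN phases shown are one common invocation, adjust to local practice):

```
! Edit SYS$SYSTEM:MODPARAMS.DAT on the affected node to state
! the intended cluster-wide total:
EXPECTED_VOTES = 4

! Regenerate and write the on-disk parameter set; the new value
! takes effect at the next reboot:
! $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS
```

Setting it via MODPARAMS.DAT (rather than poking SYSGEN directly) means the value survives future AUTOGEN runs.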
06-18-2010 12:16 AM
Solution
The output of "SYSMAN> do mc sysgen show" that you provided in your post from Jun 17, 2010 12:51:30 GMT looks convincing to me.
I disagree with Hoff; I don't see any evidence that the SYSGEN cluster parameters are misconfigured. The SHOW CLUSTER display does seem to be misleading, especially since you used SET CLUSTER/EXPECTED=4. This reporting may have been fixed since 7.3-2: on 8.3 I can't reproduce the node-specific displayed values changing with SET CLUSTER/EXPECTED (changing either down or back up) or SET CLUSTER/QUORUM. See more about this below.
Since neither votes nor expected_votes are dynamic parameters, we can assume that all nodes were booted with the correct value of expected votes. This is because, by default, sysgen shows the active parameters. All four of your nodes reported expected_votes 4. As long as you didn't change the CURRENT parameters, you should be fine when you reboot.
The most likely cause of this condition (in my opinion) is that when GHA1 had its controlled reboot in May, the REMOVE_NODE shutdown option was used, and the remaining nodes adjusted quorum down. When GHA1 rebooted, it used EXPECTED_VOTES = 4 from the current parameter file, and its SHOW CLUSTER display is consistent with the values in effect at the time it joined the cluster. The CL_EXP value will always ratchet up to be at least as high as the current total number of votes in the cluster; the only way it will ever go down is via a recompute-quorum event, or when a cluster is formed. What EXPECTED_VOTES protects against is the "cluster formed" situation: you don't want a network error that keeps the nodes from seeing each other to allow multiple subsets to successfully form a cluster independently. If they do, and they have direct access to shared storage (shared SCSI, FC), that shared storage will get corrupted.
You are running 7.3-2. I just tried SET CLUSTER/EXPECTED on a test cluster running 8.3 (Alpha); there, EXPECTED_VOTES was 3 and the current votes were 2 (one for the single node, and one for the quorum disk). On 8.3 the SHOW CLUSTER display reflects the changed values in the CLUSTER section, but not in the individual node sections. Perhaps there was a bug in 7.3-2 that has been fixed in 8.3? See the attachment for the results of my testing. Note that the cluster's transition time is updated when the quorum changes, but the nodes' are not.
Before you reboot the node this weekend, verify that EXPECTED_VOTES is set to 4 in the CURRENT parameter set on the node you will reboot. I have no reason to believe it has been changed, but you don't want a surprise.
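That pre-reboot check can be done with SYSGEN's USE CURRENT, which reads the on-disk CURRENT parameter set rather than the active in-memory values (a sketch):

```
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW EXPECTED_VOTES
SYSGEN> SHOW VOTES
SYSGEN> EXIT
```

If the values shown here differ from the active ones, the node will come up with the on-disk values at the next boot, which is exactly the surprise being guarded against.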
Good luck,
Jon
06-18-2010 12:27 AM