Community Home > Servers and Operating Systems > Operating Systems > Operating System - OpenVMS > Re: Quorum mismatch between cluster members
06-17-2010 04:26 AM
Re: Quorum mismatch between cluster members
I also concur with Hoff and JohnG, particularly John's comments about AGEN$INCLUDE.
The safest way to ensure consistency is to include a single common file by reference. Then all changes are a matter of editing that common file and re-running the build (e.g., AUTOGEN).
This is why programmers use include files and system management is no different. Having two independent copies of something is a recipe for "evolution" (unintended differential changes) to occur.
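As a sketch of the AGEN$INCLUDE approach being discussed (the shared file name CLUSTER_COMMON.DAT and the parameter values are illustrative, not from the thread):

```
! In each node's SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT,
! pull in one shared file by reference:
AGEN$INCLUDE_PARAMS SYS$COMMON:[SYSEXE]CLUSTER_COMMON.DAT

! In the single shared CLUSTER_COMMON.DAT:
EXPECTED_VOTES = 4
VOTES = 1

! Then rebuild parameters on each node, e.g.:
! $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS
```

With this layout a cluster-wide parameter change is one edit plus one AUTOGEN run per node, instead of N hand-edited copies that can drift apart.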
- Bob Gezelter, http://www.rlgsc.com
06-17-2010 04:51 AM
Re: Quorum mismatch between cluster members
>>>
Display of parameters from all of the nodes is identical and looks correct.
<<<
SYSMAN> do mc sysgen show votes
%SYSMAN-I-OUTPUT, command execution on node GHA3
VOTES 1 1 0 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA2
VOTES 1 1 0 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA4
VOTES 1 1 0 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA1
VOTES 1 1 0 127 Votes
SYSMAN> do mc sysgen show expected
%SYSMAN-I-OUTPUT, command execution on node GHA3
EXPECTED_VOTES 4 1 1 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA2
EXPECTED_VOTES 4 1 1 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA4
EXPECTED_VOTES 4 1 1 127 Votes
%SYSMAN-I-OUTPUT, command execution on node GHA1
EXPECTED_VOTES 4 1 1 127 Votes
Richard, thanks for pointing out the CL_ fields on the SHOW CLUSTER display.
CL_EXP = 1
CL_QUORUM = 3
CL_VOTES = 4
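For reference, OpenVMS derives a node's quorum from EXPECTED_VOTES as (worth double-checking against the cluster documentation for your version):

```
QUORUM = (EXPECTED_VOTES + 2) / 2    ! integer division, fraction truncated
       = (4 + 2) / 2
       = 3
```

That matches the CL_QUORUM = 3 and CL_VOTES = 4 shown here; it is the CL_EXP = 1 display that looks out of step with the SYSGEN values.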
06-17-2010 04:55 AM
Re: Quorum mismatch between cluster members
06-17-2010 05:51 AM
Re: Quorum mismatch between cluster members
"It is possible for this field to display a number smaller than the EXPECTED_VOTES parameter setting if the REMOVE_NODE option was used to shut down a cluster member or the SET CLUSTER/EXPECTED_VOTES DCL command was used since this node was last rebooted."
Well, I never knew that.
06-17-2010 08:09 AM
Re: Quorum mismatch between cluster members
-GHA1 had a controlled re-boot in May.
-SHOW CLUSTER may show confusing values under certain circumstances, as Richard noted. I'll be sure to use the CL_ fields in the future.
06-17-2010 09:30 AM
Re: Quorum mismatch between cluster members
You've found a cluster configuration error.
That's bad.
Go fix the values for your next reboot.
Your configuration can start processing with two votes and thus two nodes, and it should only start processing with three present.
Two two-node subclusters with shared storage would be Very Bad.
That sole four-vote node is likely going to prevent partitioning, but if somebody manages to get all four nodes set to three votes expected or if somebody has been using SET CLUSTER /EXPECTED to lower the required values, then all bets are off.
06-17-2010 10:18 AM
Re: Quorum mismatch between cluster members
Which 4 vote node? I have not the slightest idea of what you are trying to tell me to fix.
06-17-2010 11:04 AM
Re: Quorum mismatch between cluster members
GHA1 | VMS V7.3-2 | 1 | 4 | 3 | MEMBER
Which lists four votes as expected.
If two of your four nodes should boot without connecting to the other hosts (and thus never get their calculated quorum values corrected), and if each node has EXPECTED_VOTES set to 3, then you can have two partitions within your cluster, and your shared data gets corrupted.
Right now, that GHA1 box (which looks to have a correct setting for expected_votes) is the reason your cluster can't get into this partitioning case. That one setting prevents partitioning.
Quorum is the mechanism by which massive corruptions to your disks and other shared data are avoided. Failure to have a correct configuration can lead to data corruptions; your disks can end up completely corrupted long before you get to the login prompt.
I've seen a few of these partitioned configurations arise over the years, such as when somebody rebooted off the wrong system root, such as after a console battery failed. It's a mess.
Again, you do need to grok this stuff, not just look at the CL_ running settings. If the system parameters aren't set right, the corrections might not happen in cases where connectivity isn't fully available (for any of various reasons), and your data goes bye-bye.
Do you need to correct this Right Now? No. You have connectivity, and so long as the SET CLUSTER /EXPECTED isn't (mis)used, your quorum has been corrected to 3. Just get this fixed for your next reboot.
And consider getting that quorum disk going here, too, as that means you can survive the loss of more than one box in the cluster without a cluster quorum hang.
Please read what I linked to earlier.
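If a node's EXPECTED_VOTES does turn out to need correcting, the usual way to stage the fix for the next reboot is roughly this (a sketch; the AUTOGEN phases shown are one common invocation, adjust to local practice):

```
! Edit SYS$SYSTEM:MODPARAMS.DAT on the affected node to state
! the intended cluster-wide total:
EXPECTED_VOTES = 4

! Regenerate and write the on-disk parameter set; the new value
! takes effect at the next reboot:
! $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS
```

Setting it via MODPARAMS.DAT (rather than poking SYSGEN directly) means the value survives future AUTOGEN runs.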
06-18-2010 12:16 AM
Solution
The output of "SYSMAN> do mc sysgen show" that you provided in your post from Jun 17, 2010 12:51:30 GMT looks convincing to me.
I disagree with Hoff; I don't see any evidence that the SYSGEN cluster parameters are misconfigured. The SHOW CLUSTER display does seem to be misleading, especially since you used SET CLUSTER/EXPECTED=4. This reporting may have been fixed since 7.3-2: on 8.3 I can't reproduce the node-specific displayed values changing with SET CLUSTER/EXPECTED (changing either down or back up) or SET CLUSTER/QUORUM. See more about this below.
Since neither votes nor expected_votes are dynamic parameters, we can assume that all nodes were booted with the correct value of expected votes. This is because, by default, sysgen shows the active parameters. All four of your nodes reported expected_votes 4. As long as you didn't change the CURRENT parameters, you should be fine when you reboot.
The most likely cause of this condition (in my opinion) is that when GHA1 had its controlled reboot in May, the REMOVE_NODE shutdown option was used, and the remaining nodes adjusted quorum down. When GHA1 rebooted, it used EXPECTED_VOTES = 4 from the current parameter file, and its SHOW CLUSTER display is consistent with the values in effect at the time it joined the cluster. The CL_EXP value will always ratchet up to be at least as high as the current total number of votes in the cluster; the only way it will ever go down is via a recompute-quorum event, or when a cluster is formed. What EXPECTED_VOTES protects against is the "cluster formed" situation: you don't want a network error that keeps the nodes from seeing each other to allow multiple subsets to successfully form a cluster independently. If they do, and they have direct access to shared storage (shared SCSI, FC), that shared storage will get corrupted.
You are running 7.3-2. I just tried SET CLUSTER/EXPECTED on a test cluster running 8.3 (Alpha); there, EXPECTED_VOTES was 3 and the current votes were 2 (one for the single node, and one for the quorum disk). On 8.3 the SHOW CLUSTER display reflects the changed values in the CLUSTER section, but not in the individual node sections. Perhaps there was a bug in 7.3-2 that has been fixed in 8.3? See the attachment for the results of my testing. Note that the cluster's transition time is updated when the quorum changes, but the nodes' are not.
Before you reboot the node this weekend, verify that EXPECTED_VOTES is set to 4 in the CURRENT parameter set on the node you will reboot. I have no reason to believe it has been changed, but you don't want a surprise.
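That pre-reboot check can be done with SYSGEN's USE CURRENT, which reads the on-disk CURRENT parameter set rather than the active in-memory values (a sketch):

```
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW EXPECTED_VOTES
SYSGEN> SHOW VOTES
SYSGEN> EXIT
```

If the values shown here differ from the active ones, the node will come up with the on-disk values at the next boot, which is exactly the surprise being guarded against.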
Good luck,
Jon
06-18-2010 12:27 AM