Operating System - OpenVMS

Edwin Gersbach_2
Valued Contributor

Strange values in the SHOW CLUSTER display

Hi,

We run a cluster with two boot servers, a quorum disk and 50 workstations as satellites.
Each boot server has 2 votes, the quorum disk has one and the satellites have none.

A few days ago we upgraded the cluster from 7.3-2 to 8.3 as follows:
1) Create a new SAN disk with an image backup of the (single) system disk (sketched below)
2) boot one server from the 8.3 CD, perform the upgrade on the new disk and boot from it
3) disable MOP on the 7.3-2 server
4) reboot most satellites
5) reboot second server
6) reboot remaining satellites
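
(For step 1 the image copy was nothing special, roughly

BACKUP/IMAGE old_sysdisk: new_sandisk:

with the actual device names of the current system disk and the new SAN disk in place of the placeholders.)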

Everything runs fine - no problems have been encountered so far.

Now a colleague got panicky because when we do a SHOW CLUSTER, some nodes show a value of 3 and others a value of 2 for the Q column in the MEMBERS section. The latter would indicate the possibility of a cluster split! However, doing a SHOW CLUSTER on all nodes shows a value of 3 for CL_Q on each of them while CL_EXP and CL_V are both 5.

Some analysis revealed that all satellites showing a Quorum of 2 had been rebooted before the second boot server while all satellites showing a Quorum of 3 got rebooted after the second boot server.

The documentation (7.3) says
-------------------
Derived from EXPECTED_VOTES and calculated by the connection manager. It represents an initial value for the minimum number of votes that must be present for this node to function. The dynamic QUORUM value is the CL_QUORUM field, which is described in the CLUSTER class category
-------------------

The second sentence would indicate a value which was once valid but is not necessarily valid now -> ergo useless. The third sentence would make sense, but is not what we actually get.

What is wrong here?

Edwin
9 REPLIES
Hoff
Honored Contributor

Re: Strange values in the SHOW CLUSTER display

Your colleague is correct to be concerned here.

The EXPECTED_VOTES system parameter appears to be set wrong on several nodes. In your current case, it should be set to five on all nodes.

I'd likely configure this cluster with one vote for each of the two servers, and with one vote for the quorum disk; assuming the quorum disk is connected on a multi-host bus.

The CL values for VOTES and for quorum are the calculated values; the running values. In your configuration, CL_QUORUM should be 3 and CL_VOTES should be 5 everywhere.

The risk here is not the running configuration, but in the interval before the connectivity is established. Once connections are established, values for parameters such as EXPECTED_VOTES will be floated upwards to the correct value. If connections cannot be established -- such as cases with name collisions; two nodes erroneously booting from the same system root is a classic example -- you can generate a partitioned cluster, and can stomp on your environment.

MODPARAMS has an include-file syntax, and it can be convenient to stuff things like the quorum values into a common file. Here's a somewhat complex example of customizing AUTOGEN, including local dynamic parameter calculation and use of AGEN$INCLUDE_PARAMS:

http://h71000.www7.hp.com/wizard/wiz_3604.html
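
A minimal sketch of that include approach (the file specification and values here are placeholders; adjust to your site):

In each node's SYS$SYSTEM:MODPARAMS.DAT:

AGEN$INCLUDE_PARAMS SYS$COMMON:[SYSEXE]CLUSTER_VOTES.DAT

In the shared CLUSTER_VOTES.DAT:

EXPECTED_VOTES = 5   ! the same on every member
! VOTES, DISK_QUORUM and QDSKVOTES stay in each node's own MODPARAMS.DAT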

There are details on the display settings here: http://h71000.www7.hp.com/doc/83final/6048/6048pro_065.html#shcl_part

There are details on properly setting VOTES and EXPECTED_VOTES for various specific local configuration requirements here:
http://www.hoffmanlabs.com/vmsfaq/

The present cluster configuration appears valid and stable, though I would address the settings as part of the next reboot cycle, and I would also specifically address the VOTES and EXPECTED_VOTES settings on each node before bringing that node into the cluster.

Stephen Hoffman
HoffmanLabs
Edwin Gersbach_2
Valued Contributor

Re: Strange values in the SHOW CLUSTER display

Hi Stephen,

As I said in the first place:
>> However, doing a SHOW CLUSTER on all nodes shows
>> a value of 3 for CL_Q on each of them while
>> CL_EXP and CL_V are both 5.

I may not have made it clear enough that this is true for all systems. Also, our MODPARAMS.DAT has just two lines besides 3 includes: SCSSYSTEMID and SCSNODE.

The problem is in the second column of the member class, as can be seen in the attached file. ASSM80, which is my workstation, shows a 2 in the 'Q' column but a 3 for CL_Q at the bottom.

Edwin
Hoff
Honored Contributor

Re: Strange values in the SHOW CLUSTER display

You have a whole bunch of nodes with bad settings for EXPECTED_VOTES. It looks like about half the nodes present have an incorrect value, or no value.

Q=2 and Q=3: the setting in system parameters, as derived from EXPECTED_VOTES or whatever is available, as determined at boot time. Changes only when rebooted.

CL_EXP=5 : the running value, as corrected.

CL_Q=3 : the running value for the cluster quorum for all nodes, as derived from the total number of votes present plus one vote, rounded up.
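
Concretely, the connection manager computes quorum as (EXPECTED_VOTES + 2) / 2 with the fraction truncated, so with EXPECTED_VOTES = 5 that gives (5 + 2) / 2 = 3.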

The running setting can be adjusted upward automatically, and downward with the IPL C (IPC) or DECamds or SET CLUSTER /EXPECTED_VOTES command or such; with manual command input.
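
For example, once it is safe to do so, the running value can be lowered manually with something along the lines of:

$ SET CLUSTER/EXPECTED_VOTES=3

(or with no value given, to have it recalculated from the votes currently present).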

Please confirm that EXPECTED_VOTES is set to five on all nodes:

SYSMAN
SET ENV/CLUSTER
PARAM SHOW EXPECTED_VOTES

It appears that various nodes, including ASSM80, have a different value. And one that is lower than what HP recommends.

As for the system-level parameter information, there should be a value for VOTES and EXPECTED_VOTES in each MODPARAMS.DAT or stored in a (usually shared) AGEN$INCLUDE_PARAMS-based file.

The OpenVMS FAQ describes how to determine the correct and most appropriate settings for VOTES and QDISK_VOTES, and how to derive EXPECTED_VOTES from that value.
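
In short, EXPECTED_VOTES should be the sum of all the VOTES values plus QDSKVOTES: 2 + 2 + 1 = 5 for the configuration as posted, or 1 + 1 + 1 = 3 with the one-vote-each scheme suggested above.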

Wim Van den Wyngaert
Honored Contributor

Re: Strange values in the SHOW CLUSTER display

I think I've seen it before.

The stations are set up in such a way that they boot as soon as they see a cluster. Normally this cluster would have 3 to 5 votes, but the stations say expected votes=2 so that they can join a reduced cluster too (broken disk and 1 node brought down with remove_node). And because the stations have no votes they cannot form a cluster of their own.
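
On such a satellite the relevant MODPARAMS.DAT lines might look something like this (illustrative values only):

VOTES = 0            ! satellite contributes no votes
EXPECTED_VOTES = 2   ! deliberately low, so it can join a reduced cluster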

Wim
Hoff
Honored Contributor

Re: Strange values in the SHOW CLUSTER display

>>>I think I've seen it before.

The stations are set up in such a way that they boot as soon as they see a cluster. Normally this cluster would have 3 to 5 votes, but the stations say expected votes=2 so that they can join a reduced cluster too (broken disk and 1 node brought down with remove_node). And because the stations have no votes they cannot form a cluster of their own.<<<


I've certainly seen it before, too.

The hazard is that you can boot into a partitioned cluster. There are certainly cases when you want to bootstrap into a degraded configuration, but -- in this case, with (say) two one-vote boot nodes and a one-vote quorum disk on an (assumed) shared cluster interconnect -- there seems to be no degraded configuration you could conceivably even want to boot into.

If either of the boot/voting nodes and the quorum disk is up, or both the boot/voting nodes and no quorum disk, you can boot a satellite.
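
To spell out that arithmetic for the suggested one-vote-each setup: EXPECTED_VOTES = 3, so quorum = (3 + 2) / 2 = 2, truncated. One server (1 vote) plus the quorum disk (1 vote) = 2, quorum met; both servers (1 + 1) = 2, quorum met; one server alone = 1, quorum hang.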

If you're down to one node and no quorum disk, you are also down into the range where the cluster could conceivably be partitioned; where the two voting nodes have multiple problems. Is automatic booting here a good idea?

One configuration I saw up close and personal had incorrect settings, and was booted into a partitioned configuration when an SRM console command variable had gotten reset. Bye-bye disk data.

How does one get corrupted data? I'll assume for the sake of argument here that the quorum disk is on shared SCSI. If you look at the EXPECTED_VOTES values for the boot nodes shown in the attachment, you'll see that should you erroneously boot the two boot nodes from the same root, the cluster WILL start and the nodes will not connect. Unfortunately, the nodes will believe they have quorum because of the value in EXPECTED_VOTES, and each will allow processing. This is a partitioned cluster. Disks will get stomped on.
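
Illustrative numbers only: if a two-vote boot node comes up believing EXPECTED_VOTES is 3 (roughly what the Q=2 boot-time values in the attachment suggest), it computes quorum = (3 + 2) / 2 = 2 and meets it with its own two votes alone. Boot the second two-vote node the same way and, since the two cannot see each other, each half satisfies its own quorum and keeps running: a partitioned cluster.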

I have not tried to generate a partitioned cluster on a Fibre Channel (FC) SAN, but -- unless one of the FC controllers detects and prevents this -- the "duplicate" configuration is basically the same as a shared SCSI bus. It's a storage interconnect, meaning that unless the SAN notices two different nodes pounding on the same disk, it'll have the same effect. The two nodes cannot "see" each other over SCSI or over the FC SAN, but can reach storage.

The fellow that was concerned is right to have been concerned, IMHO. A set of blade-guards has been disabled here. If the blade guards were intentionally disabled, there should be an understanding of the risks and of the intended operations and command sequences for use in the degraded configurations. (And in this case, there isn't a whole lot of value to disabling the blade guards, as the connection manager will float the running quorum value just as soon as connections are established.) Personally, I generally prefer to allow the automatic blade guards to remain in place and to work, and to disable them only upon explicit manual command input.

I would rather have the cluster configuration encounter a user data integrity interlock -- what can be called the quorum hang -- than have the cluster proceed and risk stomping on data. The quorum scheme was not implemented to cause folks to have a hang and an outage; it is a set of blade guards specifically designed and implemented to prevent a serious outage.

Again, there's a whole section on this topic in the FAQ. (For the next edition of the FAQ, I'll add some text on deliberately-degraded bootstraps when initially forming a cluster or when booting in a degraded state, as I see that's not listed in the current edition.)

Edwin Gersbach_2
Valued Contributor

Re: Strange values in the SHOW CLUSTER display

>> Please confirm that EXPECTED_VOTES is set to five on all nodes:

>> SYSMAN
>> SET ENV/CLUSTER
>> PARAM SHOW EXPECTED_VOTES

Yes it is 5 on all servers and satellites!
Also, this value has never changed - at least not intentionally, and I don't know how it could change otherwise. As mentioned, this value is specified in an included file.

As to the configuration assumptions:

The quorum disk (and all other disks) are SAN disks, with two FC interfaces and 4 FC paths. The interconnect between the two servers is by means of 2 LAN links via 2 different switches, plus a third dedicated SCS link over a direct wire between the two boxes.

>> A set of blade-guards has been disabled here.

Not that I'm aware of! The only unusual thing is that for some time (until all systems got rebooted) the cluster ran from two different system disks, one with 7.3-2 and one with 8.3, with individual LDB, UAF, etc. And whenever a satellite was rebooted, both servers and the quorum disk were available.

After all, this was just a rolling upgrade.

As there seems to be no risk for now, I will leave the cluster as is until I perform a planned AUTOGEN/reboot with feedback on all nodes in a few days. Let's see how it looks afterwards.
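
(That is, something like the usual

$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT FEEDBACK

on each node.)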

Edwin
Cass Witkowski
Trusted Contributor

Re: Strange values in the SHOW CLUSTER display

For my edification:

If there are two boot servers and a quorum disk, and we assign one vote to each of the boot servers and the quorum disk, then according to my count the expected votes should be 3, not 5. What am I missing?

What is a blade guard?

Thanks

Cass
Steven Schweda
Honored Contributor

Re: Strange values in the SHOW CLUSTER display

> What am I missing?

> Feb 27, 2007 12:50:13 GMT
> Each boot server has 2 votes, the quorum
> disk has one and the satellites have none.

2 + 2 + 1 = 5

Don't ask me why "[e]ach boot server has 2
votes".

> What is a blade guard?

The inconvenient and annoying part of a power
tool which is intended to keep your
vulnerable body parts away from the dangerous
moving parts. Here, used metaphorically.
Cass Witkowski
Trusted Contributor

Re: Strange values in the SHOW CLUSTER display

Ok I was looking at Hoff's first reply where he said, "I'd likely configure this cluster with one vote for each of the two servers, and with one vote for the quorum disk; assuming the quorum disk is connected on a multi-host bus."

That only gave me three votes, not five.