Operating System - OpenVMS
Carleen Nutter
Advisor

Changing the number of votes for the quorum disk

What is the proper way to change the number of votes the quorum disk contributes? Is it enough to change QDSKVOTES on each node (edit MODPARAMS and WRITE CURRENT) and reboot the cluster, or does something else need to happen?
John Gillings
Honored Contributor

Re: Changing the number of votes for the quorum disk

Carleen,

That's about the size of it. But rather than using WRITE CURRENT, I would edit MODPARAMS.DAT and use AUTOGEN to adjust the values.

Note that you should probably do a full cluster reboot, rather than a rolling reboot.

Something that isn't necessary, but that I'd recommend, is to store all the SYSGEN parameters that need to be identical across the cluster in an AGEN$INCLUDE_PARAMS file in your cluster common area. That means you only need to change them once, and you can be somewhat more confident that all your nodes agree. The only time values will be out of sync is if you've made a change and not run AUTOGEN on one of the nodes. This can be checked automatically by comparing the modification date of the common parameter file with that of SYS$SYSTEM:SETPARAMS.DAT.

So, for example, your MODPARAMS.DAT might have the lines:

! Cluster common parameters
AGEN$INCLUDE_PARAMS CLUSTER$COMMON:CLUSTERPARAMS.DAT
!

Obviously you need to have node-specific things like SCSNODE in there as well, but anything that is the same on all nodes goes in CLUSTER$COMMON:CLUSTERPARAMS.DAT. For example (4-node cluster with quorum disk):

VAXCLUSTER=2
VOTES=1
DISK_QUORUM="$46$DIA31 "
QDSKVOTES=4
EXPECTED_VOTES=8
QDSKINTERVAL=10

NISCS_LOAD_PEA0=1
NISCS_PORT_SERV=0
MIN_NISCS_MAX_PKTSZ=4468
MSCP_LOAD=1
MSCP_SERVE_ALL=1
TMSCP_LOAD=1
TMSCP_SERVE_ALL=1
!
ALLOCLASS=46
!
MIN_SCSCONNCNT=40
!
PAGEFILE=0
SWAPFILE=0
DUMPFILE=0
dumpstyle=1
!
! security compliance
!
LGI_BRK_TMO=720
LGI_HID_TIM=86400
MAXSYSGROUP=7
MIN_MAXBUF=4096
!
TTY_DEFCHAR =%x180010B8 ! 24 lines+SCOPE+(NOWRAP)+LOWER+TTSYNC+HOSTSYNC+ESCAPE
TTY_DEFCHAR2=%x00023002 ! DISCONNECT+EDITING+INSERT+AUTOBAUD
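The modification-date check suggested above can be sketched in Python. This is a minimal sketch, assuming Python is available on the node and that the two paths passed in resolve to CLUSTER$COMMON:CLUSTERPARAMS.DAT and SYS$SYSTEM:SETPARAMS.DAT; the function name params_out_of_sync is hypothetical.

```python
import os


def params_out_of_sync(common_params: str, setparams: str) -> bool:
    """Return True if the cluster-common parameter file is newer than this
    node's SETPARAMS.DAT, which suggests AUTOGEN has not been rerun on this
    node since the common parameters were last changed."""
    return os.path.getmtime(common_params) > os.path.getmtime(setparams)
```

Run it on each node after a parameter change; any node that reports True still needs its AUTOGEN pass. Comparing timestamps only detects a missed AUTOGEN run, not a wrong value inside the file.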
A crucible of informative mistakes
Carleen Nutter
Advisor

Re: Changing the number of votes for the quorum disk

I have a 4-node cluster. Each node gets 1 vote. I have QDSKVOTES=3 and EXPECTED_VOTES=7. Sometimes I need only 1 node up, and with QDSKVOTES=3 I can do that.
The problem is that a SHOW CLUSTER command indicates the quorum disk is contributing only 1 vote. I did run this scheme past tech support a few months ago.

With all 4 nodes booted, SHOW CLUSTER says:


CL_EXP = 7
CL_QUORUM=4
CL_VOTES=5
CL_QDV=1
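These numbers can be checked against the quorum formula quoted elsewhere in this thread, (EXPECTED_VOTES + 2)/2 rounded down. A minimal Python sketch, assuming 1 vote per node; the variable names are illustrative only.

```python
def quorum(expected_votes: int) -> int:
    # OpenVMS estimated quorum: (EXPECTED_VOTES + 2) / 2, rounded down
    return (expected_votes + 2) // 2


NODES, NODE_VOTES, QDSKVOTES = 4, 1, 3

cl_exp = NODES * NODE_VOTES + QDSKVOTES       # 7, matches CL_EXP
cl_quorum = quorum(cl_exp)                    # 4, matches CL_QUORUM

# Votes present with all 4 nodes up: intended vs. what SHOW CLUSTER reported
intended_votes = NODES * NODE_VOTES + QDSKVOTES   # 7
observed_votes = NODES * NODE_VOTES + 1           # 5 (CL_QDV=1)

# A single node plus the quorum disk: intended vs. observed
one_node_intended = NODE_VOTES + QDSKVOTES        # 4 >= 4, cluster runs
one_node_observed = NODE_VOTES + 1                # 2 < 4, cluster hangs

# Fewest nodes that keep quorum while the disk counts only 1 vote
min_nodes_observed = next(
    n for n in range(NODES + 1) if n * NODE_VOTES + 1 >= cl_quorum
)                                                 # 3

print(f"quorum={cl_quorum}, votes intended={intended_votes}, observed={observed_votes}")
print(f"one node + disk: intended {one_node_intended}, observed {one_node_observed}")
print(f"nodes needed with CL_QDV=1: {min_nodes_observed}")
```

With the disk contributing only 1 vote, losing a second node leaves 3 votes against a quorum of 4, so the cluster would hang rather than run down to a single node.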




Martin P.J. Zinser
Honored Contributor

Re: Changing the number of votes for the quorum disk

Hello Carleen,

did you already try having just one node up? If so, it sounds more like a display problem; if not, there might be a real issue. Can you give us a few more details on your setup, such as the VMS version you are using?

Greetings, Martin
Lokesh_2
Esteemed Contributor

Re: Changing the number of votes for the quorum disk

Hi,

AFAIK, CL_QDVOTES for the cluster is calculated as the minimum of QDSKVOTES on any node of the cluster. You need to check QDSKVOTES on all the nodes in your cluster.

Are you using satellite nodes in your cluster?

Thanks & regards,
Lokesh Jain
What would you do with your life if you knew you could not fail?
Mobeen_1
Esteemed Contributor

Re: Changing the number of votes for the quorum disk

Carleen,
There are two methods for changing the EXPECTED_VOTES system parameter:

Method 1:
1. USE CURRENT
2. SET
3. WRITE CURRENT
4. Modify MODPARAMS.DAT

Method 2:
1. Modify MODPARAMS.DAT
2. Run AUTOGEN through the SETPARAMS phase

In Method 1, the values are changed in your current parameter database, and once you reboot the node they are in the SYSGEN database permanently.

In Method 2, the values take effect when the node is next rebooted.

The practice I have been following is Method 1.

regards
Mobeen
Carleen Nutter
Advisor

Re: Changing the number of votes for the quorum disk

Some clarifications:
The VMS version is 7.2-2 with patches.

Each of the 4 nodes (via show current and show active):
Votes=1
Expected_votes=7
Qdskvotes=3

There are no satellite nodes.

Until about 10 days ago, this was a 3-node cluster with 1 satellite, and the quorum disk had only 1 vote.
Since this is a production cluster, I don't have the luxury of shutting nodes down to see whether it's a display issue or a real issue.
I did reboot 1 node yesterday and noted that CL_QUORUM stayed at 4 (as it should have), but CL_VOTES went from 5 to 4 and CL_QDV stayed at 1. It could still be a display issue, but I am unsure and don't want to be surprised if I lose or shut down 2 nodes and the rest of the cluster hangs.

My thoughts are that it's a display problem, or that I missed a step when changing QDSKVOTES from 1 to 3. Should the QUORUM.DAT file on the quorum disk get updated in some manner? Its modification date did not change when I added the 4th node and changed QDSKVOTES.
Eberhard Wacker
Valued Contributor

Re: Changing the number of votes for the quorum disk

Hello Carleen,

after you finished setting up your final configuration, was the cluster completely down at least once? This is a really important point in this discussion!
The cl_qdskvotes value will remain at 1 until this has been done (at least I think so, and I'm quite sure about it, but I'm not able to test it; I don't have a cluster of my own where I can do what I want).

A few hints regarding your configuration:

Quorum disk votes of 3 are only necessary to keep 1 node running when all the others are down (e.g. shut down without the REMOVE_NODE option and/or crashed). Partitioning is avoided via the EXPECTED_VOTES setting.
If you use quorum disk votes = 1 and EXPECTED_VOTES = 5, the quorum is 3, i.e. 2 nodes can crash and your cluster keeps running. You can then adjust the remaining running configuration with the DCL command SET CLUSTER/EXPECTED. After that, even a 3rd node can crash and the last node will continue to run.
With this configuration you have only a minor problem booting the very first node alone after the whole cluster has been shut down. This can be resolved with a conversational boot, setting EXPECTED_VOTES to 1, 2 or 3. The node will boot and form the cluster with a temporary cluster quorum of 2. The next node can boot normally; because its EXPECTED_VOTES setting is 5, the cluster quorum is then set to 3 (which is satisfied by the quorum contributors now participating).
If you use the "officially" recommended value of 3 for the quorum disk votes, you can get into trouble when the quorum disk fails. If one of the nodes then also crashes, the cluster will hang (maybe this can be avoided with SET CLUSTER/EXPECTED=4 if you have time to do it, but I don't know whether that will really work in such a case).
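The vote arithmetic behind these schemes can be sketched in Python. This is a minimal sketch, assuming 1 vote per node, the (votes + 2)/2 rounded-down formula, and that SET CLUSTER/EXPECTED recomputes quorum from the votes still present; the helper names are hypothetical.

```python
def quorum(expected: int) -> int:
    # (EXPECTED_VOTES + 2) / 2, rounded down, as in the OpenVMS formula
    return (expected + 2) // 2


def survives(nodes_up: int, qdisk_up: bool, qdskvotes: int, q: int) -> bool:
    """True if the remaining voters (1 vote per node) still reach quorum q."""
    votes = nodes_up + (qdskvotes if qdisk_up else 0)
    return votes >= q


# Alternative scheme: QDSKVOTES=1, EXPECTED_VOTES=5
q_alt = quorum(5)                                    # 3
two_nodes_crashed = survives(2, True, 1, q_alt)      # 2 + 1 = 3 votes: runs

# Assumed effect of SET CLUSTER/EXPECTED on the 3 votes left: quorum 2,
# so a third node may crash as well
q_adjusted = quorum(2 + 1)                           # 2
third_node_crashed = survives(1, True, 1, q_adjusted)  # 1 + 1 = 2: runs

# First node booted alone with EXPECTED_VOTES lowered to 1, 2 or 3:
# the larger of the EXPECTED_VOTES estimate and the estimate from the
# 2 votes present (node + disk) wins, giving a temporary quorum of 2
boot_alone = [max(quorum(ev), quorum(1 + 1)) for ev in (1, 2, 3)]  # [2, 2, 2]

# "Recommended" scheme: QDSKVOTES=3, EXPECTED_VOTES=7
q_rec = quorum(7)                                    # 4
disk_lost_then_crash = survives(3, False, 3, q_rec)  # 3 votes < 4: hangs
```

The boot_alone list shows why the temporary quorum is 2 whichever of the three lowered values is chosen: the votes actually present set a floor of 2.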

There are ways to keep a cluster running after it has hung: via CTRL-P and quickly entering a few commands at the console level, or by using the features of AMDS/Availability Manager.

Finally:
It may be a VMS software bug. We do not use V7.2-2, so I cannot verify any statement about this software version. But with V7.2-1H1 we had a cluster quorum adjustment problem when shutting down a node with the REMOVE_NODE option! We never got an official solution for this; none of the released patches solved the problem. We seemed to be the only customer in the whole wide world with this problem. Unbelievable to me, but so it seemed.
Our workaround was to run $ SET CLUSTER/EXPECTED manually (as described above) after shutting down two nodes of this 10-node cluster.
Now the positive aspect (for us): this problem did NOT recur after we upgraded to VMS V7.3-1!
Mobeen_1
Esteemed Contributor

Re: Changing the number of votes for the quorum disk

Carleen,
Please check this out; it should give you enough information.

1. VOTES

2. EXPECTED VOTES

The following definitions/formulas need to be kept in mind

1. When nodes in the OpenVMS Cluster boot, the connection manager uses the largest value of EXPECTED_VOTES among all systems present to derive an estimated quorum value according to the following formula:
Estimated quorum = (EXPECTED_VOTES + 2)/2, rounded down

2. During a state transition, the connection manager dynamically computes the cluster quorum value to be the maximum of the following:
- The current cluster quorum value
- The value calculated from the following formula, where EXPECTED_VOTES is the largest value specified by any node in the cluster:
QUORUM = (EXPECTED_VOTES + 2)/2, rounded down
- The value calculated from the following formula, where VOTES is the total votes held by all cluster members:
QUORUM = (VOTES + 2)/2, rounded down
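The two rules above can be sketched in Python. This is a minimal sketch of the formulas as quoted, not the connection manager's actual code; the function names are hypothetical.

```python
from typing import Sequence


def estimated_quorum(expected_votes: Sequence[int]) -> int:
    """Quorum estimate at boot, from the largest EXPECTED_VOTES of any member."""
    return (max(expected_votes) + 2) // 2


def transition_quorum(current_quorum: int,
                      expected_votes: Sequence[int],
                      total_votes: int) -> int:
    """Quorum after a state transition: the maximum of the current quorum,
    the estimate from EXPECTED_VOTES and the estimate from the votes held."""
    return max(current_quorum,
               estimated_quorum(expected_votes),
               (total_votes + 2) // 2)


# Four members with EXPECTED_VOTES=7, current quorum 4, 5 votes present
print(transition_quorum(4, [7, 7, 7, 7], total_votes=5))
# One node leaves without REMOVE_NODE: votes drop, quorum does not
print(transition_quorum(4, [7, 7, 7], total_votes=4))
```

Because the current quorum is one of the terms in the maximum, a state transition can raise quorum but never lower it. Lowering it takes an explicit action such as SET CLUSTER/EXPECTED, the REMOVE_NODE shutdown option, or a console-requested recalculation.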

The following link has useful reading on VMS cluster configurations:

http://broadcast.ipv7.net:81/openvms-manual/72final/4477/4477pro_002.html

Let me know if you need any specific information

regards
Mobeen
Henk Ouwersloot
Advisor

Re: Changing the number of votes for the quorum disk

Hello Carleen,

I did a short test on my cluster. The setup you are using is OK (if it is set on every node in your cluster):

VOTES = 1
EXPECTED_VOTES = 7
QDSKVOTES = 3

Please check the following:

1 - SHOW CLUSTER/CONT
2 - ADD FORMED

If the date/time in the FORMED field is earlier than the time you changed the QDSKVOTES parameter, then you need to reboot your entire cluster.

This MUST be a full cluster reboot, not node by node! That should solve your problem.
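The FORMED comparison can be sketched in Python. This is a minimal sketch: the timestamp layout "12-MAR-2004 10:15:30" is an assumption, so adjust the format string to whatever your SHOW CLUSTER display actually prints. The function name and the sample dates are hypothetical.

```python
from datetime import datetime

# Assumed layout of the timestamps, e.g. "12-MAR-2004 10:15:30"
VMS_TIME_FORMAT = "%d-%b-%Y %H:%M:%S"


def needs_full_reboot(formed: str, qdskvotes_changed: str,
                      fmt: str = VMS_TIME_FORMAT) -> bool:
    """True if the cluster was FORMED before QDSKVOTES was changed, i.e. the
    running cluster has never been re-formed with the new value."""
    formed_at = datetime.strptime(formed, fmt)
    changed_at = datetime.strptime(qdskvotes_changed, fmt)
    return formed_at < changed_at


# Hypothetical dates: cluster formed before the parameter change
print(needs_full_reboot("01-MAR-2004 08:00:00", "05-MAR-2004 09:30:00"))
```

Python's %b matches month abbreviations case-insensitively, so upper-case VMS month names parse without conversion.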

Kind Regards,
Henk
Robert Atkinson
Respected Contributor

Re: Changing the number of votes for the quorum disk

I would reiterate Eberhard's comment. If you do not need all of your nodes to be running but could make do with any one of them, then don't bother with cluster votes; just set EXPECTED_VOTES to 1.

Rob.
Carleen Nutter
Advisor

Re: Changing the number of votes for the quorum disk

To answer a few questions raised by those who've replied.

I did reboot the cluster after changing the vote-related parameters, but that was before adding the new node. I edited the new node's MODPARAMS.DAT file with the vote-related parameters, so that when I booted it to join the cluster (and it went through its AUTOGEN and reboot), it would have the same values (VOTES=1, EXPECTED_VOTES=7, QDSKVOTES=3). Tech support said I would not need to reboot the entire cluster after adding the new node. The FORMED date is consistent with the above.

I did read the cluster manual thoroughly.
This is a single-application cluster, and I considered several voting schemes. The one implemented seemed to accommodate the requirements best.

I'll have to take some time to consider some of the suggestions.

But perhaps I just need to reboot the entire cluster again, and be careful about the shutdown. I do not have a planned shutdown scheduled for several months.

I'll post the results in this thread at that time.

Thanks to all for taking the time to reply and share your knowledge.
Rob Buxton
Honored Contributor

Re: Changing the number of votes for the quorum disk

Just as a different tack: when shutting down servers, you can use the options that force a recalculation of quorum.
That would allow you to shut down servers until you are left with just one server plus the quorum disk.
Eberhard Wacker
Valued Contributor

Re: Changing the number of votes for the quorum disk

Hi again. A lot has been written, so here is the main point, from my point of view, in short: you DO NEED a total CLUSTER SHUTDOWN to get to a state from which you can proceed.
Regards,
Eberhard
Jan van den Ende
Honored Contributor

Re: Changing the number of votes for the quorum disk

Carleen,

maybe I am missing a critical point somewhere, but as far as I can conclude from the information so far:

You have a 4-node cluster,
running a single application,
you can NOT fully go down ( i.e., you need 24*365 uptime),
you want to be able to shut down SOME node(s),
you SHUT DOWN those nodes in a planned way,
system CRASHES are rare.

IF these assumptions are correct, then WHY would you complicate things by HAVING a quorum disk?
A quorum disk is only an ugly (but very functional!) trick to allow a two-node cluster to survive the CRASH of any one voter (a node, or possibly the quorum disk). If you grow to 3 nodes, the 3rd node (as seen from the other 2) works just as well as a quorum disk (actually better, because it is more responsive).

In a 4-node cluster you can, if you want to, shut down 3 nodes and continue running on the 4th (provided you DON'T forget REMOVE_NODE at shutdown, and you don't start the next shutdown before the previous node IS down).

As long as you have at least 3 nodes running, any node crashing will leave your cluster running.

On the other hand, with your proposed configuration (4 nodes @ 1 vote, quorum disk @ 3 votes), if you are in a situation with 1 or 2 nodes down and you THEN lose (the connection to) the quorum disk, you WILL lose quorum!

So:
4 nodes @ 1 vote, NO QDSK: you CAN have 1 node planned-down AND continue if another node ( = voter ) CRASHES.
4 nodes @ 1 vote, QDSK @ 3 votes, 1 node planned-down: you WILL continue through a node ( = 1-voter ) crash, but you will HANG if the QDSK crashes OR if you somehow lose the CONNECTION to it.

IF you get down to 2 active nodes, you run the risk of quorum loss either way.
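This comparison can be reproduced in Python. This is a minimal sketch under a simplified model: 1 vote per node, and each planned shutdown with REMOVE_NODE recomputes quorum from the votes still present. The function names are hypothetical.

```python
def quorum_after_remove_node(nodes_up: int, qdskvotes: int) -> int:
    """Simplified model: after planned shutdowns with REMOVE_NODE, quorum is
    recomputed from the votes still present (1 per node plus the disk)."""
    return (nodes_up + qdskvotes + 2) // 2


def outcome(nodes_up: int, qdskvotes: int, failure: str) -> str:
    """What happens when, with nodes_up members running, either one more
    node ("node") or the quorum disk ("qdisk") is lost."""
    q = quorum_after_remove_node(nodes_up, qdskvotes)
    votes = nodes_up + qdskvotes
    if failure == "node":
        votes -= 1
    elif failure == "qdisk":
        votes -= qdskvotes
    else:
        raise ValueError(f"unknown failure: {failure}")
    return "continues" if votes >= q else "hangs"


for qdskvotes in (0, 3):
    failures = ("node",) if qdskvotes == 0 else ("node", "qdisk")
    for nodes_up in (3, 2):
        for failure in failures:
            print(f"QDSKVOTES={qdskvotes}, {nodes_up} nodes up, "
                  f"{failure} lost: {outcome(nodes_up, qdskvotes, failure)}")
```

Under this model, 3 running nodes without a quorum disk survive a further crash. With the 3-vote disk, a node crash is survived but losing the disk hangs the cluster. At 2 active nodes each configuration has a failure that costs quorum.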

If our site could count as reference:
it started as 3-node (split over 2 sites) and since has grown to 4 nodes (2 sites, 2 nodes each). We NEVER had a quorum disk.
Our current cluster uptime approaches 7 years.

I think it would even be fairly simple to remove the quorum disk without a cluster shutdown:
- dismount the quorum disk, and
- immediately SET CLUSTER/EXPECTED
- node by node:
--- adjust the SYSGEN parameters DISK_QUORUM, QDSKVOTES and EXPECTED_VOTES, either in MODPARAMS.DAT followed by AUTOGEN, or via WRITE CURRENT, and
--- reboot.

Think it over, and if you find any wrongs in my reasoning, please post it. It would mean that I got something to learn. :-)

Success!!

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Uwe Zessin
Honored Contributor

Re: Changing the number of votes for the quorum disk

Carleen,

if a new node joins a VMScluster with VOTES, the cluster quorum will be automatically adjusted upwards if necessary. EXPECTED_VOTES, as far as I can tell, is only used when a node boots. My experience is that a node that has EXPECTED_VOTES set too high cannot join a given cluster. Applying the required quorum to the rest of the cluster would cause it to hang.

You can even boot one node and it will 'hang' with a 'quorum lost' message, but if you can get to the console prompt with control-P you can request a recalculation of cluster quorum. After that you have a single-node cluster running, although your EXPECTED_VOTES setting was higher. I have attached a little .TXT file that demonstrates the required commands. Just don't forget to enter them _fast_ when you have a hung cluster with multiple nodes - otherwise the sanity timers on the other nodes expire and the node that you are working with will crash with a CLUEXIT bugcheck.

Be careful if you try this with multiple nodes! Make sure that all nodes can communicate with each other - if not you could end up with several 1-node clusters that will happily access your disks without synchronization.

The shutdown option that Rob was talking about is REMOVE_NODE. It took DEC until VMS version 6.2 to get it working, but after that I could shut down a complete cluster node by node, properly. Even the last node ran without a quorum disk.

I don't know what your cluster's structure (communication buses, shared storage, ...) looks like, but I agree with Jan that a quorum disk can cause a lot of pain, and I try to avoid one whenever possible.

.
Carleen Nutter
Advisor

Re: Changing the number of votes for the quorum disk

Definitely some things to think about. In a few months, I'll have some time to do some experimenting with the cluster.

If I decide to dispense with the quorum disk and use REMOVE_NODE during controlled shutdowns, then if I want to boot up just 1 node, am I correct that I can't boot 1 node alone without changing EXPECTED_VOTES during a conversational boot?
Or must I boot at least 2 others at the same time?

Also, can anyone confirm Jan's instructions about removing the quorum disk from the cluster?
Uwe Zessin
Honored Contributor

Re: Changing the number of votes for the quorum disk

Carleen,

you can boot just one node and wait until it has formed a cluster on its own. Of course it will hang with 'quorum lost'. Then you can use the command sequence I have presented above to tell that node to recalculate the quorum.

If the node has just one vote it will set the quorum to one, too. There will be a short cluster state transition and then the node will continue booting. Of course you can use the same 'trick' with 2 or more nodes. Just make sure that both nodes have joined the same cluster - you see that from the connection manager (%CNXMAN) messages. Else, as I have already written, you will get two or more independent single-node clusters which will happily eat your disks.

There is another 'trick' if you temporarily want to boot a single node:
- boot conversational
- SYSBOOT> SET EXPECTED_VOTES 1
- SYSBOOT> SET VOTES 1
- SYSBOOT> SET WRITESYSPARAMS 0

I hope I got that last name right. It is just a flag that tells the system whether to write any changes made during the conversational boot back to the system parameter file. If you set it to 0, your changes don't carry over to the next boot, and you don't need to bother undoing them.


I have never tried to remove a quorum disk from a live cluster. I don't even know if that is possible, as that information might be redistributed from nodes that still know there was a quorum disk.
.
Carleen Nutter
Advisor

Re: Changing the number of votes for the quorum disk

A planned power outage in the data center to repair blown fuses in the UPS gave me the opportunity to shut down my cluster and reboot it. I shut down the cluster using the CLUSTER_SHUTDOWN option on each node, so I wouldn't hang the cluster because of messed-up votes and quorum. Later, I was able to boot just 1 node, and this time CL_QDV (cluster quorum disk votes) was 3. When the other 3 nodes booted, the SHOW CLUSTER output was what I expected it to be.
CL_EXP=7, CL_QUORUM=4, CL_VOTES=5, QF_VOTES=YES, CL_QDV=3.
So it looks like the problem was resolved by a full cluster reboot.

Thanks to all who contributed to this discussion. There is a lot of valuable information in this thread.
Uwe Zessin
Honored Contributor

Re: Changing the number of votes for the quorum disk

Carleen,
I don't see what version of OpenVMS you are using, but with V6.2 the REMOVE_NODE option of SHUTDOWN.COM finally worked properly.

If a cluster is hanging due to a quorum loss, it is usually possible to request a quorum recalculation. I don't see what hardware you are using and how it is set up, but it is usually possible to press Control-P at the serial console or toggle the HALT button to get to the console prompt. Then you put the value 12 (decimal) into the software interrupt request register; on Alpha this is:
>>> deposit sirr c

and then:
>>> continue[return]
IPC> Q[return]
IPC> [Control-Z]

That should get the cluster working again. If I recall correctly, there are also external management tools like AMDS or Availability Manager that run on a different system. It is possible to request a quorum recalculation from there, too. I have never worked with them, and I don't do system management, so I cannot tell what their status is. Can somebody else chime in?
.
Mike Naime
Honored Contributor

Re: Changing the number of votes for the quorum disk

If you want to use AMDS to make changes remotely, you must first configure the AMDS client to allow you to write changes to the system. We discovered this after our AMDS server had been installed for about a month. We needed to kill some processes on a system that had exceeded BALSETCNT. No luck; we had to halt the system. After talking to our TAMs, we found out that the default configuration file in the client install does not allow changes to be made. The default AMDS install only lets you monitor the system, not make changes. We had to modify the AMDS configuration file before we were able to make remote changes.
VMS SAN mechanic
Ian Miller.
Honored Contributor

Re: Changing the number of votes for the quorum disk

you have to give the AMDS console Write access to be able to apply fixes. Definitely something that should be done on all VMS systems, I think. It can be done without a reboot. Edit AMDS$CONSOLE_ACCESS.DAT and define a triplet with W access. The system from which you fix things should have a matching triplet. You don't really want to keep the example entry.
____________________
Purely Personal Opinion