Operating System - OpenVMS

EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

 
YJTAN
Occasional Advisor


Hi,

We did a test on "Shutdown and Reboot sequence" and got some questions to ask.

We did the test on a cluster of 7 nodes.

HSP101 (GS1280)
HSP102 (GS1280)
HSP103 (RX8620)
HSP104 (RX8620)
HSP105 (GS1280)
HSP106 (GS1280)
HSQ101 (RX2600)

Attached is the file containing the "SHOW CLUSTER/CONT" screen dumps for all the steps that we had done.

Scenario 1
We shut down 6 nodes, leaving only the quorum server (HSQ101) running.
We then rebooted the nodes one by one, but they were unable to join the cluster.
Only after we rebooted HSQ101 did the rest of the nodes start to form a cluster.
(See Step 10 to Step 23.)
Why couldn't the nodes join HSQ101 to form a cluster?

Scenario 2
In Steps 2 to 5, however, we shut down 4 of the 7 nodes and were able to boot them again (rejoining the cluster) in Steps 6 to 9.
Why could the nodes not join HSQ101 as a cluster in Scenario 1, while in Scenario 2 the 4 nodes could rejoin the cluster?

Other questions

1) Why does a node's EXPECTED_VOTES value, once it has dropped, never return to the "full" value?
(In Step 9, for example, EXPECT was 5 on some nodes rather than 7.)

2) Why did the "SET CLUSTER/EXPECTED=7" that I issued on HSP103 in Step 9 have no effect on the node?
Karl Rohwedder
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Yjtan,

when you shut down 6 nodes, this leaves just 1 vote in the cluster. When rebooting, the nodes use the SYSGEN parameter EXPECTED_VOTES of 7 to calculate a quorum of 4, and they wait to join the cluster until 4 votes are available.

regards Kalle
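
The arithmetic behind Karl's reply can be sketched in a few lines. This is only an illustration, assuming the documented OpenVMS formula quorum = (EXPECTED_VOTES + 2) / 2 with the fraction discarded; the function name `quorum` is made up for this example and is not an OpenVMS interface.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


# EXPECTED_VOTES values that appear in this thread.
for ev in (7, 5, 3, 1):
    print(f"EXPECTED_VOTES={ev} -> quorum={quorum(ev)}")
```

With EXPECTED_VOTES=7 this gives the quorum of 4 mentioned above, so a single vote from HSQ101 is well short of it.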
YJTAN
Occasional Advisor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

We rebooted all 6 nodes, and all we saw on the consoles was the output below.
On HSQ101, SHOW CLUSTER/CONT showed the STATUS of the six nodes as "NEW" instead of "MEMBER".
Only after we shut down HSQ101 did the rest start to form a cluster.

P/S: OpenVMS 8.3
=====
%CNXMAN, Sending VMScluster membership request to system HSQ101
%CNXMAN, Sending VMScluster membership request to system HSQ101
%CNXMAN, Have connection to system HSQ101
%CNXMAN, Have connection to system HSP101
%CNXMAN, Have connection to system HSP102
%CNXMAN, Have connection to system HSP103
%CNXMAN, Have connection to system HSP104
%CNXMAN, Have connection to system HSP105
%CNXMAN, Sending VMScluster membership request to system HSQ101
%CNXMAN, Sending VMScluster membership request to system HSQ101
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

You have 2 nodes with expected votes of 7, 2 with expected votes of 5, 1 with expected votes of 3 and 2 with expected votes of 1.

As soon as the nodes with 7 leave the cluster, 5 is used to calculate the quorum, and so on.

When you restart, as soon as a "7" tries to join the cluster, you will need 4 votes for the cluster to come alive.

It would help if we knew the VOTES of each member.

Wim
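
Wim's reading of the screen dumps can be sketched as follows. This is a simplified model of his hypothesis, not the connection manager's actual algorithm: it assumes quorum is derived from the largest EXPECTED value among the nodes still present, using the formula (EXPECTED_VOTES + 2) / 2 rounded down. The helper names are hypothetical, and the list holds only the seven EXPECTED values Wim reports, without assigning them to particular nodes.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def quorum_from_members(expected_values: list[int]) -> int:
    """Assumed model: the largest EXPECTED value among present members decides."""
    return quorum(max(expected_values))


# EXPECTED values Wim reads from the SHOW CLUSTER output (2x7, 2x5, 1x3, 2x1).
remaining = [7, 7, 5, 5, 3, 1, 1]

while remaining:
    print(f"present EXPECTED values {remaining} -> quorum {quorum_from_members(remaining)}")
    # Drop every node carrying the current largest value, as in "those having 7 left".
    largest = max(remaining)
    remaining = [v for v in remaining if v != largest]
```

Under this model the quorum steps down as the nodes with the larger values leave, and jumps back up as soon as one node with EXPECTED_VOTES=7 is present again.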
YJTAN
Occasional Advisor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Each node contributes 1 vote, with EXPECTED_VOTES of 7.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Could you confirm that all nodes have the SYSGEN parameter VOTES set to 1 and EXPECTED_VOTES set to 7? If so, please post the output of SYSGEN SHOW/ALL.

I'm not used to such clusters, but I would say that you have a bug if the 1/7 settings are confirmed. Once 4 members are present, the cluster should be alive.

I find it strange that the expected votes in your output jumped from 7 to 5. I assumed this was caused by the maximum expected votes among the present members decreasing.

But I vaguely remember that removing a node does not alter the expected votes unless you use the shutdown option REMOVE_NODE. Did you use it (in all cases)?

Wim
YJTAN
Occasional Advisor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Yes. All the shutdowns were done with the REMOVE_NODE option.

There are 3 system disks in this cluster.
The GS1280s share one, the RX8620s share one, and the quorum server (RX2600) has its own.

I will attach the SYSGEN SHOW/ALL when I have the chance to access the systems.
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

YJTAN,

I've not done all the math for your scenario, but consider this:

If the remaining node HSQ101 has a dynamic EXPECTED_VOTES value of just 1 (because you shut down all the other nodes with REMOVE_NODE) and you now boot a member with EXPECTED_VOTES=7, it cannot join the cluster. It will see HSQ101 and report it on the console after a while, but it will not be allowed to join, because the cluster would then immediately lose quorum. You should have been able to boot the other nodes conversationally and change EXPECTED_VOTES to just the 'right' value at that time...

Just a thought,

Volker.
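
Volker's point, and the difference between the two scenarios, can be illustrated with a small model. It is a sketch under stated assumptions, not the connection manager's real logic: the cluster quorum after a join is taken as the maximum of the current quorum, the quorum implied by the joining node's EXPECTED_VOTES, and the quorum implied by the total votes present, and the join is refused if the votes present would fall below that value. The function names and the starting quorum values passed in are assumptions made for this example.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def join_keeps_quorum(cluster_votes: int, cluster_quorum: int,
                      joiner_votes: int, joiner_expected_votes: int) -> bool:
    """Return True if the cluster would still hold quorum after the node joins."""
    new_votes = cluster_votes + joiner_votes
    # Assumed rule: quorum can only be raised by a join, never lowered.
    new_quorum = max(cluster_quorum,
                     quorum(joiner_expected_votes),
                     quorum(new_votes))
    return new_votes >= new_quorum


# Scenario 1: only HSQ101 is up (1 vote, quorum assumed lowered to 1 by REMOVE_NODE).
# A node with VOTES=1, EXPECTED_VOTES=7 would raise quorum to 4 with only 2 votes present.
print("Scenario 1 join allowed:", join_keeps_quorum(1, 1, 1, 7))

# Scenario 2: 3 nodes still up (3 votes, quorum assumed to be 2 after REMOVE_NODE).
# The first returning node brings the votes to 4, which meets the quorum of 4.
print("Scenario 2 join allowed:", join_keeps_quorum(3, 2, 1, 7))
```

In this model the first node returning in Scenario 2 already supplies the fourth vote, while in Scenario 1 no single returning node can, which matches the "NEW" status seen from HSQ101.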
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

re: I will attach the SYSGEN SHOW/ALL when I have the chance to access the systems.

$ MC SYSGEN SHOW/CLUSTER should be sufficient for this problem.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

I think the REMOVE_NODE option sets the quorum to the number of votes and recalculates the expected votes based on this.

Until step 15 it's fine.

But then, when you reboot, the cluster should reject the node until 4 nodes are present, which it doesn't. A bug, or caused by very bad parameters, I would say.

In step 22 you reboot the Q node (I don't know why you call it the quorum node). The remaining nodes take over the quorum calculation, and this time it is correct.

Wim
