Operating System - OpenVMS

YJTAN
Occasional Advisor

EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Hi,

We tested a shutdown and reboot sequence and have some questions.

We did the test on a 7-node cluster:

HSP101 (GS1280)
HSP102 (GS1280)
HSP103 (RX8620)
HSP104 (RX8620)
HSP105 (GS1280)
HSP106 (GS1280)
HSQ101 (RX2600)

Attached is a file containing the "SHOW CLUSTER/CONT" screen dumps for all the steps we performed.

Scenario 1
We shut down 6 nodes, leaving only the quorum server (HSQ101).
We then rebooted the nodes one by one, but they were not able to join the cluster.
Only after we rebooted HSQ101 did the rest of the nodes form a cluster.
(See Steps 10 to 23.)
Why couldn't the nodes join HSQ101 as a cluster?

Scenario 2
However, in Steps 2 to 5 we shut down 4 of the 7 nodes, and in Steps 6 to 9 we were able to boot those nodes again and have them rejoin the cluster.
Why could the nodes not join HSQ101 as a cluster in Scenario 1, while in Scenario 2 the 4 nodes could rejoin the cluster?

Other questions

1) Why does the nodes' expected votes value, once it has dropped, never return to the "full" value?
(In Step 9, for example, EXPECT was 5 on some nodes instead of 7.)

2) Why did the "SET CLUSTER/EXPECTED=7" that I issued on HSP103 in Step 9 have no effect on that node?
Karl Rohwedder
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Yjtan,

When you shut down 6 nodes, only 1 vote is left in the cluster. When rebooting, the nodes use their SYSGEN parameter EXPECTED_VOTES of 7 to calculate a quorum of 4, and wait to join the cluster until 4 votes are available.
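
Kalle's arithmetic follows the OpenVMS quorum rule, QUORUM = (EXPECTED_VOTES + 2) / 2 with the fractional part discarded. A minimal Python sketch of that rule (the function names `quorum` and `has_quorum` are illustrative only, not part of any OpenVMS interface):

```python
def quorum(expected_votes: int) -> int:
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) / 2, fractional part discarded."""
    return (expected_votes + 2) // 2


def has_quorum(present_votes: int, expected_votes: int) -> bool:
    """True when the votes of the reachable nodes reach quorum."""
    return present_votes >= quorum(expected_votes)


if __name__ == "__main__":
    # Seven nodes, VOTES=1 each, EXPECTED_VOTES=7 in SYSGEN.
    print(quorum(7))  # 4
    for present in range(1, 8):
        state = "quorum" if has_quorum(present, 7) else "waiting"
        print(present, "votes present ->", state)
```

With EXPECTED_VOTES=7 and one vote per node, `has_quorum` only becomes true at 4 present votes, so a lone HSQ101 plus one rebooting node (2 votes) is not enough.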

regards Kalle
YJTAN
Occasional Advisor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

We rebooted all 6 nodes; the messages below are all we saw on the consoles.
On HSQ101, SHOW CLUSTER/CONT showed the STATUS of the six nodes as "NEW" instead of "MEMBER".
Only when we shut down HSQ101 did the rest start to form a cluster.

P.S. OpenVMS 8.3
=====
%CNXMAN, Sending VMScluster membership request to system HSQ101
%CNXMAN, Sending VMScluster membership request to system HSQ101
%CNXMAN, Have connection to system HSQ101
%CNXMAN, Have connection to system HSP101
%CNXMAN, Have connection to system HSP102
%CNXMAN, Have connection to system HSP103
%CNXMAN, Have connection to system HSP104
%CNXMAN, Have connection to system HSP105
%CNXMAN, Sending VMScluster membership request to system HSQ101
%CNXMAN, Sending VMScluster membership request to system HSQ101
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

You have 2 nodes with expected votes 7, 2 with expected votes 5, 1 with expected votes 3 and 2 with expected votes 1.

As soon as the nodes with 7 have left the cluster, 5 is used to calculate quorum, and so on.

When you restart, as soon as a "7" node tries to join the cluster, you need 4 votes to bring it alive.

It would help if we knew the VOTES of each member.
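
Wim's description can be modelled as follows. The sketch assumes each member has VOTES=1 (YJTAN confirms this in the next post) and assumes the cluster works with the highest EXPECTED_VOTES among its members, raised to the total vote count if that is larger; the second half of that rule is an assumption of this sketch. The node names A to G are hypothetical; only the mix of 7, 5, 3 and 1 comes from Wim's reading of the dumps.

```python
from typing import Dict, Tuple

# Hypothetical member table: name -> (VOTES, current EXPECTED_VOTES).
Members = Dict[str, Tuple[int, int]]


def quorum(expected_votes: int) -> int:
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) / 2, fraction dropped."""
    return (expected_votes + 2) // 2


def cluster_expected(members: Members) -> int:
    """Assumed rule: highest member EXPECTED_VOTES, raised to the vote total if larger."""
    highest = max(expected for _, expected in members.values())
    total = sum(votes for votes, _ in members.values())
    return max(highest, total)


def without(members: Members, *names: str) -> Members:
    """Copy of the member table with the named nodes removed."""
    return {n: v for n, v in members.items() if n not in names}


if __name__ == "__main__":
    # Node names A-G are hypothetical placeholders.
    members = {
        "A": (1, 7), "B": (1, 7),
        "C": (1, 5), "D": (1, 5),
        "E": (1, 3),
        "F": (1, 1), "G": (1, 1),
    }
    for gone in [(), ("A", "B"), ("A", "B", "C", "D")]:
        remaining = without(members, *gone)
        exp = cluster_expected(remaining)
        print(sorted(remaining), "expected", exp, "quorum", quorum(exp))
```

Removing the two "7" members drops the working value to 5 (quorum 3), and removing the "5" members as well drops it to 3 (quorum 2), which is the cascade Wim describes.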

Wim
YJTAN
Occasional Advisor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Each node contributes 1 vote, with EXPECTED_VOTES of 7.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Could you confirm that all nodes have the SYSGEN parameter VOTES set to 1 and EXPECTED_VOTES set to 7? If so, please post the SYSGEN SHOW/ALL output.

I'm not used to such clusters, but if 1/7 is correct I would say you have a bug. With 4 members present, the cluster should be alive.

I find it strange that the expected votes in your output jumped from 7 to 5. I assumed this was caused by the highest expected votes value among the members decreasing.

But I vaguely remember that removing a node does not change expected votes unless you use the shutdown option REMOVE_NODE. Did you use it (in all cases)?

Wim
YJTAN
Occasional Advisor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Yes. All the shutdowns were done with the REMOVE_NODE option.

There are 3 system disks in this cluster.
The GS1280s share one, the RX8620s share another, and the quorum server (RX2600) has its own.

I will attach the SYSGEN SHOW/ALL when I have the chance to access the systems.
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

YJTAN,

I've not done all the math for your scenario, but consider this:

If the remaining node HSQ101 has a dynamic EXPECTED_VOTES value of just 1 (because you've shut down all other nodes with REMOVE_NODE) and you now boot a member with EXPECTED_VOTES=7, it cannot join the cluster. It will see HSQ101 and report it on the console after a while, but it will not be allowed to join, because the cluster would then immediately lose quorum. You should have been able to boot the other nodes conversationally and set EXPECTED_VOTES to just the 'right' value at that time...
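
Volker's admission argument can be written as a single check. This is a sketch under stated assumptions, not the connection manager's actual logic: it assumes the enlarged cluster's expected votes become the largest of the current cluster value, the joining node's EXPECTED_VOTES and the new vote total, and that HSQ101 was left with 1 vote and expected votes 1 after the REMOVE_NODE shutdowns. The Scenario 2 starting point (3 votes, expected votes 3) is likewise an assumption about what REMOVE_NODE left behind; `can_join` is a hypothetical name.

```python
def quorum(expected_votes: int) -> int:
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) / 2, fraction dropped."""
    return (expected_votes + 2) // 2


def can_join(cluster_votes: int, cluster_expected: int,
             node_votes: int, node_expected: int) -> bool:
    """Would admitting one node leave the enlarged cluster with quorum?

    Assumed model: the enlarged cluster's expected votes become the largest
    of the current cluster value, the joining node's EXPECTED_VOTES and the
    new vote total.
    """
    new_votes = cluster_votes + node_votes
    new_expected = max(cluster_expected, node_expected, new_votes)
    return new_votes >= quorum(new_expected)


if __name__ == "__main__":
    # Scenario 1: HSQ101 alone after REMOVE_NODE shutdowns (1 vote, expected 1).
    print(can_join(1, 1, 1, 7))  # False: 2 votes < quorum 4
    # Scenario 2: 3 nodes left (3 votes, expected 3), one node rebooting with 7.
    print(can_join(3, 3, 1, 7))  # True: 4 votes >= quorum 4
```

Under these assumptions the same rule rejects the rebooting node in Scenario 1 and admits it in Scenario 2, which would account for the difference YJTAN asked about.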

Just a thought,

Volker.
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

re: I will attach the SYSGEN SHOW/ALL when I have the chance to access the systems.

$ MC SYSGEN SHOW/CLUSTER should be sufficient for this problem.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

I think the REMOVE_NODE option sets quorum to the number of votes, and recalculates expected votes based on that.

Until step 15 it's fine.

But then, when you reboot, the cluster should reject the nodes only until 4 nodes are present. It doesn't. I would say it's a bug, or caused by very bad parameters.

In Step 22 you reboot the Q node (I don't know why you call it a quorum node). The remaining nodes take over the quorum calculation, and this time they get it right.
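
The effect of the REMOVE_NODE shutdowns on the surviving cluster can be traced with a small simulation. It assumes, loosely following Wim's and Volker's descriptions, that each REMOVE_NODE shutdown lowers the cluster's expected votes by the departing node's votes (never below the remaining vote total); with VOTES=1 everywhere, expected votes then track the number of surviving nodes. The helper names are hypothetical.

```python
from typing import List, Tuple


def quorum(expected_votes: int) -> int:
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) / 2, fraction dropped."""
    return (expected_votes + 2) // 2


def remove_node(total_votes: int, cluster_expected: int,
                departing_votes: int) -> Tuple[int, int]:
    """Assumed REMOVE_NODE effect: expected votes drop by the departing votes."""
    remaining = total_votes - departing_votes
    new_expected = max(cluster_expected - departing_votes, remaining, 1)
    return remaining, new_expected


def simulate_shutdowns(n_nodes: int, n_shutdowns: int) -> List[Tuple[int, int, int]]:
    """(votes, expected, quorum) after each REMOVE_NODE shutdown; all nodes have VOTES=1."""
    votes, expected = n_nodes, n_nodes
    history = [(votes, expected, quorum(expected))]
    for _ in range(n_shutdowns):
        votes, expected = remove_node(votes, expected, 1)
        history.append((votes, expected, quorum(expected)))
    return history


if __name__ == "__main__":
    for step in simulate_shutdowns(7, 6):
        print(step)
```

Six shutdowns end at (1, 1, 1), the single-node state Volker later describes for HSQ101; four shutdowns end at (3, 3, 2).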

Wim
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

The IMPORTANT thing here is: the remaining nodes formed a NEW cluster once HSQ101 was shut down! They did NOT join the old one!

You can see this in the formation date changing from 11:19 to 13:45!

I suspect that nodes can only join an existing cluster one at a time, which does not work in this case because of the EXPECTED_VOTES setting of the joining nodes.
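
Volker's "one at a time" idea can be sketched as a loop that offers each rebooting node to the existing cluster separately, applying the same hypothetical admission rule. Whether the real JOIN transition behaves exactly like this is an assumption; the names HSP101 to HSP106 come from the thread, while n1 to n4 are placeholders.

```python
from typing import List, Tuple


def quorum(expected_votes: int) -> int:
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) / 2, fraction dropped."""
    return (expected_votes + 2) // 2


def join_one_at_a_time(cluster_votes: int, cluster_expected: int,
                       candidates: List[Tuple[str, int, int]]
                       ) -> Tuple[List[str], int, int]:
    """Offer each (name, VOTES, EXPECTED_VOTES) candidate to the cluster in turn."""
    admitted: List[str] = []
    for name, votes, expected in candidates:
        new_votes = cluster_votes + votes
        # Assumed rule for the enlarged cluster's expected votes.
        new_expected = max(cluster_expected, expected, new_votes)
        if new_votes >= quorum(new_expected):
            cluster_votes, cluster_expected = new_votes, new_expected
            admitted.append(name)
        # A rejected node leaves the cluster state unchanged, so unless
        # another node gets in, a later retry gets the same answer.
    return admitted, cluster_votes, cluster_expected


if __name__ == "__main__":
    rebooting = [(f"HSP10{i}", 1, 7) for i in range(1, 7)]
    print(join_one_at_a_time(1, 1, rebooting))  # ([], 1, 1)
    placeholders = [(f"n{i}", 1, 7) for i in range(1, 5)]
    print(join_one_at_a_time(3, 3, placeholders))  # (['n1', 'n2', 'n3', 'n4'], 7, 7)
```

Starting from HSQ101 alone, no node is ever admitted, because each candidate is judged against a 2-vote cluster that would need 4 votes; starting from 3 surviving votes, all four candidates get in.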

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

I would go for SHOW/ALL, because the PAP* parameters are not shown by SHOW/CLUSTER.

I don't know much about this, but could PAPOLLINTERVAL be too high?

Wim
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Wim,

cluster communication etc. is fine, as can be seen from the

%CNXMAN, Have connection to system xxx

messages. The systems clearly see each other, but the current cluster coordinator node (HSQ101) would not let the other nodes in.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

As I said, I don't know much about this. It would be nice to have a 7-node cluster to play with.

Wim
YJTAN
Occasional Advisor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Attached is the SYSGEN SHOW/ALL output of all the systems.

It's a 3-site cluster.

Site A has HSP101, HSP103, HSP105.

Site B has HSP102, HSP104, HSP106.

Site C has HSQ101.

Sites A and B are 25 km apart and communicate via DWDM, with redundant SCA circuits.

Site C is about 200 meters from Site B, in a separate building, and shares Site B's DWDM link for communication with Site A (so it's not really a true 3-site cluster).

The application runs on all nodes except HSQ101. HSQ101's main function is just to contribute 1 vote to the cluster. That is why we call it the quorum server.
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

YJTAN,

all the systems have EXPECTED_VOTES=1 and VOTES=1. These are the correct values for cluster operation for this configuration.

If you carefully read Chapter 7.11, "Cluster State Transition Flows - JOIN CLUSTER", in the book 'VAXcluster Principles' by Roy G. Davis, you may conclude that what you tried to do does not work, and is not supposed to.

During shutdown with REMOVE_NODE you told the remaining nodes in the cluster to adjust the value of expected votes, and therefore quorum. Your cluster was reduced to a single-node cluster (HSQ101) with expected_votes=1 and quorum=1.

Then you tried to add NEW nodes to the EXISTING cluster. They could not be let in, because their EXPECTED_VOTES (=7) setting would have caused the cluster to lose quorum immediately if they had been admitted. The crucial point here seems to be that the JOIN CLUSTER state transition into a cluster that already has quorum processes only ONE node at a time. The FORM CLUSTER transition seems to take all reachable systems into account at the same time, as you saw when you shut down HSQ101 and a NEW cluster was formed.
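
By contrast, a FORM transition, as Volker describes it, weighs all reachable systems together. A sketch of that check under the same assumed expected-votes rule (highest EXPECTED_VOTES among the systems, raised to the vote total if larger); `can_form` is a hypothetical name, not an OpenVMS routine.

```python
from typing import List, Tuple


def quorum(expected_votes: int) -> int:
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) / 2, fraction dropped."""
    return (expected_votes + 2) // 2


def can_form(nodes: List[Tuple[int, int]]) -> bool:
    """nodes: (VOTES, EXPECTED_VOTES) of every system that can see the others."""
    total = sum(votes for votes, _ in nodes)
    # Assumed rule: highest EXPECTED_VOTES, raised to the vote total if larger.
    expected = max(max(exp for _, exp in nodes), total)
    return total >= quorum(expected)


if __name__ == "__main__":
    six_hsp_nodes = [(1, 7)] * 6
    print(can_form(six_hsp_nodes))  # True: 6 votes >= quorum 4
    print(can_form([(1, 7)] * 3))   # False: 3 votes < quorum 4
```

The six HSP nodes together hold 6 votes against a quorum of 4, so once HSQ101 was gone they could form a new cluster, consistent with the changed formation time Volker points out.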

To allow the nodes to rejoin the existing 1-node cluster, you would have had to boot them conversationally with a reduced EXPECTED_VOTES setting.
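
How far "reduced" has to go can be estimated with the same assumed model. A node joining a cluster that would then hold V votes is admitted only if (E + 2) // 2 <= V, i.e. E <= 2V - 1. The sketch below (hypothetical helpers, not tested against a real cluster) derives the largest EXPECTED_VOTES each rebooting node could use, capped at the normal value; it does not cover how the value is entered at the conversational boot prompt.

```python
from typing import List, Optional


def quorum(expected_votes: int) -> int:
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) / 2, fraction dropped."""
    return (expected_votes + 2) // 2


def max_expected_for_join(cluster_votes: int, cluster_expected: int,
                          node_votes: int = 1) -> Optional[int]:
    """Largest EXPECTED_VOTES a joining node could use and still be admitted."""
    new_votes = cluster_votes + node_votes
    limit = 2 * new_votes - 1  # largest E with (E + 2) // 2 <= new_votes
    if cluster_expected > limit:
        return None  # the cluster's own expected votes are already too high
    return limit


def plan_rejoin(cluster_votes: int, cluster_expected: int,
                target_expected: int, n_nodes: int) -> List[int]:
    """EXPECTED_VOTES to use for each rebooting node (VOTES=1), capped at target."""
    plan: List[int] = []
    for _ in range(n_nodes):
        limit = max_expected_for_join(cluster_votes, cluster_expected)
        if limit is None:
            break
        chosen = min(limit, target_expected)
        assert quorum(max(cluster_expected, chosen, cluster_votes + 1)) <= cluster_votes + 1
        cluster_votes += 1
        cluster_expected = max(cluster_expected, chosen, cluster_votes)
        plan.append(chosen)
    return plan


if __name__ == "__main__":
    # Start from HSQ101 alone: 1 vote, expected votes 1; normal value is 7.
    print(plan_rejoin(1, 1, 7, 6))  # [3, 5, 7, 7, 7, 7]
```

Starting from HSQ101 alone, the plan comes out as 3 for the first node and 5 for the second; from the third node on, the normal value of 7 is admissible, because the cluster then reaches 4 votes.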

Consider continuing this discussion with a cluster expert at HP.

Volker.
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

YJTAN,

sorry for the typo:

All the systems have the correct settings of EXPECTED_VOTES=7 (not 1) and VOTES=1.

The crucial fact is that the 'existing' one-node cluster (HSQ101) DOES have quorum. So it does NOT take into account the other systems that also want to join. Including the others would only happen if HSQ101 had NOT had quorum.

As HSQ101 has quorum, it selects only the current cluster members plus the system sending the join request (i.e. HSQ101 and the new system) to participate in the proposed state transition. This fails because of the high number of expected_votes of the new system.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Next time you have 1 node left, use SHOW CLUSTER in ANALYZE/SYSTEM (SDA). This shows more details about the new node. Maybe there is something in it.

fwiw

Wim