Operating System - OpenVMS

EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

 
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

The IMPORTANT thing here is: the remaining nodes formed a NEW cluster once HSQ101 was shut down! They did NOT join the old one!

You can see this in the Formation date going from 11:19 to 13:45!

I could imagine that nodes can only join an existing cluster one at a time, which in this case does not work due to the EXPECTED_VOTES settings of the new nodes.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

I would go for SHOW/ALL, because the PAP parameters are not included in /CLUSTER.

I know nothing about this, but could PAPOLLINTERVAL be too high?

Wim
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Wim,

cluster communications etc. are fine, as can be seen by the

%CNXMAN, Have connection to system xxx

messages. The systems clearly see each other, but the current cluster coordinator node (HSQ101) would not let the other nodes in.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

As said, I knew nothing. Would be nice to have a playcluster 7.

Wim
YJTAN
Occasional Advisor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Attached is the file of SYSGEN SHOW/ALL of all the systems.

It's a 3-site cluster.

Site A has HSP101,HSP103,HSP105

Site B has HSP102,HSP104,HSP106

Site C has HSQ101.

Site A and Site B are 25 km apart, connected via DWDM, with redundant SCA circuits.

Site C is about 200 meters away from Site B, in a separate building, sharing the same DWDM with Site B for communication to Site A (so not really a true 3-site cluster).

The application runs on all nodes except HSQ101. HSQ101's main function is just to contribute 1 vote to the cluster. That is why we call it the Quorum Server.
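
As a rough illustration of why the single vote at Site C matters, the sketch below applies the documented quorum formula, (EXPECTED_VOTES + 2) / 2 with the fraction truncated, to this layout. It assumes VOTES=1 on every node (as reported later in this thread); the `SITES` table and the function names are made up for the example.

```python
def quorum(expected_votes: int) -> int:
    """OpenVMS quorum formula: (EXPECTED_VOTES + 2) / 2, fraction truncated."""
    return (expected_votes + 2) // 2


# Votes per site, assuming VOTES=1 on each node (hypothetical table).
SITES = {"A": 3, "B": 3, "C": 1}


def survives_site_loss(sites: dict, lost: str) -> bool:
    """True if the remaining sites still hold quorum after losing one site."""
    expected = sum(sites.values())
    remaining = sum(v for name, v in sites.items() if name != lost)
    return remaining >= quorum(expected)


for site in SITES:
    print(site, survives_site_loss(SITES, site))

# Same check without the quorum node: 6 expected votes, quorum 4,
# so losing either 3-node site leaves only 3 votes.
print(survives_site_loss({"A": 3, "B": 3}, "A"))
```

This only counts votes. It ignores the shared DWDM path between Site B and Site C mentioned above, so a failure of that link could take out both sites at once.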
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

YJTAN,

all the systems have EXPECTED_VOTES=1 and VOTES=1. These are the correct values for cluster operation for this configuration.

If you carefully read Chapter 7.11 Cluster State Transition flows - JOIN CLUSTER in the 'VAXcluster Principles' book by Roy G. Davis, you may come to the conclusion that what you tried to do does not work and is not supposed to.

During shutdown with REMOVE_NODE you've told the remaining nodes in the cluster to adjust the value of expected votes and therefore quorum. Your cluster was reduced to a single node cluster (HSQ101) with expected_votes=1 and quorum=1.

Then you tried to add NEW nodes to the EXISTING cluster. They could not be let in, because their EXPECTED_VOTES (=7) setting would have caused quorum to be lost immediately if they had been admitted. The crucial point here seems to be that the JOIN CLUSTER state transition into a cluster with existing quorum only processes ONE node at a time. The FORM CLUSTER transition seems to take into account all reachable systems at the same time, as you've seen after shutting down HSQ101, when a NEW cluster was formed.
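
The arithmetic behind this can be sketched as follows. The quorum formula, (EXPECTED_VOTES + 2) / 2 with the fraction truncated, is the documented one; the rule that the cluster adopts the larger EXPECTED_VOTES value on a join is a simplifying assumption for this example, and the variable names are illustrative only.

```python
def quorum(expected_votes: int) -> int:
    """OpenVMS quorum formula: (EXPECTED_VOTES + 2) / 2, fraction truncated."""
    return (expected_votes + 2) // 2


# State after the REMOVE_NODE shutdowns: HSQ101 alone, with quorum.
cluster_votes = 1
cluster_expected = 1
assert cluster_votes >= quorum(cluster_expected)

# One node asks to join with EXPECTED_VOTES=7 and VOTES=1.
joiner_expected = 7
joiner_votes = 1

# Assumption: the larger EXPECTED_VOTES value wins when a node is admitted.
new_votes = cluster_votes + joiner_votes
new_expected = max(cluster_expected, joiner_expected, new_votes)
new_quorum = quorum(new_expected)

# Quorum would be 4 with only 2 votes present, so quorum would be lost.
print(new_quorum, new_votes, new_votes >= new_quorum)
```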

To allow the nodes to re-join the existing 1-node cluster, you would have had to boot the nodes conversationally with a reduced EXPECTED_VOTES setting.
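
How far EXPECTED_VOTES would need to be reduced can be estimated from the quorum formula, (EXPECTED_VOTES + 2) / 2 with the fraction truncated. The sketch below assumes the simple admission rule that the votes present after the join must still reach the quorum implied by the joiner's EXPECTED_VOTES; the function names are hypothetical and this has not been verified against a real cluster.

```python
def quorum(expected_votes: int) -> int:
    """OpenVMS quorum formula: (EXPECTED_VOTES + 2) / 2, fraction truncated."""
    return (expected_votes + 2) // 2


def max_admissible_expected_votes(votes_after_join: int) -> int:
    """Largest EXPECTED_VOTES whose quorum is still met by the given votes.

    quorum(ev) <= votes  <=>  (ev + 2) // 2 <= votes  <=>  ev <= 2 * votes - 1
    """
    return 2 * votes_after_join - 1


# HSQ101 (1 vote) plus the first re-joining node (1 vote).
votes_after_join = 2
limit = max_admissible_expected_votes(votes_after_join)

# limit, the quorum it implies, and the quorum one step above the limit
print(limit, quorum(limit), quorum(limit + 1))
```

Under that assumption the first node to re-join would need EXPECTED_VOTES of 3 or less, and the limit rises by 2 with every additional vote in the cluster.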

Consider continuing this discussion with a cluster expert at HP.

Volker.
Volker Halle
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

YJTAN,

sorry for the typo:

All the systems have the correct settings of EXPECTED_VOTES=7 (not 1) and VOTES=1.

The crucial fact is that the 'existing' one-node cluster (HSQ101) DOES have quorum. So it does NOT take into account the other systems that also want to join. Including the others would only happen if HSQ101 did NOT have quorum.

As HSQ101 has quorum, it only selects the current cluster members plus the system sending the join request (i.e. HSQ101 and the one new system) to participate in the proposed state transition. This will fail due to the high number of expected_votes of the new system.
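
The difference between the two transitions can be modelled in a few lines. This is a simplified sketch of the behaviour described above, not the connection manager's actual algorithm: the `Node` class, the admission check and both functions are assumptions made for illustration. HSQ101 is given expected_votes=1 to represent the value the running cluster was adjusted to by REMOVE_NODE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    votes: int = 1
    expected_votes: int = 7


def quorum(expected_votes: int) -> int:
    """OpenVMS quorum formula: (EXPECTED_VOTES + 2) / 2, fraction truncated."""
    return (expected_votes + 2) // 2


def would_have_quorum(nodes) -> bool:
    """Assumed check: votes present must reach quorum for the largest EXPECTED_VOTES."""
    total = sum(n.votes for n in nodes)
    expected = max(max(n.expected_votes for n in nodes), total)
    return total >= quorum(expected)


def join_one_at_a_time(members, candidates):
    """Existing cluster with quorum: each join request is evaluated on its own."""
    members = list(members)
    for candidate in candidates:
        if would_have_quorum(members + [candidate]):
            members.append(candidate)
    return members


def form_cluster(reachable):
    """No existing cluster: all reachable nodes are considered together."""
    return list(reachable) if would_have_quorum(reachable) else []


hsq101 = Node("HSQ101", expected_votes=1)
hsp_nodes = [Node(f"HSP10{i}") for i in range(1, 7)]

# JOIN: every HSP node is rejected, because 2 votes < quorum 4.
print([n.name for n in join_one_at_a_time([hsq101], hsp_nodes)])

# FORM (after HSQ101 is shut down): 6 votes >= quorum 4, so a new cluster forms.
print([n.name for n in form_cluster(hsp_nodes)])
```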

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: EXPECTED_VOTES, CL_EXP, Shutdown and Reboot sequence

Next time you have 1 node left, use ANALYZE/SYSTEM and then SHOW CLUSTER (ana/sys, show cluster). This shows more details about the new node. Maybe there is something in it.

fwiw

Wim