Operating System - OpenVMS

Art Wiens
Respected Contributor

Cluster reboot troubles ...

This weekend I rebooted my mixed-mode (4 Alpha v7.2-2, 4 VAX v6.2) cluster after about a year of uptime (we had a power issue in our datacenter last December). Over the years I have rebooted the cluster several times, always in the same order: the 4 Alphas in a particular order and then the 4 VAXes, also in a particular order. Never an issue.

This time, when the third Alpha tried to join the cluster, it had a problem. NODE3 displayed the "sending membership request to NODE1" message repeatedly. NODE1, for its part, repeatedly reported that it had received the membership request from NODE3 and that it was completing a VMScluster state transition.

I dropped NODE3 and tried it again, same issue.

NODE1 and NODE2 were up, so I brought them down, power-cycled all of them and tried again. Same issue!

Nothing was physically done after the initial cluster shutdown (i.e. no cables were moved), but I checked the Ethernet and fiber connections anyway and found nothing.

I tried once more, same issue. Then I booted NODE4 while NODE3 was still sending membership requests and NODE1 was still reporting state transitions. NODE4 joined normally, and within a few seconds NODE3 came up as well!!

While in the "failing state", a SHOW CLUSTER/CONTINUOUS on NODE1 or NODE2 showed NODE3 in a "New" state. No error messages were being displayed on any consoles, just membership requests and VMScluster state transitions.

Nothing unusual in the OPERATOR.LOG files. WTF happened?

Cheers,
Art
2 REPLIES
Uwe Zessin
Honored Contributor
Solution

Re: Cluster reboot troubles ...

Can you check the EXPECTED_VOTES values? If the value is too high, a node cannot join an existing and running cluster.
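
To see why an inflated EXPECTED_VOTES matters, it may help to look at the quorum arithmetic. OpenVMS derives quorum as (EXPECTED_VOTES + 2) / 2, rounded down, and the cluster needs at least that many votes present to keep running. The Python sketch below only illustrates that formula; the function names and the vote counts are hypothetical and are not taken from Art's cluster.

```python
def quorum(expected_votes: int) -> int:
    """Quorum as derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently present meet or exceed quorum."""
    return votes_present >= quorum(expected_votes)


if __name__ == "__main__":
    # Hypothetical numbers: three votes present, with a sane and an inflated setting.
    for expected in (4, 7):
        print(expected, quorum(expected), has_quorum(3, expected))
```

With three votes present, an EXPECTED_VOTES of 4 leaves the cluster at quorum, while a value of 7 pushes quorum above the votes available. That would be consistent with a join that only completes once another voting member shows up, though the exact state-transition behaviour is not something this sketch models. On a running node the current setting can be read in SYSGEN with SHOW EXPECTED_VOTES.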
Art Wiens
Respected Contributor

Re: Cluster reboot troubles ...

Correct!! How the h*ll did that get changed?! I'm 100% positive this worked last December and nothing should have changed with regard to votes etc., and I'm quite certain no AUTOGEN was run.

Mystery solved, thanks Uwe!

Now to find the culprit,
Art