Operating System - OpenVMS

cluster error

Frequent Advisor

cluster error

I have 2 nodes in a cluster, wb3 and wb4. Node wb3 is working fine, but when I plug the network cable into wb4, I get the following error on wb4:

Bugcheck code = 000005DC: CLUEXIT, Node voluntarily exiting VMScluster

Please advise what could be wrong.

Andy Bustamante
Honored Contributor

Re: cluster error

What are the VOTES and EXPECTED_VOTES on these nodes? Is there a quorum disk configured? Is there any other cluster interconnect available?

Assuming you're able to boot both nodes without networking, this probably means there is no quorum disk and EXPECTED_VOTES isn't configured appropriately. If this is the case:

When you boot the two nodes without network connectivity, and with votes that allow this configuration, each node forms its own instance of the cluster. You then have two instances of the same cluster running. When you connect the network cable and the nodes see each other, one node opts to exit the cluster gracefully.
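To make the vote arithmetic concrete, here is a minimal Python sketch of why both nodes can come up on their own. It assumes the OpenVMS rule that quorum is (EXPECTED_VOTES + 2) / 2 rounded down, and it simplifies the real connection manager by letting a partition use the larger of its EXPECTED_VOTES setting and the votes actually present. The function names `quorum` and `has_quorum` are hypothetical helpers, not OpenVMS interfaces, and VOTES=1 per node is an assumed value, since the original post doesn't say.

```python
def quorum(expected_votes: int) -> int:
    """Quorum per the OpenVMS rule: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def has_quorum(member_votes: list[int], expected_votes: int) -> bool:
    """Simplified model: can this set of members form (or keep) a cluster?

    The partition uses the larger of its EXPECTED_VOTES setting and the
    votes actually present when computing quorum.
    """
    present = sum(member_votes)
    estimated = max(expected_votes, present)
    return present >= quorum(estimated)


# Hypothetical scenario from this thread: wb3 and wb4, each with VOTES=1,
# booted with the network cable unplugged (no cluster interconnect).
for expected in (1, 2):
    wb3_alone = has_quorum([1], expected)
    wb4_alone = has_quorum([1], expected)
    print(f"EXPECTED_VOTES={expected}: wb3 alone has quorum={wb3_alone}, "
          f"wb4 alone has quorum={wb4_alone}")
```

With EXPECTED_VOTES=1, each isolated node reaches quorum by itself, so you get two instances of the same cluster. With EXPECTED_VOTES=2, neither isolated node reaches quorum, so neither can form a cluster alone and both wait for the other.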

Solution: plug in the network cables, then boot the second node.

Long-term solution: review VOTES and EXPECTED_VOTES, and check what storage is in place. A quorum disk is generally used in a two-node cluster to allow the cluster to continue with only one node available. With two clustered nodes, a crossover cable on a second network interface allows uninterrupted cluster communications when networking support wants to upgrade their switch firmware or test redundant power supplies...
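The quorum-disk point can be checked with the same arithmetic. Below is a minimal sketch, assuming VOTES=1 on each node, a quorum disk contributing QDSKVOTES=1, and the quorum rule (EXPECTED_VOTES + 2) / 2 rounded down. The helper `survives` is hypothetical and ignores the details of how the quorum disk watcher actually works.

```python
def quorum(expected_votes: int) -> int:
    """Quorum per the OpenVMS rule: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def survives(votes_present: int, expected_votes: int) -> bool:
    """Simplified model: do the remaining votes still meet quorum?"""
    return votes_present >= quorum(max(expected_votes, votes_present))


# Assumed two-node cluster, VOTES=1 per node, no quorum disk: EXPECTED_VOTES=2.
# After one node goes away, only 1 vote remains.
no_qdisk = survives(votes_present=1, expected_votes=2)

# Assumed two-node cluster, VOTES=1 per node, quorum disk with QDSKVOTES=1:
# EXPECTED_VOTES=3. After one node goes away, 1 node vote + 1 disk vote remain.
with_qdisk = survives(votes_present=1 + 1, expected_votes=3)

print(f"one node left, no quorum disk:   continues={no_qdisk}")
print(f"one node left, with quorum disk: continues={with_qdisk}")
```

Without a quorum disk, the surviving node holds 1 vote against a quorum of 2 and cannot continue. With the quorum disk, the surviving node plus the disk hold 2 votes against a quorum of 2, so the cluster keeps running on one node.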

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Andy Bustamante
Honored Contributor

Re: cluster error

If you have any sort of shared storage, DO NOT BOOT THE SECOND NODE without making sure cluster communications are available. The result may be an opportunity to review your backup and restore procedures.

Robert Gezelter
Honored Contributor

Re: cluster error


I concur with Andy. The original post does not mention whether the storage is shared between the systems or local.

If the storage is shared, booting two non-communicating nodes will likely lead to corruption of the data stored on the disks. DO NOT boot cluster members if their communication with the cluster is disconnected but their connection to shared storage is working. It is a recipe for severe problems.

- Bob Gezelter, http://www.rlgsc.com
Steve Reece_3
Trusted Contributor

Re: cluster error

Have the two nodes ever worked together successfully? Have the two nodes ever booted together?

You don't mention what hardware the systems are running on.

My guess (and it is a guess, since I've no idea of the environment, similar to the disk device name question that you had) is that the two nodes want to be part of the same cluster, but wb3 and wb4 have booted independently and you're now trying to join them into the same cluster from a fully booted state. This won't work.

VMSclusters rely on shared views of data structures across all of the nodes in the cluster. If a node goes away for a period of time and then seeks to rejoin, it will crash out and rejoin the cluster from a known state (i.e. rebooting) so that it can build its views of the shared data structures.
Similarly, if you boot two nodes of the same VMScluster separately and then try to bring the two booted nodes together, one of them will crash with a CLUEXIT and reboot, so that it joins the cluster from a known, freshly booted state.

In other words, if both nodes are booted and you plug the network cable into wb4 then I would expect one of them to crash. It's expected behaviour.

You need to ensure that the two nodes never have shared storage mounted while they are booted separately. This would corrupt the disks very quickly. You also need to ensure that, if the systems boot from the same disk, they boot from separate directory trees (system roots).

If you have had both nodes running with shared storage mounted on both, then you're likely to have to get a backup tape out and restore all of the data disks...