Operating System - HP-UX

In 3 node cluster one node is preventing the cluster from forming.

 
Shawn Miller_2
Frequent Advisor


I have a 3-node cluster with a package that fails over from nodeA to nodeB. In the near future I will have a second package that will fail over from nodeC to nodeB. We had to shut our power down, and when everything was turned back on, nodeC's heartbeat was down. All the other nodes reported that they could not communicate with all nodes and so did not form the cluster. This is not good; I assumed that if some nodes are not reachable, the cluster should just form without them. Then I could fix that node and issue a command to have it join. Why did the cluster not form even though 2 out of 3 nodes were available, and how can I make sure that it will in the future?
Jeff Schussele
Honored Contributor

Re: In 3 node cluster one node is preventing the cluster from forming.

Hi Shawn,

Do you have a lock disk specified? In the event that one node is unavailable - leaving two - you need a "tie-breaker" so that one node can take charge. This is what a lock disk does.

In the cluster ascii file they would be designated thusly:

FIRST_CLUSTER_LOCK_VG /dev/vgHDS02
FIRST_CLUSTER_LOCK_PV /dev/dsk/c7t0d3
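
One quick way to confirm that both lock parameters are actually present in a cluster ASCII file is a grep over the file. This is only a sketch: the function name check_lock_config is made up for illustration, and it checks that the keywords appear on non-comment lines, not that the lock disk itself is usable.

```shell
#!/bin/sh
# Sketch: report whether a cluster ASCII file names a lock volume group
# and at least one lock physical volume.
# check_lock_config is a hypothetical helper, not a Serviceguard command.
check_lock_config() {
    file="$1"
    status=0
    for key in FIRST_CLUSTER_LOCK_VG FIRST_CLUSTER_LOCK_PV; do
        # Drop comment lines first, then look for the keyword followed by a value.
        if grep -v '^[[:space:]]*#' "$file" | grep -q "^[[:space:]]*$key[[:space:]]"; then
            echo "$key: present"
        else
            echo "$key: missing"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_lock_config /etc/cmcluster/cluster.ascii, where the file name is an assumption; use whatever file holds your own cluster configuration.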

HTH,
Jeff


PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Shawn Miller_2
Frequent Advisor

Re: In 3 node cluster one node is preventing the cluster from forming.

Yes, I have a lock disk configured. Last year it was configured wrong and I got error messages about the lock disk.
Here is a chunk of my syslog:

cmcld: Attempting to form a new cluster
Oct 23 04:00:00 miux01 above message repeats 31 times
Oct 23 04:00:03 miux01 cmcld: Attempting to form a new cluster
Oct 23 04:00:26 miux01 cmcld: Cluster formation failed
Oct 23 04:00:26 miux01 cmcld: Reason: Ran out of time for automatically joining a cluster
Oct 23 04:00:26 miux01 cmcld: Unable to contact all nodes in the cluster, thus it is not
Oct 23 04:00:23 miux01 cmcld: Attempting to form a new cluster
Oct 23 04:00:26 miux01 above message repeats 3 times
Oct 23 04:00:26 miux01 cmcld: possible to join the cluster at this time.
Oct 23 04:00:26 miux01 cmtaped[1990]: The cluster daemon aborted our connection.
Oct 23 04:00:26 miux01 cmtaped[1990]: cmtaped terminating. (ATS 1.14)
Oct 23 04:00:26 miux01 cmsrvassistd[1976]: The cluster daemon aborted our connection.
Oct 23 04:00:26 miux01 cmlvmd: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Oct 23 04:00:26 miux01 cmclconfd[4172]: The cluster daemon aborted our connection.
Oct 23 04:00:26 miux01 cmclconfd[4172]: Unable to lookup any node information in CDB: Software caused connection abort
Solution

Re: In 3 node cluster one node is preventing the cluster from forming.

Shawn,

The behaviour you are seeing from ServiceGuard is correct - you just have to step back and think a minute about what is happening.

The absolute most important thing to ServiceGuard isn't keeping your packages up, it's protecting the data that lives in the packages. In the situation you describe (with the cluster halted), if nodeC can't talk to nodeA and nodeB, then when the cluster is trying to start on nodeA and nodeB, for all they know nodeC is also trying to run the cluster (or perhaps nodeC never halted and is still running the cluster!). If they start the cluster and run the packages in it, this could result in data corruption if the packages are activated on more than one node simultaneously. The point is that without knowing the initial state of all nodes, a cluster will not form on its own.

Of course you can walk to nodeC, log in to it, and check manually that it's not running the cluster. Once you're happy that it's not, you can restart the cluster on just a subset of the nodes as follows:

cmruncl -n nodeA -n nodeB

You will get a warning message and be asked to confirm what you are doing, but then the cluster will form. Now you can fix your network problem and then bring nodeC into the cluster using cmrunnode:

cmrunnode nodeC
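
The two steps above can be wrapped in a small script, shown here as a sketch. The helper names run, start_subset and rejoin_node are invented for this example; only cmruncl -n and cmrunnode come from the steps above. It prints the commands by default instead of running them, because the check that nodeC is really not running the cluster has to be done by a person first.

```shell
#!/bin/sh
# Sketch only: prints the commands by default.
# Set DRY_RUN=0 to actually execute them on a Serviceguard node.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Start the cluster on only the nodes given as arguments.
# Do this only after confirming by hand that the excluded node
# is not running the cluster.
start_subset() {
    args=""
    for node in "$@"; do
        args="$args -n $node"
    done
    # $args is left unquoted on purpose so it splits into separate words.
    run cmruncl $args
}

# Bring a repaired node back into the already running cluster.
rejoin_node() {
    run cmrunnode "$1"
}

start_subset nodeA nodeB
rejoin_node nodeC
```

With DRY_RUN left at its default, the script only shows what it would do, which makes it easy to review the node names before committing to the real run.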

You should review this section of the ServiceGuard manual for more on this subject:

http://docs.hp.com/hpux/onlinedocs/B3936-90073/B3936-90073.html

HTH

Duncan

I am an HPE Employee