Serviceguard not forming the cluster at startup

 
Ignacio Javier
Regular Advisor


Hello everybody:

I have a two-node HP-UX 11.23 cluster running SG 11.16.
If the cluster is halted and I reboot a node, the node is not able to form the cluster:

secundar,/>Feb 13 10:21:01 secundar cmcld: Cluster formation failed
Feb 13 10:21:01 secundar cmcld: Reason: Ran out of time for automatically joining a cluster
Feb 13 10:21:01 secundar cmcld: Attempting to form a new cluster
Feb 13 10:21:01 secundar above message repeats 28 times
Feb 13 10:21:01 secundar cmcld: Unable to contact all nodes in the cluster, thus it is not
Feb 13 10:21:01 secundar cmcld: Beginning standard election
Feb 13 10:21:01 secundar above message repeats 28 times
Feb 13 10:21:01 secundar cmcld: possible to join the cluster at this time.
Feb 13 10:21:01 secundar cmcld: If the cluster is not running, use the cmruncl command to
Feb 13 10:21:01 secundar cmcld: start it. If the cluster is running on other nodes, verify
Feb 13 10:21:01 secundar cmcld: this node's ability to send messages to the other nodes,
Feb 13 10:21:01 secundar cmcld: then re-issue the cmrunnode command.
Feb 13 10:21:01 secundar cmlvmd[1952]: The cluster daemon aborted our connection.
Feb 13 10:21:01 secundar cmlvmd[1952]: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Feb 13 10:21:01 secundar cmlvmd[1952]: CLVMD exiting
Feb 13 10:21:01 secundar cmcld: This node (secundar) has ceased cluster activities.
Feb 13 10:21:02 secundar cmcld: Daemon exiting
Feb 13 10:21:01 secundar cmlvmd[1952]: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Feb 13 10:21:01 secundar cmsrvassistd[1950]: Service assistant daemon halted


As soon as this message appears, I run cmruncl and the cluster starts fine.

What do you think is happening? It makes no sense to me that the automatic start fails but the manual start works.

Regards

4 REPLIES
Wouter Jagers
Honored Contributor

Re: Serviceguard not forming the cluster at startup

If you only want to reboot one node, I think you'd be better off using the "cmhaltnode" command instead of halting the whole cluster.

The following command double checks your cluster config file (including valid timeout values):
# cmcheckconf -k -v -C

The file /etc/rc.config.d/cmcluster should contain the line:
AUTOSTART_CMCLD=1
on -every- node of the cluster for automatic startup.
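
A quick way to check that setting on each node, as a minimal sketch. The helper name `autostart_enabled` is made up for illustration, and it assumes the setting sits on its own uncommented line, as is usual for files under /etc/rc.config.d:

```shell
#!/bin/sh
# Hypothetical helper (not part of Serviceguard): succeeds when the
# given file has an uncommented line that reads AUTOSTART_CMCLD=1.
# Assumption: the setting is alone on its line, with no trailing comment.
autostart_enabled() {
    grep -q '^[[:space:]]*AUTOSTART_CMCLD=1[[:space:]]*$' "$1" 2>/dev/null
}

conf=/etc/rc.config.d/cmcluster
if autostart_enabled "$conf"; then
    echo "AUTOSTART_CMCLD=1 found in $conf"
else
    echo "AUTOSTART_CMCLD=1 not found in $conf"
fi
```

Run it on every node of the cluster; a node where the line is missing or set to 0 will not try to start Serviceguard at boot.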

Let us know what you find :)

Cheers
an engineer's aim in a discussion is not to persuade, but to clarify.
Steven E. Protter
Exalted Contributor

Re: Serviceguard not forming the cluster at startup

Shalom,

I agree it's not correct to stop the whole cluster to reboot a node.

A properly configured cluster should let you reboot one node without taking any action.

Packages will fail over to the non-booted node without any other action. This is kind of the point of clustering.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Eric SAUBIGNAC
Honored Contributor

Re: Serviceguard not forming the cluster at startup

Hi Ignacio,

That is normal behavior for cluster formation.

At startup, a node tries to join an existing cluster (cmrunnode), not to start the whole cluster (cmruncl).

If no cluster can be found, the node waits for the amount of time defined in the cluster configuration file (the default is 10 minutes). During this period, if the other nodes come up and quorum is reached, the cluster is formed and started. If not, no cluster is formed.

So if you need to restart the whole cluster with only one node, it cannot be done automatically. You can do it with the following command on the node that is up:

cmruncl -f -n

Hope this will help

Regards

Eric

Stephen Doud
Honored Contributor
Solution

Re: Serviceguard not forming the cluster at startup

A look at the boot-time script that starts the Serviceguard subsystem, /sbin/init.d/cmcluster, reveals that it uses 'cmrunnode -v' rather than 'cmruncl -v'.
The difference between these commands is as follows:

cmrunnode will
a) cause a node to join a running cluster
b) cause a node to form a cluster WHEN ALL OTHER NODES ARE ALSO PERFORMING cmrunnode within the AUTOSTART_TIMEOUT set in the cluster binary file (default is 10 minutes).

cmruncl will
a) start a cluster WHEN all nodes are available to start a cluster
b) start a one-node cluster when the -n option is used.

Note cmrunnode item b): rebooting a node is not sufficient to cause a cluster to form. The other member nodes of the cluster must also be performing cmrunnode within a particular window, or the cluster will not form. If you inspect syslog.log for the time delay between the start of cmrunnode and the message:
cmcld: Cluster formation failed
Feb 13 10:21:01 secundar cmcld: Reason: Ran out of time for automatically joining a cluster
... you will see that the delay is 10 minutes, confirming the issue - not all nodes were running cmrunnode within the AUTOSTART_TIMEOUT window.
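
To measure that delay without counting by hand, the two syslog timestamps can be subtracted. This is only a sketch: `hms_to_seconds` and `delay_seconds` are hypothetical helper names, both timestamps are assumed to fall on the same day, and the start time in the example call is illustrative because the log quoted in the question does not show when cmrunnode began.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp (as printed in syslog.log) to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds elapsed between two timestamps on the same day.
delay_seconds() {
    start=$(hms_to_seconds "$1")
    end=$(hms_to_seconds "$2")
    echo $((end - start))
}

# 10:11:01 is an illustrative start time, NOT taken from the log above;
# 10:21:01 is the time of the "Cluster formation failed" message.
delay_seconds 10:11:01 10:21:01   # prints 600
```

A result close to 600 seconds matches the 10-minute default described above.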

Therefore, you can do one of the following:
1) Do not reboot the whole cluster when only one node needs to reboot. Allow one member node to continue to operate the cluster, and the rebooted nodes will join that cluster.

2) If you reboot all nodes in the cluster and you want them to form a cluster during boot time, make certain they all begin the '/sbin/init.d/cmcluster' script within 10 minutes of one another. Also, ensure that AUTOSTART_CMCLD=1 is set in /etc/rc.config.d/cmcluster.

3) When you get a login prompt, kill the cmrunnode and then perform cmruncl if all nodes are ready to respond. If not, use 'cmruncl -n -f' to cause Serviceguard to form a one-node cluster.
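
The choice in option 3 can be written down as a small dry-run helper that only prints the command to use and starts nothing. The function name `suggest_start_command` and its yes/no argument are assumptions made for this sketch, and the node name passed after -n is supplied by the caller (the reply above does not spell one out):

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command from option 3.
# $1 = "yes" if every cluster node is ready to respond, else anything
# $2 = node that should form the one-node cluster (caller's assumption)
suggest_start_command() {
    if [ "$1" = "yes" ]; then
        # All nodes available: start the whole cluster.
        echo "cmruncl -v"
    else
        # Only one node available: force a one-node cluster on it.
        echo "cmruncl -v -f -n $2"
    fi
}

# Example: the other node is still down, so start on secundar alone.
suggest_start_command no secundar
```

Printing the command instead of running it leaves the final decision with the administrator, which matters because forcing a one-node cluster while the other node is actually running a cluster is something to avoid.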