Operating System - HP-UX

URGENT : Unable to Form a Cluster

 
Dayanand Naik_1
Occasional Advisor

Hi Folks,

The following message was logged in the syslog file while forming the cluster: Cluster formation failed.

Our setup is 2 nodes with one package configured. The systems went down due to an A/C failure, and when they came back up, neither of the 2 nodes was able to form a cluster. There are no problems with the network or with the heartbeat connection. One possibility is that changes were made to the VG or cluster configuration so that the node cannot get the cluster lock. Can anyone tell me how to rectify this? Suggestions welcome.

The following errors were logged in the syslog file:

Jun 09 12:35:19 csora1 CM-CMD[1181]: /usr/sbin/cmrunnode -v
Jun 09 12:35:35 csora1 cmclconfd[1307]: Executing "/usr/lbin/cmcld" for node csora1
Jun 09 12:35:36 csora1 cmcld: Daemon Initialization - Maximum number of packages supported for this incarnation is 5.
Jun 09 12:35:36 csora1 cmcld: Reserving 1748 Kbytes of memory and 49 threads
Jun 09 12:35:36 csora1 cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 20.
Jun 09 12:35:37 csora1 cmcld: cmgmsd_init: SG CLUSTER ; return 1
Jun 09 12:35:37 csora1 cmcld: Starting cluster management protocols.
Jun 09 12:35:37 csora1 cmcld: Attempting to form a new cluster
Jun 09 12:35:37 csora1 cmtaped[1345]: cmtaped: There are no ATS devices on this cluster.
Jun 09 12:45:37 csora1 cmcld: Cluster formation failed
Jun 09 12:45:37 csora1 cmcld: Reason: Ran out of time for automatically joining a cluster
Jun 09 12:45:37 csora1 cmcld: Unable to contact all nodes in the cluster, thus it is not
Jun 09 12:45:33 csora1 cmcld: Attempting to form a new cluster
Jun 09 12:45:37 csora1 above message repeats 91 times
Jun 09 12:45:37 csora1 cmcld: possible to join the cluster at this time.
Jun 09 12:45:37 csora1 cmlvmd: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Jun 09 12:45:37 csora1 cmlvmd: CLVMD exiting
Jun 09 12:45:37 csora1 cmsrvassistd[1340]: Lost connection to the cluster daemon.
Jun 09 12:45:37 csora1 cmsrvassistd[1340]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
Jun 09 12:45:37 csora1 cmcld: If the cluster is not running, use the cmruncl command to
Jun 09 12:45:37 csora1 cmcld: start it. If the cluster is running on other nodes, verify
Jun 09 12:45:37 csora1 cmcld: this node's ability to send messages to the other nodes,
Jun 09 12:45:37 csora1 cmcld: then re-issue the cmrunnode command.
Jun 09 12:45:37 csora1 cmtaped[1345]: Lost connection to the cluster daemon.
Jun 09 12:45:37 csora1 cmtaped[1345]: cmtaped terminating. (ATS 1.14)


Regards,
!!! Naik !!!
Tim D Fulford
Honored Contributor

Re: URGENT : Unable to Form a Cluster

Hi

I would check out the following...

1 - cmquerycl on BOTH nodes.

2 - are the messages the same on both machines in the syslog?

3 - Can you activate your VGs independently of SG? Make sure you issue -a e, not -a y:
# vgchange -a e -q n
If you can't, then try activating them using -a y (if this works, the VG is not a cluster VG, and so the relevant line in pkg.sh will not work). If you can't activate them at all, this could be the problem.

4 - It is also possible for the cluster lock disk to lose its tag/ID. If it has, there are two ways to restore it:
o vgcfgrestore .... (if the /etc/lvmconf/... file is OK)
o cmapplyconf ...
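For point 2, a small sketch that pulls only the cmcld lines explaining a failed formation out of syslog, with the "date time host" prefix stripped so the output from the two nodes can be diffed directly. It assumes the default HP-UX log location /var/adm/syslog/syslog.log (pass another file as the first argument if yours differs); the function name sg_reasons is hypothetical, not a Serviceguard command.

```shell
# Hypothetical helper: print the cmcld lines that explain a failed cluster
# formation, with the leading "Mon DD HH:MM:SS hostname " prefix removed so the
# output from two nodes can be compared with diff.
# Assumes the default HP-UX syslog path unless a file is given as $1.
sg_reasons() {
    log=${1:-/var/adm/syslog/syslog.log}
    grep 'cmcld' "$log" |
        grep -E 'Cluster formation failed|Reason:|Unable to contact' |
        sed 's/^[A-Z][a-z]* [0-9]* [0-9:]* [^ ]* //'
}
```

Run it on each node (e.g. sg_reasons > /tmp/sg.$(hostname)), copy one file across and diff the two; a node whose reasons differ is the one to look at first.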

Tim

Re: URGENT : Unable to Form a Cluster

Full cluster membership is required for initial cluster formation. If you have both boxes up and running, just try a 'cmruncl -v' on either one. Do you still get the messages above?

HTH

Duncan

I am an HPE Employee
Stephen Doud
Honored Contributor

Re: URGENT : Unable to Form a Cluster

Hello Naik,

You might find this document helpful:
UXSGLVKBAN00000009
TITLE: ServiceGuard Cluster Formation at Boot Time

Your syslog shows that the command "/usr/sbin/cmrunnode -v" was executed, most likely by the /sbin/init.d/cmcluster boot-time script during the reboot.
As Duncan alluded to, all nodes must be present and either attempting to form a cluster or already in a cluster before a booting node can form or join one. Your syslog reports that after 10 minutes the cluster formation failed, with the reason "ran out of time":

Jun 09 12:45:37 csora1 cmcld: Reason: Ran out of time for automatically joining a cluster
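The 10 minutes can be read straight off the timestamps: cmcld started "Attempting to form a new cluster" at 12:35:37 and gave up at 12:45:37. A tiny sketch for doing that subtraction on any two syslog HH:MM:SS stamps from the same day (to_secs and elapsed are hypothetical helper names). If memory serves, this window is governed by AUTO_START_TIMEOUT in the cluster ASCII file, whose default is 600000000 microseconds; check your own file rather than relying on that.

```shell
# Convert an HH:MM:SS syslog timestamp to seconds since midnight.
# awk is used because it reads fields like "09" as decimal, whereas
# shell $(( )) arithmetic may treat a leading zero as octal.
to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds elapsed between two timestamps on the same day
# (a wrap past midnight is not handled).
elapsed() {
    echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}
```

For this log, elapsed 12:35:37 12:45:37 prints 600, i.e. exactly 10 minutes.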

As the syslog.log also says:


Jun 09 12:45:37 csora1 cmcld: If the cluster is not running, use the cmruncl command to
Jun 09 12:45:37 csora1 cmcld: start it. If the cluster is running on other nodes, verify
Jun 09 12:45:37 csora1 cmcld: this node's ability to send messages to the other nodes,
Jun 09 12:45:37 csora1 cmcld: then re-issue the cmrunnode command.

To have fewer than all of the nodes form a cluster, use this syntax:

# cmruncl -n <node_name> -n <node_name> ...
(Useful if one of the cluster nodes is not able to join the cluster)
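The repeated -n form is easy to mistype with several nodes. A sketch that only prints the cmruncl command line for a given set of node names, so it can be checked before you run it by hand; sg_runcl_cmd is a made-up helper, csora1 comes from the syslog above, and csora2 is a hypothetical name for the second node.

```shell
# Hypothetical helper: print (do not execute) a verbose cmruncl command
# that starts the cluster on only the listed nodes, one -n per node.
sg_runcl_cmd() {
    cmd="cmruncl -v"
    for node in "$@"; do
        cmd="$cmd -n $node"
    done
    echo "$cmd"
}
```

For example, sg_runcl_cmd csora1 prints "cmruncl -v -n csora1", the single-node form for when the other box cannot join.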

Or, as Duncan suggested, if all nodes are up, just try a manual cluster startup:

# cmruncl

-s.
Sanjay_6
Honored Contributor

Re: URGENT : Unable to Form a Cluster

Hi Naik,

Try to form the cluster manually when both systems are up:

cmruncl -v

If one of the systems is down, form the cluster with only one node:

cmruncl -f -v -n node_name

Hope this helps.

Regds