Operating System - HP-UX

Daniel Sanabria
Advisor

Cluster did not form....

Hi,
I'm totally new to ServiceGuard. I'm trying to implement a two-node cluster, and everything looks OK until I try to start the cluster with the cmruncl command.

# cmcheckconf -C /etc/cmcluster/cmclconfig.ascii

Begin cluster verification...
Adding node SVR1_CL to cluster nfscl1.
Adding node SVR2_CL to cluster nfscl1.

Verification completed with no errors found.
Use the cmapplyconf command to apply the configuration.
# cmapplyconf -C /etc/cmcluster/cmclconfig.ascii

Begin cluster verification...
Adding node SVR1_CL to cluster nfscl1.
Adding node SVR2_CL to cluster nfscl1.
Completed the cluster creation.

# cmruncl
cmruncl : Waiting for cluster to form........................
cmruncl : Cluster did not form. Check the syslog file for information.
/var/adm/syslog/syslog.log

When I check syslog.log, the reason for the failure is not clear to me:

Jul 29 12:43:00 SVR1_CL cmcld: Global Cluster Information:
Jul 29 12:43:00 SVR1_CL cmcld: Heartbeat Interval is 1 seconds.
Jul 29 12:43:00 SVR1_CL cmcld: Node Timeout is 5 seconds.
Jul 29 12:43:00 SVR1_CL cmcld: Network Polling Interval is 2 seconds.
Jul 29 12:43:00 SVR1_CL cmcld: Auto Start Timeout is 1800 seconds.
Jul 29 12:43:00 SVR1_CL cmcld: Information Specific to node SVR1_CL:
Jul 29 12:43:00 SVR1_CL cmcld: Cluster lock disk: /dev/dsk/c0t0d2.
Jul 29 12:43:00 SVR1_CL cmcld: lan901 0x001279fe478a 192.168.3.1 bridged net:1
Jul 29 12:43:00 SVR1_CL cmcld: lan902 0x001279fe4789 192.168.1.21 bridged net:2
Jul 29 12:43:00 SVR1_CL cmcld: lan903 0x001279fe4788 192.168.2.21 bridged net:3
Jul 29 12:43:00 SVR1_CL cmcld: Heartbeat Subnet: 192.168.3.0
Jul 29 12:43:00 SVR1_CL cmcld: Heartbeat Subnet: 192.168.1.0
Jul 29 12:43:00 SVR1_CL cmcld: Heartbeat Subnet: 192.168.2.0
Jul 29 12:43:00 SVR1_CL cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 979.
Jul 29 12:43:00 SVR1_CL cmcld: rcomm health: Initializing timeout to 120000000 microseconds
Jul 29 12:43:00 SVR1_CL cmcld: Total allocated: 2609696 bytes, used: 4199976 bytes, unused 2337784 bytes
Jul 29 12:43:00 SVR1_CL cmcld: Starting cluster management protocols.
Jul 29 12:43:00 SVR1_CL cmcld: Attempting to form a new cluster
Jul 29 12:43:01 SVR1_CL cmtaped[10184]: cmtaped: Kernel tuneable st_ats_enabled disabled for ATS.
Jul 29 12:44:00 SVR1_CL cmcld: Cluster formation failed
Jul 29 12:44:00 SVR1_CL cmcld: Reason: Ran out of time for manually starting the cluster
Jul 29 12:44:00 SVR1_CL cmcld: This node (SVR1_CL) has ceased cluster activities.
Jul 29 12:43:59 SVR1_CL cmcld: Attempting to form a new cluster
Jul 29 12:44:00 SVR1_CL above message repeats 9 times
Jul 29 12:44:00 SVR1_CL cmcld: Daemon exiting
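
To pull the relevant lines out of a long syslog, it can help to filter for cmcld's failure messages. Below is a minimal Python sketch; `cmcld_failures` is a hypothetical helper, not a Serviceguard tool, and it assumes one message per line (the real file is not wrapped like the excerpt above) with the usual `Mon DD HH:MM:SS host` prefix. The reason logged on the node running cmruncl may only be a symptom, so it is worth running the same filter against every node's syslog.

```python
import re

# Matches cmcld lines that explain a failed start, e.g.
# "Jul 29 12:44:00 SVR1_CL cmcld: Reason: Ran out of time ..."
FAILURE_PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+cmcld:\s+"
    r"(?P<msg>(Cluster formation failed|Reason:.*|Permission denied.*))$"
)


def cmcld_failures(lines):
    """Return (timestamp, host, message) tuples for cmcld failure lines."""
    hits = []
    for line in lines:
        match = FAILURE_PATTERN.match(line.rstrip("\n"))
        if match:
            hits.append((match["stamp"], match["host"], match["msg"]))
    return hits
```

For example, `cmcld_failures(open("/var/adm/syslog/syslog.log"))` on SVR1 would return only the "Cluster formation failed" and timeout "Reason:" lines, while the same call on a node whose cmcld was refused would also surface a `Permission denied` line.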


What am I doing wrong here?

Any help is much appreciated.
RAC_1
Honored Contributor

Re: Cluster did not form....

Run cmquerycl on both nodes and check whether it reports any errors.
There is no substitute to HARDWORK
nanan
Trusted Contributor

Re: Cluster did not form....

First, try to run a one-node cluster:
cmruncl -f -n "node"
That should give you more information to solve the problem.


melvyn burnard
Honored Contributor

Re: Cluster did not form....

What did the log on the other node say about trying to start a cluster?
I also note your Auto Start Timeout is 1800 seconds. Is there any real reason to make it this long? Normally 600 seconds is adequate.
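
If the timeout is shortened, the new value goes into the cluster ASCII file and is then re-checked and re-applied with cmcheckconf and cmapplyconf. A small Python sketch of the unit conversion, assuming (to our understanding of Serviceguard A.11.x) that cmclconfig.ascii expresses HEARTBEAT_INTERVAL, NODE_TIMEOUT, NETWORK_POLLING_INTERVAL and AUTO_START_TIMEOUT in microseconds, while syslog reports them in seconds. Check the comments in your own ASCII file before relying on this.

```python
# Assumption: the cluster ASCII file stores these timing parameters in
# microseconds, whereas cmcld logs them in seconds.
MICROSECONDS_PER_SECOND = 1_000_000


def to_ascii_value(seconds):
    """Convert a duration in seconds to the assumed microsecond value for the ASCII file."""
    return int(round(seconds * MICROSECONDS_PER_SECOND))


# Values from the syslog excerpt, with the auto-start timeout at the suggested 600 s.
timings_seconds = {
    "HEARTBEAT_INTERVAL": 1,
    "NODE_TIMEOUT": 5,
    "NETWORK_POLLING_INTERVAL": 2,
    "AUTO_START_TIMEOUT": 600,
}

for name, seconds in timings_seconds.items():
    print(f"{name} {to_ascii_value(seconds)}")
```

Under that assumption, the 1800 s currently logged would correspond to 1800000000 in the file, and the suggested 600 s to 600000000.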
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Daniel Sanabria
Advisor

Re: Cluster did not form....

Thanks for the replies, guys.

Checking the logs on the other node has shed some light, but I still don't know what to check. The permissions of the cmclconfig file are the same on both nodes (rw for the owner, root in this case).

I also noticed that when running cmquerycl on the second node, the nfscl1 cluster is not listed.

Something fundamental is wrong here, but I don't know what it is.


/var/adm/syslog/syslog.log on SVR2:

Jul 29 12:43:18 SVR2_CL cmclconfd[9217]: Request from root on node SVR1_CL to start the cluster on this node.
Jul 29 12:43:18 SVR2_CL cmclconfd[9217]: Executing "/usr/lbin/cmcld" for node PPR1_CL
Jul 29 12:43:19 SVR2_CL cmcld: Permission denied to 127.0.0.1
Jul 29 12:44:19 SVR2_CL cmcld: Either there is no configuration data or cmclconfd is unable to run.
Jul 29 12:44:25 SVR2_CL cmclconfd[9217]: The ServiceGuard daemon, /usr/lbin/cmcld[9218], exited with a status of 1.

# cmruncl -f -n SVR2_CL

Permission denied to 127.0.0.1
cmruncl : Unable to determine the nodes on the current cluster
cmruncl : Either no cluster configuration file exists, or the file is corrupted, or cmclconfd is unable to run

# cmquerycl (SVR2)


Cluster Name    Node Name
UNUSED          34401_CL
                34402_CL
                34403_CL
                34404_CL
                34405_CL

# cmquerycl (SVR1)


Cluster Name    Node Name
UNUSED          34401_CL
                34402_CL
                34403_CL
                34404_CL
                34405_CL

nfscl1          SVR1_CL
                SVR2_CL


Daniel Sanabria
Advisor

Re: Cluster did not form....

BTW, the long timeout was part of the troubleshooting I was doing at the beginning; it will be reduced once I figure out this problem.
melvyn burnard
Honored Contributor

Re: Cluster did not form....

It seems to me there are issues either with security or with something else odd. Perhaps one of the nodes used to belong to another cluster?
Run cmscancl on both nodes and compare the output files.
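
Comparing two cmscancl reports by eye is tedious; a unified diff highlights what differs between the nodes. A minimal Python sketch using the standard difflib module. It assumes each node's /tmp/scancl.out has been copied to one machine under distinct names; `compare_scans` and the file names in the usage note are hypothetical.

```python
import difflib
from pathlib import Path


def compare_scans(path_a, path_b):
    """Return a unified diff between two cmscancl output files ("" if identical)."""
    a = Path(path_a).read_text().splitlines(keepends=True)
    b = Path(path_b).read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(a, b, fromfile=str(path_a), tofile=str(path_b))
    )
```

For example, `print(compare_scans("scancl_svr1.out", "scancl_svr2.out"))`; lines prefixed with `-` appear only in the first report and lines prefixed with `+` only in the second.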
Daniel Sanabria
Advisor

Re: Cluster did not form....

Hi Melvyn,

Below is the output of the command run on both nodes:

# cmscancl (from SVR2_CL)
cmviewconf: Binary file does not exist.

cmscancl: Unable to obtain node names from the cmviewconf command.


# cmscancl (from SVR1_CL)

The nodes to be scanned are: PPR1_STC PPR2_STC

The output file is: /tmp/scancl.out

Checking remsh access to all the nodes...
Getting information from SVR1_CL...
Getting information from SVR2_CL...
Checking remote network connections (SVR1_CL to SVR2_CL)...

Done. Check log file /tmp/scancl.out for all information.

What is wrong with ServiceGuard on the second node?
nanan
Trusted Contributor

Re: Cluster did not form....

1. Check /etc/cmcluster/cmclnodelist.
2. Check /etc/inetd.conf.
Show me the contents of both files.
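
Both checks can be scripted. The following Python sketch uses hypothetical helpers (not part of Serviceguard) that report whether inetd.conf has active, uncommented hacl-cfg entries for both udp and tcp, as in the cmclconfd lines of the inetd.conf posted in this thread, and whether /etc/cmcluster/cmclnodelist exists. It only reads the files; it does not validate their contents beyond that.

```python
from pathlib import Path


def active_protocols(text, service):
    """Protocols of the uncommented inetd.conf entries for `service`."""
    protocols = set()
    for raw in text.splitlines():
        fields = raw.split()
        # Fields: service, socket type, protocol, wait/nowait, user, program, args.
        # Commented lines start with '#', so fields[0] never equals the service name.
        if len(fields) >= 3 and fields[0] == service:
            protocols.add(fields[2])
    return protocols


def check_node(inetd_path="/etc/inetd.conf",
               nodelist_path="/etc/cmcluster/cmclnodelist"):
    """Return a list of human-readable problems found on this node."""
    problems = []
    found = active_protocols(Path(inetd_path).read_text(), "hacl-cfg")
    for proto in ("tcp", "udp"):
        if proto not in found:
            problems.append(f"no active hacl-cfg {proto} entry in {inetd_path}")
    if not Path(nodelist_path).exists():
        problems.append(f"{nodelist_path} does not exist")
    return problems
```

Running `check_node()` on each node and printing the returned list would, for the situation described in the next post, report the missing cmclnodelist on both nodes.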
Daniel Sanabria
Advisor

Re: Cluster did not form....

Now I'm worried,

/etc/cmcluster/cmclnodelist is not present on either node.

Below are the contents of /etc/inetd.conf on SVR2:

# cat /etc/inetd.conf
## Configured using SAM by root on Mon Nov 21 15:33:56 2005
## Configured using SAM by root on Mon Nov 21 15:34:01 2005
##
#
# @(#)B.11.11_LRinetd.conf $Revision: 1.24.214.3 $ $Date: 97/09/10 14:50:49 $
#
# Inetd reads its configuration information from this file upon execution
# and at some later time if it is reconfigured.
#
# A line in the configuration file has the following fields separated by
# tabs and/or spaces:
#
# service name as in /etc/services
# socket type either "stream" or "dgram"
# protocol as in /etc/protocols
# wait/nowait only applies to datagram sockets, stream
# sockets should specify nowait
# user name of user as whom the server should run
# server program absolute pathname for the server inetd will
# execute
# server program args. arguments server program uses as they normally
# are starting with argv[0] which is the name of
# the server.
#
# See the inetd.conf(4) manual page for more information.
##

##
#
# ARPA/Berkeley services
#
##
ftp stream tcp nowait root /usr/lbin/ftpd ftpd -l
telnet stream tcp nowait root /usr/lbin/telnetd telnetd

# Before uncommenting the "tftp" entry below, please make sure
# that you have a "tftp" user in /etc/passwd. If you don't
# have one, please consult the tftpd(1M) manual entry for
# information about setting up this service.

tftp dgram udp wait root /usr/lbin/tftpd tftpd\
/opt/ignite\
/var/opt/ignite
#bootps dgram udp wait root /usr/lbin/bootpd bootpd
#finger stream tcp nowait bin /usr/lbin/fingerd fingerd
login stream tcp nowait root /usr/lbin/rlogind rlogind
shell stream tcp nowait root /usr/lbin/remshd remshd
exec stream tcp nowait root /usr/lbin/rexecd rexecd
#uucp stream tcp nowait root /usr/sbin/uucpd uucpd
ntalk dgram udp wait root /usr/lbin/ntalkd ntalkd
ident stream tcp wait bin /usr/lbin/identd identd

##
#
# Other HP-UX network services
#
##
printer stream tcp nowait root /usr/sbin/rlpdaemon rlpdaemon -i

##
#
# inetd internal services
#
##
daytime stream tcp nowait root internal
daytime dgram udp nowait root internal
time stream tcp nowait root internal
#time dgram udp nowait root internal
echo stream tcp nowait root internal
echo dgram udp nowait root internal
discard stream tcp nowait root internal
discard dgram udp nowait root internal
chargen stream tcp nowait root internal
chargen dgram udp nowait root internal

##
#
# rpc services, registered by inetd with portmap
# Do not uncomment these unless your system is running portmap!
#
##
# WARNING: The rpc.mountd should now be started from a startup script.
# Please enable the mountd startup script to start rpc.mountd.
##
#rpc stream tcp nowait root /usr/sbin/rpc.rexd 100017 1 rpc.rexd
#rpc dgram udp wait root /usr/lib/netsvc/rstat/rpc.rstatd 100001 2-4 rpc.rstatd
#rpc dgram udp wait root /usr/lib/netsvc/rusers/rpc.rusersd 100002 1-2 rpc.rusersd
#rpc dgram udp wait root /usr/lib/netsvc/rwall/rpc.rwalld 100008 1 rpc.rwalld
#rpc dgram udp wait root /usr/sbin/rpc.rquotad 100011 1 rpc.rquotad
#rpc dgram udp wait root /usr/lib/netsvc/spray/rpc.sprayd 100012 1 rpc.sprayd

##
#
# The standard remshd and rlogind do not include the Kerberized
# code. You must install the InternetSvcSec/INETSVCS-SEC fileset and
# configure Kerberos as described in the SIS(5) man page.
#
##
kshell stream tcp nowait root /usr/lbin/remshd remshd -K
klogin stream tcp nowait root /usr/lbin/rlogind rlogind -K


##
#
# NCPM programs.
# Do not uncomment these unless you are using NCPM.
#
##

#ncpm-pm dgram udp wait root /opt/ncpm/bin/ncpmd ncpmd
#ncpm-hip dgram udp wait root /opt/ncpm/bin/hipd hipd

dtspc stream tcp nowait root /usr/dt/bin/dtspcd /usr/dt/bin/dtspcd
rpc xti tcp swait root /usr/dt/bin/rpc.ttdbserver 100083 1 /usr/dt/bin/rpc.ttdbserver
recserv stream tcp nowait root /usr/lbin/recserv recserv -display :0
rpc dgram udp wait root /usr/dt/bin/rpc.cmsd 100068 2-5 rpc.cmsd
swat stream tcp nowait.400 root /opt/samba/bin/swat swat
registrar stream tcp nowait root /etc/opt/resmon/lbin/registrar /etc/opt/resmon/lbin/registrar
hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd /opt/cmom/lbin/cmomd -f /var/opt/cmom/cmomd.log
hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
instl_boots dgram udp wait root /opt/ignite/lbin/instl_bootd instl_bootd
nanan
Trusted Contributor

Re: Cluster did not form....

Create cmclnodelist in /etc/cmcluster and add each node name followed by root, like this:

NODE1 root
NODE2 root

Copy the file to the same location on the other node, and then try again.
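
The file format described above is one `<hostname> <user>` pair per line, identical on both nodes. A minimal Python sketch that builds that text and checks an existing file for missing nodes; the function names are hypothetical, the node names are the ones used in this thread, and treating lines that start with # as comments is an assumption.

```python
def render_nodelist(nodes, user="root"):
    """Build cmclnodelist text: one '<hostname> <user>' pair per line."""
    return "".join(f"{node} {user}\n" for node in nodes)


def nodes_missing(text, nodes, user="root"):
    """Return the nodes that have no '<hostname> <user>' line in `text`."""
    entries = set()
    for raw in text.splitlines():
        fields = raw.split()
        # Assumption: lines starting with '#' are comments.
        if len(fields) >= 2 and not fields[0].startswith("#"):
            entries.add((fields[0], fields[1]))
    return [node for node in nodes if (node, user) not in entries]


nodes = ["SVR1_CL", "SVR2_CL"]
content = render_nodelist(nodes)
print(content, end="")                         # the two entries, one per line
print(nodes_missing("SVR1_CL root\n", nodes))  # ['SVR2_CL']
```

The same `content` would be written to /etc/cmcluster/cmclnodelist on both nodes.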
If you still have problems, change the hacl-probe entry in inetd.conf as below:

hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd /opt/cmom/lbin/cmomd -f -i /var/opt/cmom/cmomd.log -r /var/opt/cmom

Then run inetd -c on both nodes and try again.

Daniel Sanabria
Advisor

Re: Cluster did not form....

Guys,

Thanks a lot for all the help. The issue was related to security: communication between SVR2 and SVR1 wasn't two-way, so I had to review .rhosts on both nodes.
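
A one-way trust problem like this is easy to miss, because remote access works in one direction. A hedged Python sketch with hypothetical helpers: given each node's .rhosts-style entries (only the simple `<host>` or `<host> <user>` form; `+` wildcards, `-` negations and netgroups are ignored), it lists pairs where one node trusts the other but not vice versa.

```python
def trusted_hosts(text, user="root"):
    """Hosts allowed in as `user` by a simple .rhosts-style file."""
    hosts = set()
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith(("+", "-", "#")):
            continue  # skip blanks, wildcards, negations and anything unusual
        # A host-only entry in root's /.rhosts admits root from that host.
        if len(fields) == 1 or (len(fields) == 2 and fields[1] == user):
            hosts.add(fields[0])
    return hosts


def one_way_gaps(trust):
    """Return (node, peer) pairs where node trusts peer but peer does not trust node."""
    gaps = []
    for node in sorted(trust):
        for peer in sorted(trust[node]):
            if peer != node and node not in trust.get(peer, set()):
                gaps.append((node, peer))
    return gaps


# Hypothetical example: SVR1_CL trusts SVR2_CL, but SVR2_CL has no entry for SVR1_CL.
trust = {
    "SVR1_CL": trusted_hosts("SVR2_CL root\n"),
    "SVR2_CL": trusted_hosts(""),
}
print(one_way_gaps(trust))  # [('SVR1_CL', 'SVR2_CL')]
```

An empty list means every trust relationship between the listed nodes runs both ways.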

Syslog on SVR2 when starting the cluster from SVR1:

Jul 30 19:25:04 SVR2_CL cmcld: Starting cluster management protocols.
Jul 30 19:25:04 SVR2_CL cmcld: Attempting to form a new cluster
Jul 30 19:25:04 SVR2_CL cmtaped[19777]: cmtaped: Kernel tuneable st_ats_enabled disabled for ATS.
Jul 30 19:25:06 SVR2_CL cmcld: Turning on safety time protection
Jul 30 19:25:06 SVR2_CL cmcld: 2 nodes have formed a new cluster, sequence #1
Jul 30 19:25:06 SVR2_CL cmcld: The new active cluster membership is: SVR1_CL(id=1), SVR2_CL(id=2)
Jul 30 19:25:06 SVR2_CL cmlvmd: Clvmd initialized successfully.
Thomas J. Harrold
Trusted Contributor

Re: Cluster did not form....

I'd advise against using /.rhosts for security. Stick with /etc/cmcluster/cmclnodelist or, better yet, use the permissions built into SG 11.16 or newer.

-tjh
I learn something new everyday. (usually because I break something new everyday)