Cluster not starting

 
Mike Duffy_1
Honored Contributor

Good morning,

I came in this morning and our single-machine SG cluster node was not running. A reboot has not solved it.
cmviewcl -v shows
cmviewcl : Cannot read the cluster configuration file. Either it does not exist, or it is corrupted, or it is empty

All the correct files are in /etc/cmcluster, as follows:

-rw-r----- 1 root sys 1341440 Dec 5 2000 cm.tar
-rw-rw-r-- 1 root sys 15 Dec 18 2000 cmclnodelist.old
-r--r--r-- 1 root sys 2808 Dec 12 2002 cmclconf.ascii.old
-rw------- 1 root root 13760 Mar 31 12:55 cmclconfig.old
-rw------- 1 root root 0 Mar 31 12:55 cmclconfig.tmp
-rw-r----- 1 root sys 15 May 10 08:54 cmclnodelist
-r--r----- 1 root sys 2808 May 10 08:54 cmclconf.ascii
-rw------- 1 root sys 13760 May 10 08:54 cmclconfig



Anyone got any ideas?

Thanks in advance.

G. Vrijhoeven
Honored Contributor

Re: Cluster not starting

Hi,

Do you have a current ASCII file? If so, you can try cmcheckconf -v -C /etc/cmcluster/*.ascii
Do you have it on another node?
Can you run cmviewcl on the other node? If so, start the cluster, run cmgetconf to regenerate the file, and distribute it.


HTH,

Gideon
G. Vrijhoeven
Honored Contributor

Re: Cluster not starting

Hi,

Can you post more error messages (syslog)?
Did you change root's .rhosts?

Gideon
Johan Lorimier
Frequent Advisor

Re: Cluster not starting

Hi,

Does your cluster have only a single node?
If so, try applying the configuration file to regenerate the cmclconfig file:
cmcheckconf -v -C cmclconf.ascii
cmapplyconf -C cmclconf.ascii

Do not do that if you have several nodes; just retrieve the file from a valid node and then stop and start the broken node:
cmhaltnode -v nodename
cmrunnode -v nodename
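
As a rough sketch of the single-node case above, the two commands can be chained so that the apply step only runs if the check succeeds. The function name regen_config and the default path are assumptions for illustration; cmcheckconf and cmapplyconf are the Serviceguard commands named above, and cmapplyconf may still prompt for confirmation.

```shell
#!/bin/sh
# regen_config [ASCII_FILE]
# Hypothetical helper: validate the cluster ASCII file, then apply it
# only if validation succeeded. Intended for a single-node cluster only.
regen_config() {
    ascii=${1:-/etc/cmcluster/cmclconf.ascii}   # assumed default location

    if ! cmcheckconf -v -C "$ascii"; then
        echo "cmcheckconf failed for $ascii; not applying" >&2
        return 1
    fi

    # Regenerates the binary cmclconfig from the ASCII file.
    cmapplyconf -v -C "$ascii"
}

# Example call (run as root):
#   regen_config /etc/cmcluster/cmclconf.ascii
```

If cmcheckconf cannot reach cmclconfd, as turns out to be the case later in this thread, the function stops before touching the existing configuration.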

Johan
Mike Duffy_1
Honored Contributor

Re: Cluster not starting

Hi,

Rhosts not changed.

I have tried the check and got the following:

Checking cluster file: /etc/cmcluster/cmclconf.ascii
Checking nodes ... Done
Error: Unable to connect to the configuration daemon (cmclconfd) on node snmdev1: Connection refused
Checking existing configuration ...
Done
Error: Unable to connect to the configuration daemon (cmclconfd) on node snmdev1: Connection refused
Warning: Can not find configuration for cluster snmdevclust

Error: Unable to connect to the configuration daemon (cmclconfd) on node snmdev1: Connection refused
Error: Unable to establish communication to node snmdev1
cmcheckconf : Unable to reconcile configuration file /etc/cmcluster/cmclconf.ascii



Syslog shows:
May 10 09:34:16 snmdev1 CM-CMD[29175]: /usr/sbin/cmrunnode -v
May 10 09:34:21 snmdev1 CM-CMD[29374]: /usr/sbin/cmrunnode -v
May 10 09:34:26 snmdev1 CM-CMD[29391]: /usr/sbin/cmrunnode -v
May 10 09:34:32 snmdev1 CM-CMD[29397]: /usr/sbin/cmrunnode -v
May 10 09:34:37 snmdev1 CM-CMD[29403]: /usr/sbin/cmrunnode -v
May 10 09:34:42 snmdev1 CM-CMD[29409]: /usr/sbin/cmrunnode -v
May 10 09:34:47 snmdev1 CM-CMD[29415]: /usr/sbin/cmrunnode -v
May 10 09:34:52 snmdev1 CM-CMD[29422]: /usr/sbin/cmrunnode -v
May 10 09:34:57 snmdev1 CM-CMD[29440]: /usr/sbin/cmrunnode -v
May 10 09:35:02 snmdev1 CM-CMD[29448]: /usr/sbin/cmrunnode -v
May 10 09:35:07 snmdev1 CM-CMD[29481]: /usr/sbin/cmrunnode -v
May 10 09:35:12 snmdev1 CM-CMD[29487]: /usr/sbin/cmrunnode -v
May 10 09:35:17 snmdev1 CM-CMD[29493]: /usr/sbin/cmrunnode -v
May 10 09:35:22 snmdev1 CM-CMD[29803]: /usr/sbin/cmrunnode -v
May 10 09:35:27 snmdev1 CM-CMD[29822]: /usr/sbin/cmrunnode -v
May 10 09:35:33 snmdev1 CM-CMD[29828]: /usr/sbin/cmrunnode -v
May 10 09:35:38 snmdev1 CM-CMD[29834]: /usr/sbin/cmrunnode -v
May 10 09:35:43 snmdev1 CM-CMD[29840]: /usr/sbin/cmrunnode -v
May 10 09:35:48 snmdev1 CM-CMD[29846]: /usr/sbin/cmrunnode -v

G. Vrijhoeven
Honored Contributor

Re: Cluster not starting

Hi,

The node seems to have problems talking to itself.
Can you post the cmcluster startup script logging from /etc/rc.log?
Can you add version numbers (swlist | grep -i mc) and the patch bundle installed?

Did you add snmdev1 root to root's .rhosts file?

Regards,

Gideon
Mike Duffy_1
Honored Contributor

Re: Cluster not starting

Hi,

Output from "/sbin/rc3.d/S996ipfwdoff start":
----------------------------

**************************************************
HP-UX run-level transition completed
Mon May 10 08:25:40 BST 2004
**************************************************
ERROR: Ran out of time while attempting to join the cluster
cmrunnode : Unable to determine the nodes on the current cluster
cmrunnode : Either no cluster configuration file exists, or the file is corrupted, or /usr/lbin/cmclconfd is unable to run
Unable to connect to the configuration daemon (cmclconfd) on node snmdev1: Connection refused
Local node is not currently configured in a cluster
ERROR: Unable to join cluster



It is a single-node SG cluster configuration.
No changes have been made to .rhosts for some time, and snmdev1 is still in there.

Stephen Doud
Honored Contributor
Solution

Re: Cluster not starting

Hi Mike,

Per your opening statement, this is a 1-node cluster...

It is possible that inetd is not serving 'cmclconfd', the SG daemon responsible for processing SG commands, which also depends on the 'hacl' networking ports.
1) Ensure the hacl ports are awake:
# netstat -a | grep hacl
You should see at least 3 lines.
If they don't exist, ensure that /etc/services has 9 lines referencing 'hacl' and /etc/inetd.conf has at least 2. If so, re-read /etc/inetd.conf:
# inetd -c
... then repeat the netstat command.
Once the hacl ports are awake, try the cmruncl command again.

2) If the hacl ports are awake but SG still cannot read the cluster configuration file, two possibilities come to mind:
a. Hostname resolution isn't working as you think it is. Use nslookup to verify that the name resolves as you expect.

b. Consider re-creating cmclnodelist so that it contains the simple hostname and root, e.g.:
root
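
The file checks in step 1 can be sketched as a small script. The function names count_hacl and check_hacl_config are made up for illustration; the thresholds (9 lines in /etc/services, 2 in /etc/inetd.conf) are the ones quoted above, and the script only reads the files, it does not restart or reconfigure inetd.

```shell
#!/bin/sh
# count_hacl FILE
# Print the number of non-comment lines in FILE that mention 'hacl'.
# Prints 0 if the file is missing or empty.
count_hacl() {
    if [ ! -s "$1" ]; then
        echo 0
        return 0
    fi
    grep -v '^[[:space:]]*#' "$1" | grep -c 'hacl' || true
}

# check_hacl_config [SERVICES_FILE] [INETD_CONF]
# Hypothetical helper: warn if either file has fewer 'hacl' entries
# than expected, or if inetd.conf is missing or empty.
check_hacl_config() {
    services=${1:-/etc/services}
    inetd_conf=${2:-/etc/inetd.conf}
    status=0

    if [ ! -s "$inetd_conf" ]; then
        echo "WARN: $inetd_conf is missing or empty"
        status=1
    fi

    n=$(count_hacl "$services")
    if [ "$n" -lt 9 ]; then
        echo "WARN: $services has $n hacl lines (expected 9)"
        status=1
    fi

    n=$(count_hacl "$inetd_conf")
    if [ "$n" -lt 2 ]; then
        echo "WARN: $inetd_conf has $n hacl lines (expected at least 2)"
        status=1
    fi

    return $status
}

# Example call:
#   check_hacl_config && netstat -a | grep hacl
```

A non-zero return means one of the files needs attention before running inetd -c and repeating the netstat command.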

-sd-
Mike Duffy_1
Honored Contributor

Re: Cluster not starting

Thanks all,

This was sorted by restoring /etc/inetd.conf, as somehow it had been emptied!

All ok now.