Cluster startup

 
RikTytgat
Honored Contributor

Cluster startup

Hi,



I'm installing a new MC/ServiceGuard cluster on HP-UX 11i using MC/SG A.11.13.



cmapplyconf and cmcheckconf do not produce an error, but when starting the cluster, the following output appears in syslog.log:



-----------------------

Nov 21 17:30:05 s854139 CM-CMD[7794]: cmruncl -v

Nov 21 17:30:05 s854139 cmclconfd[7797]: Executing "/usr/lbin/cmcld" for node s854139.mecpark4.kb.be

Nov 21 17:30:10 s854139 cmcld: Daemon Initialization - Maximum number of packages supported for this incarnation is 5.

Nov 21 17:30:10 s854139 cmcld: Global Cluster Information:

Nov 21 17:30:10 s854139 cmcld: Heartbeat Interval is 1 seconds.

Nov 21 17:30:10 s854139 cmcld: Node Timeout is 2 seconds.

Nov 21 17:30:10 s854139 cmcld: Network Polling Interval is 2 seconds.

Nov 21 17:30:10 s854139 cmcld: Auto Start Timeout is 600 seconds.

Nov 21 17:30:10 s854139 cmcld: Information Specific to node s854139:

Nov 21 17:30:10 s854139 cmcld: Cluster lock disk: /dev/dsk/c5t1d1.

Nov 21 17:30:10 s854139 cmcld: lan0 0x00306e09316d 10.251.14.126 bridged net:1

Nov 21 17:30:10 s854139 cmcld: lan1 0x00306e1b9d4b 192.168.101.101 bridged net:2

Nov 21 17:30:10 s854139 cmcld: lan2 0x00306e1b9d4a 192.168.102.101 bridged net:3

Nov 21 17:30:10 s854139 cmcld: Heartbeat Subnet: 192.168.101.0

Nov 21 17:30:10 s854139 cmcld: Heartbeat Subnet: 192.168.102.0

Nov 21 17:30:10 s854139 cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 33.

Nov 21 17:30:11 s854139 cmcld: Lookup-node-by-name failed.

Nov 21 17:30:11 s854139 cmsrvassistd[7802]: The cluster daemon aborted our connection.

Nov 21 17:30:11 s854139 cmsrvassistd[7802]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort

Nov 21 17:30:11 s854139 cmclconfd[7797]: The ServiceGuard daemon, /usr/lbin/cmcld[7798], exited with a status of 1.

Nov 21 17:30:11 s854139 cmlogd: Unable to communicate with ServiceGuard cluster daemon (cmcld): Connection refused

-----------------



The Lookup-node-by-name failure in particular troubles me. The DNS, /etc/hosts, /etc/cmcluster/cmclnodelist and nsswitch.conf configuration all seems to be correct.



Any ideas?



Thanks in advance,

Rik.
10 REPLIES
Justo Exposito
Esteemed Contributor

Re: Cluster startup

Hi Rik,

Do you have .rhosts defined for the root user, with entries for the other box?

Regards,

Justo.
Help is a Beautiful word
Justo Exposito
Esteemed Contributor

Re: Cluster startup

Another idea: did you use the cmquerycl command to check your configuration?

Regards,

Justo.
Help is a Beautiful word
RikTytgat
Honored Contributor

Re: Cluster startup

Yes,



The .rhosts file on both servers contains just about every imaginable hostname and IP address.



I did use cmquerycl, but only to check the configuration. I didn't change and reapply its output.



Apologies for the 3 postings, but my browser seemed to hang ...





Bye,

Rik
Sridhar Bhaskarla
Honored Contributor

Re: Cluster startup

Rik,

I'm not sure, but my guess is that your cmcld is getting confused by the fully qualified domain name. Did you specify a fully qualified name in your cluster configuration file? If so, try with only the hostname and see if it works.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
RikTytgat
Honored Contributor

Re: Cluster startup

Sridhar,



Yep, I tried that too. Initially, my config was with hostnames only. When trying fully qualified names, the cmcheckconf command failed.



Thanks,

Rik
melvyn burnard
Honored Contributor

Re: Cluster startup

The output in syslog from the line:
Nov 21 17:30:10 s854139 cmcld: Global Cluster Information
to
Nov 21 17:30:10 s854139 cmcld: Heartbeat Subnet: 192.168.102.0

is new logging introduced in ServiceGuard 11.13, and it is really useful information to have.
For example, it tells me you are using the default heartbeat interval and node timeout of 1 and 2 seconds respectively. I would recommend you change these to 2 and 8 seconds.
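
As a rough sketch of that change: in the cluster ASCII configuration file for this ServiceGuard generation, the two timers are, as far as I know, set with the HEARTBEAT_INTERVAL and NODE_TIMEOUT parameters and expressed in microseconds. The helper below (sg_timing_lines is a made-up name, not a ServiceGuard command) only converts seconds to that unit and prints the two lines. Check the parameter names and units against the comments in your own configuration template before relying on it.

```shell
#!/bin/sh
# Sketch only. Assumptions: the cluster ASCII file uses HEARTBEAT_INTERVAL
# and NODE_TIMEOUT, both in microseconds. sg_timing_lines is a hypothetical
# helper name.
sg_timing_lines() {   # usage: sg_timing_lines HEARTBEAT_SECONDS NODE_TIMEOUT_SECONDS
    printf 'HEARTBEAT_INTERVAL\t%s\n' "$(( $1 * 1000000 ))"
    printf 'NODE_TIMEOUT\t\t%s\n' "$(( $2 * 1000000 ))"
}

# Example: the 2 s / 8 s values suggested above.
# sg_timing_lines 2 8
```

After editing the file, the cmcheckconf and cmapplyconf steps already used in this thread would need to be repeated.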

But to get to your actual problem, there does seem to be some form of hostname lookup problem here.
You need to start with the basics and not use DNS. Have everything set up in /etc/hosts, then use nslookup hostname to verify that each name resolves to the correct IP address, and then do the reverse, i.e. nslookup IP_address, and make sure you get the hostname back.
Do this for ALL ip addresses you are using in the cluster.
Oh yes, also ensure you have patched SG with PHSS_24678!
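
The forward-and-reverse check described above can be sketched against the hosts file alone. This is a minimal sketch, not a replacement for nslookup: it reads only a hosts-format file (it does not consult DNS or nsswitch.conf), it assumes the usual "IP canonical-name aliases" line layout, and the function names hosts_forward, hosts_reverse and hosts_roundtrip are invented for illustration.

```shell
#!/bin/sh
# Sketch: round-trip check of names and addresses in a hosts-format file.
# Assumption: lines look like "IP canonical-name [aliases...]", '#' starts a comment.

# Print the IP address of the first line that lists NAME (canonical or alias).
hosts_forward() {   # usage: hosts_forward NAME [FILE]
    awk -v name="$1" '
        { sub(/#.*/, "") }                        # drop comments
        { for (i = 2; i <= NF; i++)
              if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Print the canonical (first) name listed for IP.
hosts_reverse() {   # usage: hosts_reverse IP [FILE]
    awk -v ip="$1" '
        { sub(/#.*/, "") }
        NF >= 2 && $1 == ip { print $2; exit }
    ' "${2:-/etc/hosts}"
}

# Succeed only if NAME -> IP -> name gives back the NAME we started with.
hosts_roundtrip() { # usage: hosts_roundtrip NAME [FILE]
    ip=$(hosts_forward "$1" "$2")
    [ -n "$ip" ] && [ "$(hosts_reverse "$ip" "$2")" = "$1" ]
}
```

For instance, hosts_roundtrip s854139 would fail if the matching line lists the fully qualified name first, because the reverse step then returns the long form rather than the short one.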

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Sridhar Bhaskarla
Honored Contributor

Re: Cluster startup

I suspected that because all of my servers give output like this:

Oct 21 02:40:30 my_host cmclconfd[12974]: Executing "/usr/lbin/cmcld" for node my_host

Not my_host.mydomain.com

and my resolve policy is hosts and then DNS.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Sanjay_6
Honored Contributor

Re: Cluster startup

Hi Rik,

Check the hostname and IP resolution for your cluster nodes.

Run nslookup interactively and query each node name and address:

nslookup
> cluster_node_1_name
> cluster_node_2_name
> cluster_node_1_ip
> cluster_node_2_ip
> exit

Next, edit the /etc/cmcluster/cmclnodelist file and make sure it contains these entries:

cluster_node_1_name root
cluster_node_2_name root

Do this on both the servers and then try to restart the cluster.
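
To confirm on each node that the entries are really there, a small check along these lines may help. It is only a sketch: nodelist_has_root is a hypothetical helper, and it assumes the simple "node user" line format shown above.

```shell
#!/bin/sh
# Sketch: succeed if FILE has a line whose first field is NODE and whose
# second field is "root". Assumes the two-column layout shown above.
nodelist_has_root() {   # usage: nodelist_has_root NODE [FILE]
    awk -v node="$1" '
        $1 == node && $2 == "root" { found = 1 }
        END { exit !found }
    ' "${2:-/etc/cmcluster/cmclnodelist}"
}

# Example (node names are placeholders):
# for n in cluster_node_1_name cluster_node_2_name; do
#     nodelist_has_root "$n" || echo "no root entry for $n"
# done
```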

Hope this helps.

Regds


Satish Y
Trusted Contributor

Re: Cluster startup

Hi Rik,

Check whether:

1) /.rhosts contains entries for all nodes, including their FQDNs, and INCLUDING the host on which you are starting the cluster.

2) There is one more file you need: cmclnodelist in the /etc/cmcluster directory, which is similar to /.rhosts.

You might have missed one of the above.

Hope it solves your problem.

Cheers...
Satish.
Difference between good and the best is only a little effort
RikTytgat
Honored Contributor

Re: Cluster startup

Problem solved!!

The cause seems to have been the hostname.

It was set to a fully qualified name; changing it to just the host part solved the problem.
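
A quick way to spot this condition on another node is sketched below. The helper names short_name and is_fq_name are invented for illustration, and the check only looks for a dot in the name; it does not say how the hostname should be changed on a given system.

```shell
#!/bin/sh
# Sketch: detect a fully qualified hostname and derive its short form.

# Strip everything from the first dot onward.
short_name() { printf '%s\n' "${1%%.*}"; }

# True if the name contains a dot, i.e. looks fully qualified.
is_fq_name() {
    case $1 in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example:
# h=$(hostname)
# is_fq_name "$h" && echo "hostname is '$h'; short form would be '$(short_name "$h")'"
```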

Thanks for your help,
Rik.