Operating System - HP-UX

Richard Woolley
Frequent Advisor

host fails to join cluster after reboot

Could anyone point me in the right direction? After a K580 was rebooted on Sunday, cmviewcl (run from the node itself) reported that the node was not running in the cluster, although the cmcld daemon was running.

I have attached the syslog. Note the "permission denied for root user" message and the filename it shows (that file does list this node and root). This node is called "saturn".
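To pull only those messages out of a large syslog, a minimal sketch like the following could help. It assumes the usual HP-UX log location /var/adm/syslog/syslog.log and the cmclconfd wording "Permission denied for user root ..." quoted later in this thread; the function name sg_permission_errors is hypothetical.

```shell
#!/bin/sh
# sg_permission_errors [LOGFILE]
# Hypothetical helper: prints Serviceguard "Permission denied" messages.
# The default path is the usual HP-UX syslog location (an assumption;
# pass your own file if it differs).
sg_permission_errors() {
    log=${1:-/var/adm/syslog/syslog.log}
    # Matches cmclconfd lines such as
    # "Permission denied for user root on node saturn (...)"
    grep 'Permission denied for user' "$log"
}

# Usage: sg_permission_errors /var/adm/syslog/syslog.log
```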
Justo Exposito
Esteemed Contributor

Re: host fails to join cluster after reboot

Hi Mark,

Are the .rhosts files OK on all the nodes?

Regards,

Justo.
Help is a Beautiful word
Steve Steel
Honored Contributor

Re: host fails to join cluster after reboot

Hi


Check:

1) That saturn is in /etc/cmcluster/cmclnodelist.

2) Why some home directories are not found.

3) Question: is there another cluster on this subnet?
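For point 1, a quick check could look like the sketch below. It assumes the one-entry-per-line "hostname user" layout that appears later in this thread; nodelist_has_entry is a hypothetical helper, not a Serviceguard command.

```shell
#!/bin/sh
# nodelist_has_entry FILE HOST USER
# Hypothetical helper: succeeds if FILE has a non-comment line whose first
# field is HOST and whose second field is USER.
nodelist_has_entry() {
    awk -v h="$2" -v u="$3" '
        /^[ \t]*#/ { next }               # skip comment lines
        $1 == h && $2 == u { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Usage:
# nodelist_has_entry /etc/cmcluster/cmclnodelist saturn root && echo "saturn/root present"
```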


Steve Steel
If you want truly to understand something, try to change it. (Kurt Lewin)
Richard Woolley
Frequent Advisor

Re: host fails to join cluster after reboot

The .rhosts files seem OK on all the nodes, as does the cmclnodelist file.

Not sure what you mean about the home directories (if it's /home, then yes, they seem OK).

This is a 5-node cluster.

Any other ideas?

Cheers,
Mark.
John Palmer
Honored Contributor

Re: host fails to join cluster after reboot

Does cmclnodelist have an entry for all five hosts and is it the same on all of them?
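One way to answer the "same on all of them" part is a byte-for-byte comparison. The sketch below assumes each node's cmclnodelist has already been copied to a local file; the helper same_contents and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# same_contents REFERENCE FILE...
# Hypothetical helper: compares each FILE with REFERENCE using cmp(1)
# and reports the first one that differs.
same_contents() {
    ref=$1
    shift
    for f in "$@"; do
        if ! cmp -s "$ref" "$f"; then
            echo "differs from $ref: $f"
            return 1
        fi
    done
    return 0
}

# Usage (hypothetical local copies of each node's file):
# same_contents nodelist.saturn nodelist.node2 nodelist.node3 nodelist.node4 nodelist.node5
```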

Regards,
John
melvyn burnard
Honored Contributor

Re: host fails to join cluster after reboot

What version of SG have you installed, and what patch for SG do you have?
Also, have you run cmscancl to check things out that way?
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Steve Lewis
Honored Contributor

Re: host fails to join cluster after reboot

It's been a few months since I have MCSG'd, but the following points come to mind:

(1) You will need user root in the .rhosts and cmclnodelist files for the LOCAL server as well as the remote servers.

(2) Did you attempt to rebuild the cluster? Your syslog file shows several
vgchange -c n
followed by
vgchange -a y
which is potentially disastrous if your volume group can be activated on another node at the same time. Is this related to I/O diagnostics? Why do you want to remove cluster awareness from the VGs when activating your package? Normally you keep vgchange -c y (a permanent setting) and activate the VG using vgchange -a e (exclusive access for the node), or vgchange -a s for MC/LockManager.

(3) Have you implemented NIS, DNS or other name services that could change how hostnames resolve?

(4) Check the network interface IP addresses against the hostnames in /etc/hosts, DNS or wherever they are held.
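For point (4), a sketch like this lists which addresses /etc/hosts maps a name to, so they can be compared with what the interfaces actually carry (for example in netstat -in output). It reads the hosts file only and does not consult DNS or NIS; hosts_lookup is a hypothetical helper.

```shell
#!/bin/sh
# hosts_lookup FILE NAME
# Hypothetical helper: prints every IP address in an /etc/hosts-format FILE
# that lists NAME as its official hostname or as an alias.
hosts_lookup() {
    awk -v n="$2" '
        { sub(/#.*/, "") }                # drop comments; fields are re-split
        {
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; break }
        }
    ' "$1"
}

# Usage: hosts_lookup /etc/hosts saturn
```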



Steve Steel
Honored Contributor

Re: host fails to join cluster after reboot

Hi

As stated

For the ServiceGuard commands to work properly, each node in the
cluster must have its own name, as well as the names of the other
nodes, in its own .rhosts file.

You need to add this and try cmquerycl again. Then run cmruncl,
and it should work fine this time.

Maybe your name resolution is bad.
Check .rhosts and name resolution for saturn.
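A sketch of that check: it reports every node with no .rhosts line granting root, treating a bare hostname (same user, i.e. root in /.rhosts) or "hostname root" as a grant. The helper rhosts_missing is hypothetical, and "+" wildcards and netgroups are deliberately not handled.

```shell
#!/bin/sh
# rhosts_missing FILE NODE...
# Hypothetical helper: prints each NODE that has no line in FILE of the
# form "NODE" or "NODE root". Pass every cluster node, including the local one.
rhosts_missing() {
    file=$1
    shift
    for node in "$@"; do
        if ! awk -v n="$node" '
            /^[ \t]*#/ { next }
            $1 == n && (NF == 1 || $2 == "root") { found = 1 }
            END { exit(found ? 0 : 1) }
        ' "$file"; then
            echo "$node"
        fi
    done
}

# Usage (other node names are hypothetical):
# rhosts_missing /.rhosts "$(hostname)" node2 node3 node4 node5
```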


Steve Steel
Steve Steel
Honored Contributor

Re: host fails to join cluster after reboot

Hi

Also check for fully qualified hostnames in .rhosts. If you find any, reduce them to simple hostnames.
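A possible sketch for that clean-up: it prints a .rhosts-format file with the domain part stripped from the hostname field, leaving IP addresses, blank lines and comments untouched, so the result can be reviewed before replacing the real file. strip_domains and the output path in the usage line are hypothetical.

```shell
#!/bin/sh
# strip_domains FILE
# Hypothetical helper: "saturn.example.com root" -> "saturn root".
strip_domains() {
    awk '
        NF == 0 || /^[ \t]*#/ || $1 ~ /^[0-9.]+$/ { print; next }
        {
            split($1, parts, ".")
            $1 = parts[1]                 # rebuilds the line with single spaces
            print
        }
    ' "$1"
}

# Usage: strip_domains /.rhosts > /tmp/rhosts.new
```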

Steve Steel
Richard Woolley
Frequent Advisor

Re: host fails to join cluster after reboot

I will check that all the files are correct. Here are some answers:
1. Yes, cmclnodelist has entries for all 5 servers. It looks like:
saturn root
saturn_h root
saturn_100bt root
with similar entries for the other 4 nodes.
2. Version A.11.13 with patch PHSS_25915.
3. The cluster was not rebuilt.
4. No name services such as DNS or NIS have been implemented.

I will double-check that the .rhosts files are all the same (I will attach a copy).
jd-gt
Occasional Advisor

Re: host fails to join cluster after reboot

When you say the host fails to join the cluster after reboot, do you mean that the rc files are not starting the cluster, or that you manually issue a cmrunnode command and it fails?

AUTOSTART_CMCLD in the following configuration file must be set to 1 if you want the node to join the cluster automatically after a reboot:

/etc/rc.config.d/cmcluster

#*************************** CMCLUSTER *************************

# Highly Available Cluster configuration
#
# @(#) $Revision: 81.2 $
#
# AUTOSTART_CMCLD: If set to 1, the node will attempt to
# join its CM cluster automatically when
# the system boots.
# If set to 0, the node will not attempt
# to join its CM cluster.
#
AUTOSTART_CMCLD=0
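To confirm what value the file actually ends up with (the last assignment wins when the file is sourced), a sketch like this parses it without executing it. autostart_cmcld is a hypothetical helper; commented-out lines are ignored.

```shell
#!/bin/sh
# autostart_cmcld FILE
# Hypothetical helper: prints the value of the last AUTOSTART_CMCLD=...
# assignment in FILE (normally /etc/rc.config.d/cmcluster).
autostart_cmcld() {
    sed -n 's/^[ ]*AUTOSTART_CMCLD=\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Usage:
# [ "$(autostart_cmcld /etc/rc.config.d/cmcluster)" = 1 ] || echo "node will not auto-join at boot"
```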

If this is not the problem, please post some sample entries from the /etc/hosts file.

Good luck!
jad
Stephen Doud
Honored Contributor

Re: host fails to join cluster after reboot

Hi Mark,

Well, many others have taken a shot at it, so I might as well.

The initial trouble messages in syslog.log are:
May 5 10:30:11 saturn CM-CMD[5797]: /usr/sbin/cmrunnode -v
May 5 10:30:14 saturn cmclconfd[5820]: Permission denied for user root on node saturn (/etc/cmcluster/cmclnodelist)

This suggests a permissions issue preventing root from accessing the local system via networking services.

ServiceGuard uses the "hacl" network services listed in /etc/services (9 lines) and /etc/inetd.conf (3 lines for 11.12 and later) when performing ANY command.
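A rough way to compare a system against those counts is sketched below. It assumes the relevant entries are exactly the lines whose service name begins with "hacl", and it uses the 9 and 3 figures from this post as the expected values; hacl_count and check_hacl are hypothetical helpers.

```shell
#!/bin/sh
# hacl_count FILE
# Hypothetical helper: counts lines beginning with "hacl". grep -c prints 0
# but exits 1 when nothing matches, so that status is ignored.
hacl_count() {
    grep -c '^hacl' "$1" || true
}

# check_hacl SERVICES_FILE INETD_FILE
# Compares the counts with the figures quoted above (9 and 3).
check_hacl() {
    s=$(hacl_count "$1")
    i=$(hacl_count "$2")
    echo "services: $s hacl lines (expected 9), inetd.conf: $i (expected 3)"
    [ "$s" -eq 9 ] && [ "$i" -eq 3 ]
}

# Usage: check_hacl /etc/services /etc/inetd.conf
```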

If network services return "permission denied", first check for the existence of the primary permission file that SG looks for, /etc/cmcluster/cmclnodelist. Make certain it is on all servers and that each copy permits root access from EVERY node, including itself.

If that file is not used, inspect ~/.rhosts to ensure it grants root privileges to ALL nodes (including itself).

If this is not the problem, begin to suspect hostname services (/etc/hosts, /etc/nsswitch.conf, /etc/resolv.conf) or, even more fundamentally, a configuration problem with the hostname itself (fully qualified domain names vs. simple hostnames, which are preferred).
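For the nsswitch.conf part, a small sketch that prints which sources the "hosts:" line uses; if it mentions dns or nis, /etc/resolv.conf or NIS also needs checking. nsswitch_hosts is a hypothetical helper and assumes a space after "hosts:".

```shell
#!/bin/sh
# nsswitch_hosts FILE
# Hypothetical helper: prints the sources on the "hosts:" line of an
# nsswitch.conf-format FILE, e.g. "files" or "dns [NOTFOUND=return] files".
nsswitch_hosts() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print }' "$1"
}

# Usage: nsswitch_hosts /etc/nsswitch.conf
```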

This issue has many possible causes, so call the Response Center if you can't get to the bottom of it.

-s.

Sukant Naik
Trusted Contributor

Re: host fails to join cluster after reboot

Hi Mark,

I don't know how much this will help you.

But in your control.sh script you are specifying exit 0 for the package startup script. I faced this problem myself earlier.

I used to invoke a shell script from control.sh like this:

. /etc/cmcluster/pkg1/pkg_script.sh

In pkg_script.sh, I used to finish with exit 0. Because that script was executed in the same shell as control.sh, the exit ended control.sh as well, and it used to fail.
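The effect described above can be reproduced in isolation. In this sketch a throwaway temp file stands in for pkg_script.sh (the function name is hypothetical); the helper ends with exit 0 and is run once sourced with "." and once as a separate process.

```shell
#!/bin/sh
# demo_exit_in_sourced_script
# Hypothetical demo: "exit 0" in a sourced helper ends the caller,
# but only ends the child when the helper runs as its own process.
demo_exit_in_sourced_script() {
    helper=$(mktemp) || return 1
    printf 'echo "helper ran"\nexit 0\n' > "$helper"

    # Sourced: the helper's "exit 0" terminates this sh -c shell,
    # so the echo after the "." never runs.
    sh -c '. "$1"; echo "caller continued (sourced)"' sh "$helper"

    # Separate process: "exit 0" ends only the child shell.
    sh -c 'sh "$1"; echo "caller continued (child)"' sh "$helper"

    rm -f "$helper"
}

demo_exit_in_sourced_script
# Expected output:
# helper ran
# helper ran
# caller continued (child)
```

If a helper really must be sourced, ending it with return rather than exit (POSIX shells allow return in a dot script) stops only the helper.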

It also depends on the failover policy. Have you configured the cluster to halt the node if the package fails on node saturn? This is what the log file indicates:

May 8 01:45:41 saturn cmcld: Service PKG*62990 terminated due to an exit(0).
May 8 01:45:41 saturn cmcld: Halted package bcv on node saturn.

Just my two cents.
-Sukant
Who dares he wins
Richard Woolley
Frequent Advisor

Re: host fails to join cluster after reboot

Hi all, sorry for the delay.

jad - How do I check whether the rc files are starting the ServiceGuard daemons? I presumed this would show up in syslog.
I also checked that the "cmcld" process was running, and it was.
I had to run cmrunnode manually, and the node joined the cluster first time with no problems.

AUTOSTART_CMCLD is already set to 1!

Stephen - I presume root can access the local system via networking, because my swlist commands etc. work, and I am sure these access the local system over the network!

The cmclnodelist file is the same on all 5 nodes and does grant root access for each machine. /.rhosts is also the same on every node. The /etc/hosts file seems fine (attached).
nsswitch.conf is set to "files" throughout, and there is no /etc/resolv.conf file.

Sukant - No packages were attempting to run. We start them manually, and no scripts are executed within the same shell! The message you saw for the bcv package was part of our backups, and I'm sure it hasn't affected the node joining the cluster.

Cheers :-)

Mark.