
cmrunnode denies cluster binary despite file's existence on node

 
Ralph Grothe
Honored Contributor

Hi,

I have an odd problem getting a node to join the cluster with cmrunnode. cmapplyconf (run from the "central cluster management node") has already successfully built a cluster binary that includes the new node, and that binary has been distributed to all nodes (including the new node).
Nevertheless, cmrunnode claims the cluster binary is absent, although I can stat it in the filesystem and its checksum matches the one on the other nodes.

We already applied the very same procedure to another cluster (i.e. joining a new node) where things worked fine.

I cannot see what's different with this node.

# cmrunnode
cmrunnode : Unable to determine the nodes on the current cluster
cmrunnode : Either no cluster configuration file exists, or the file is corrupted, or cmclconfd is unable to run

In syslog.log nothing gets logged beyond the command invocations themselves.

# tail -2 /var/adm/syslog/syslog.log
Nov 30 17:40:39 main CM-CMD[3909]: /usr/sbin/cmrunnode
Nov 30 17:50:05 main CM-CMD[3931]: cmrunnode


When I try to read the cluster binary on this new node, there appears to be a permission problem with the loopback device (or a Unix socket):

# cmgetconf
Error: Permission denied to 127.0.0.1

or even better

# cmviewconf
cmviewconf: Binary file does not exist.

# cksum /etc/cmcluster/cmclconfig
1894821746 72136 /etc/cmcluster/cmclconfig

# remsh inn cksum /etc/cmcluster/cmclconfig
1894821746 72136 /etc/cmcluster/cmclconfig
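
A minimal sketch of how the same checksum comparison could be extended to every node at once. The function name and the "node checksum size" input format are assumptions made for illustration; the lines could be assembled by prefixing the node name to the output of the cksum and remsh commands shown above.

```shell
#!/bin/sh
# Sketch: report nodes whose cluster binary differs from the first node listed.
# Input on stdin, one line per node: "<node> <checksum> <size>"
# (cksum_mismatches and this input format are illustrative assumptions.)
cksum_mismatches() {
    awk '
        NR == 1 { ref = $2 " " $3; next }   # first line is the reference
        ($2 " " $3) != ref { print $1 }     # print every node that differs
    '
}

# Example with the values from this thread; both nodes match,
# so nothing is printed.
printf '%s\n' \
    'main 1894821746 72136' \
    'inn 1894821746 72136' | cksum_mismatches
```

An empty result only shows that the files are byte-identical; it says nothing about whether cmclconfd can actually read the file.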


Whereas when I retrieve the binary's contents on another node, I get an immediate response
(n.b. "main", like the German river Main, is the troublesome newly added node):

# cmviewcl -l node

NODE         STATUS       STATE
inn          up           running
lech         up           running
main         down         halted
oder         up           running

# cmgetconf|grep main
NODE_NAME main


Any ideas what's going on here?

We double checked the correct NIC settings for primary and standby as far as speed and duplex settings are concerned, as well as linkloop-ed them against those on other nodes.

Name resolution follows nsswitch.conf (files first, then DNS) and works across all nodes.
The same is true for trusted host mode, i.e. the remsh commands.

Regards
Ralph




Madness, thy name is system administration
Denver Osborn
Honored Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

So you've verified your .rhosts or cmclnodelist and they looked fine... Are there any problems with the /etc/hosts file?

-denver
Steven E. Protter
Exalted Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

Shalom,

Have you recently upgraded to SG 11.16?

A few things to try:
1) See if .rhosts with + makes a difference.
2) Some changes may be necessary in the inetd.conf entries for SG:
add the parameter -c to the hacl-cfg stream tcp line.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
melvyn burnard
Honored Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

This is normally due to a missing or bad network configuration regarding hostnames and IP addresses.

Take a read through:
http://docs.hp.com/en/6283/SGsecurityfiles.pdf
http://docs.hp.com/en/5874/securingserviceguard_nov2005.pdf


Essentially, you need to have ALL networks on an SG node listed in /etc/hosts.
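
A minimal sketch of how that check could be automated, assuming you already have the list of addresses configured on the node. The function name is made up for illustration, and the addresses in the usage comment are documentation placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch: list addresses that have no entry in a hosts file.
# usage: missing_from_hosts <hosts-file> <ip> [<ip> ...]
# (missing_from_hosts is a hypothetical helper, not a Serviceguard command.)
missing_from_hosts() {
    hosts_file=$1
    shift
    for ip in "$@"; do
        # Match field 1 exactly, so 192.0.2.1 does not match 192.0.2.11
        # and commented-out entries are not counted.
        if ! awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$hosts_file"; then
            echo "$ip"
        fi
    done
}

# Usage (placeholder addresses):
#   missing_from_hosts /etc/hosts 192.0.2.10 192.0.2.11
```

Any address it prints is one that the node carries but /etc/hosts does not list.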
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
john korterman
Honored Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

Hi Ralph,

if you have exhausted all other possibilities, then try consulting doc id: 8606390795 in the techbase.

Pay particular attention to the paragraph starting with:
"Once the above patches (or the Serviceguard A.11.16 release) are
installed, Serviceguard commands fail with permission denied for the
loopback address if (building up suspense).."

However, it may not apply to your system, as there are a number of ifs.

regards,
John K.
it would be nice if you always got a second chance
Sameer_Nirmal
Honored Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

Hi,
It seems that the node "main" is completely outside the cluster's view even though it appears in the cluster configuration.

I guess the error "Permission denied to 127.0.0.1" is merely the result of a hostname resolution problem.
cmgetconf, or rather cmclconfd, is identifying the node as localhost/127.0.0.1 when it gathers the cluster configuration.
Since no localhost/127.0.0.1 entry will be in the common cluster configuration file, it returns the permission denied error.
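
One way such a mix-up could arise, offered as an assumption rather than something established in this thread, is the node's own name being listed on the 127.0.0.1 line of /etc/hosts. A minimal sketch of a check for that (the function name is made up for illustration):

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if <name> is listed on the 127.0.0.1 line of a hosts file.
# usage: on_loopback_line <hosts-file> <name>
# (on_loopback_line is a hypothetical helper, not a Serviceguard command.)
on_loopback_line() {
    awk -v name="$2" '
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # stop at a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit !found }
    ' "$1"
}

# Usage:
#   on_loopback_line /etc/hosts main && echo "main is listed on the loopback line"
```

A negative result does not rule out other resolution problems; it only excludes this one case.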

I am not sure which version of SG you are using, but there is a patch that addresses name resolution from the /etc/hosts file.
See this document for SG A.11.16.00 on HP-UX 11.11.
The patch text also mentions "Remote cluster configuration is not allowed in
11.16.00".

http://www2.itrc.hp.com/service/cki/patchDocDisplay.do?patchId=PHSS_32733

You need to ensure that the .rhosts, /etc/hosts and cmclnodelist files, as well as the patch levels, are consistent.
Ralph Grothe
Honored Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

Hi Folks,

many thanks for your replies.

I haven't yet found the time to consult the docs some of you pointed me to by various URLs.

Following these leads, I rechecked my node name resolution and network settings, and cannot see anything wrong with them.
I also rechecked that root can run r* commands between all four nodes without being prompted for a password.

Here, as brief evidence, are the name service queries from "main", the node that is unwilling to join.
(n.b. I stripped the IP addresses and FQDNs from the output shown, for reasons of exaggerated paranoia.)

From a node where I can run cmviewcl, here is the list of cluster nodes:


# cmviewcl -l node

NODE         STATUS       STATE
inn          up           running
lech         up           running
main         down         halted
oder         up           running


And these were issued on "main"


# echo inn lech main oder | xargs -n1 nsquery hosts |egrep -v Address\|Hostname

Using "files [NOTFOUND=continue] dns" for the hosts policy.

Searching /etc/hosts for inn
Aliases: inn
Switch configuration: Terminates Search

Using "files [NOTFOUND=continue] dns" for the hosts policy.

Searching /etc/hosts for lech
Aliases: lech
Switch configuration: Terminates Search

Using "files [NOTFOUND=continue] dns" for the hosts policy.

Searching /etc/hosts for main
Aliases: main
Switch configuration: Terminates Search

Using "files [NOTFOUND=continue] dns" for the hosts policy.

Searching /etc/hosts for oder
Aliases: oder
Switch configuration: Terminates Search


Ralph Grothe
Honored Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

I forgot to add this, since quite a few of you mentioned the SG 11.16 release.
This is in fact an MC-OE SG downgraded from 11.16 to 11.14, because the customer wished to combine HP-UX 11.00 and 11.11v1 nodes in one cluster.
You may have noticed my other related threads.
For this partial downgrade to work, the enforce_scripts=false kludge had to be used.
But we have evidence that this cannot be the cause of the current trouble, since the same procedure worked perfectly well for the database cluster. (The cluster giving trouble is the application server cluster, which unfortunately has the higher scheduling priority.)
Ralph Grothe
Honored Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

Steven,

I had already noticed from the 11.14 patch's README that some tinkering with inetd.conf (i.e. disabling identd) was required.
This seems to be the relevant passage you are alluding to:


# swlist -l fileset -a readme PHSS_32260|sed -n 4650,4660p
The cmclconfd line should appear as:

hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd \
cmclconfd -c -i

The cmomd line should appear as:

hacl-probe stream tcp nowait root \
/opt/cmom/lbin/cmomd /opt/cmom/lbin/cmomd -i -f \
/var/opt/cmom/cmomd.log


That is what I did before sending inetd a SIGHUP (or rather, running /usr/sbin/inetd -c):

# grep ^hacl /etc/inetd.conf
hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd /opt/cmom/lbin/cmomd -i -f /var/opt/cmom/cmomd.log
hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -i -c
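
A minimal sketch of how the presence of the -c argument on the hacl-cfg stream tcp entry could be checked on every node in the same way. The function name is made up for illustration, and the sketch assumes each inetd.conf entry sits on a single line, as in the grep output above; the backslash-continued form quoted from the README would not be recognised.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the "hacl-cfg stream tcp" entry passes -c to cmclconfd.
# usage: hacl_cfg_has_c <inetd.conf>
# (hacl_cfg_has_c is a hypothetical helper; assumes one entry per line.)
hacl_cfg_has_c() {
    awk '
        $1 == "hacl-cfg" && $2 == "stream" && $3 == "tcp" {
            seen = 1
            # Fields 1-7 are service, type, protocol, wait, user, program, argv[0];
            # the server arguments start at field 8.
            for (i = 8; i <= NF; i++) if ($i == "-c") ok = 1
        }
        END { exit !(seen && ok) }
    ' "$1"
}

# Usage:
#   hacl_cfg_has_c /etc/inetd.conf || echo "hacl-cfg stream entry lacks -c"
```

The check says nothing about whether inetd has re-read the file since it was edited.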

Steven E. Protter
Exalted Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

Shalom again Ralph,

I'd suggest untinkering the inetd.conf file and trying again.

No luck with my .rhosts suggestion?

SEP
Ralph Grothe
Honored Contributor

Re: cmrunnode denies cluster binary despite file's existence on node

No Steven,
the simplified .rhosts with "+", which makes any remote host a trusted one, didn't change a thing.
Meanwhile, the person "socially" in charge of the clusters (viz. he who maintains the customer contacts) has run short of time for my extended "research" on the live clusters, and asked me to release the two new 11.11v1 nodes from the two 11.00 clusters again.
This was easily done by halting the new node in the DB cluster, where the extension had succeeded (the application cluster's new node "main" wasn't up anyway), and issuing a new cmapplyconf with the three-node config files.
That left me with two unproductive nodes and thus a much more agreeable playground for further experiments, relieved of the fear of buggering up something crucial.

When I now try to join these two loose nodes into an entirely new cluster, I run into the same sort of crap with node "main".
There must be something odd in its setup that sets it apart from the other new node, "mosel".
Maybe something as weird as a different kind of (PAM-induced?) user authentication/identification.

I will take up your proposal and experiment further with different identd settings for the hacl* services.

To be continued...