Could not read messages from /usr/lbin/cmcld: Software caused connection abort

 
CA942032
Occasional Advisor

O/S version - 11.11
S/G version - 11.15

I'm trying to build my initial cluster config and am getting the above error in the syslog. I'm also getting this when attempting to run cmruncl:

# cmruncl -v
Error: Permission denied to 127.0.0.1
Warning: Local node is not currently configured in a cluster
cmruncl : Unable to determine the nodes on the current cluster
cmruncl : Either no cluster configuration file exists, or the file is corrupted

This is after the cmcheckconf and cmapplyconf have successfully completed. Any help is appreciated. Thanks.
Dietmar Konermann
Honored Contributor

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

Hi Doug,

This looks like a node authorization problem. If you use .rhosts, you need to add every node's root user to the ~root/.rhosts file on all nodes. The same applies to /etc/cmcluster/cmclnodelist if you use that method.

See the Serviceguard manual for details.

Best regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
CA942032
Occasional Advisor

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

Thanks for the quick followup.

This is what my .rhosts file looks like on both nodes, unless I need to add something else.

# cat /.rhosts
cmaxx2 root
cmaxx1 root

cmaxx1 and cmaxx2 being the hostnames of the machines.
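
For anyone checking the same thing, here is a minimal sketch that lists the nodes lacking a root entry in an access file such as /.rhosts or /etc/cmcluster/cmclnodelist. It assumes the "hostname user" layout shown above; the function name missing_root_entries is made up for illustration and is not part of Serviceguard.

```shell
#!/bin/sh
# Print every node that has no "<node> root" line in the given access file.
# Usage: missing_root_entries FILE NODE [NODE ...]
# (missing_root_entries is a hypothetical helper, not a Serviceguard command.)
missing_root_entries() {
    file=$1
    shift
    for node in "$@"; do
        # Field 1 must equal the node name and field 2 must equal "root".
        if ! awk -v h="$node" \
            '$1 == h && $2 == "root" { found = 1 } END { exit !found }' "$file"
        then
            echo "$node"
        fi
    done
}
```

Running `missing_root_entries /.rhosts cmaxx1 cmaxx2` on each node should print nothing when both entries are present. Note that the comparison is exact, so a fully qualified name in the file will not match a short name on the command line.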
Jeff Schussele
Honored Contributor

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

Hi,

I always use the /etc/cmcluster/cmclnodelist file & make sure that the shortnames resolve correctly on all nodes.
Sounds like somehow or another the cluster binary thinks either localhost or 127.0.0.1 belongs in the cluster.
I'd rerun the check & apply steps, making sure that localhost is *not* referenced in the cluster ascii file.
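
One way to check this is to search the cluster ascii file for loopback references. The sketch below assumes comment lines in that file start with '#'; the function name find_loopback_refs is hypothetical.

```shell
#!/bin/sh
# Print non-comment lines (with line numbers) that mention localhost or 127.0.0.1.
# Returns 0 when at least one such line exists, non-zero otherwise.
# (find_loopback_refs is a hypothetical helper name.)
find_loopback_refs() {
    grep -n -i -E 'localhost|127\.0\.0\.1' "$1" |
        grep -v '^[0-9]*:[[:space:]]*#'
}
```

For example, `find_loopback_refs /etc/cmcluster/cmclconfig.ascii` should print nothing on a clean configuration.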

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Sridhar Bhaskarla
Honored Contributor

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

Hi,

Make sure each node can rlogin to itself.

Interestingly, "Local node is not currently configured in a cluster" doesn't necessarily relate to a .rhosts or cmclnodelist problem. How are you building the configuration? Did you already generate the cluster ascii file and apply the configuration? Did you get any errors when you ran

# cmquerycl -C /etc/cmcluster/cmclconfig.ascii -n node1 -n node2

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
CA942032
Occasional Advisor

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

I'm building the ascii file identical to your example, and also my notes, with the exception of course that I'm specifying the node names. I didn't receive any errors during the check or apply, but to be sure that I didn't, I've tried to rebuild this again with the same results.

The only thing I can think of is that the lock disk is on an EMC array, not an HP-branded disk array. I don't see why this would cause any issues, because all other functionality to that disk is fine.

I've verified that all node names can be resolved, and that I can rlogin to myself on both nodes.

Any and all help is greatly appreciated. Thanks.
Sridhar Bhaskarla
Honored Contributor

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

Well - in that case make sure you have entries in /etc/inetd.conf for hacl-cfg (two entries: one with udp and the other with tcp). Also make sure you are not disallowing the local host in /var/adm/inetd.sec. If you see any hacl-cfg entry in that file, comment it out temporarily and see if it works. You will need to run 'inetd -c' after modifying the inetd.conf and inetd.sec files.
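
A rough sketch of that inetd.conf check follows. It assumes the standard inetd.conf column order (service name in field 1, protocol in field 3) and treats any line whose first field is not exactly hacl-cfg, including commented-out lines, as absent. The function name check_hacl_cfg is made up for illustration.

```shell
#!/bin/sh
# Report any missing hacl-cfg protocol entry in an inetd.conf-style file.
# Returns 0 when both the udp and tcp entries are present, 1 otherwise.
# (check_hacl_cfg is a hypothetical helper name.)
check_hacl_cfg() {
    status=0
    for proto in udp tcp; do
        if ! awk -v p="$proto" \
            '$1 == "hacl-cfg" && $3 == p { found = 1 } END { exit !found }' "$1"
        then
            echo "missing hacl-cfg $proto entry"
            status=1
        fi
    done
    return $status
}
```

Call it as `check_hacl_cfg /etc/inetd.conf`; no output means both entries were found.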

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
CA942032
Occasional Advisor

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

My /etc/inetd.conf was set correctly:
hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c

However, I do not have a /var/adm/inetd.sec file, which to me means that all ports are wide open.
Sridhar Bhaskarla
Honored Contributor
Solution

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

Since you already compiled the configuration, I would suggest you run 'cmscancl -v -o /tmp/cluster.out' and attach the output.

-Sri

PS: Please do not assign points until your problem is fixed. Particularly a 7 indicates that your problem is almost solved.
You may be disappointed if you fail, but you are doomed if you don't try
CA942032
Occasional Advisor

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

Very interesting results here. The cmscancl command consistently failed on node 1 but succeeded on node 2, so the command failed overall because it couldn't stat node 1. I then compared the hostname on both nodes and found that the second node was using only the node name, while an admin here had set the hostname on my primary node to a fully qualified domain name.

Even though I did not have an /etc/resolv.conf in place, /etc/nsswitch.conf was set to use FILES only, and /etc/hosts had alias definitions for all nodes, apparently SG wants to use DNS no matter what. Once I changed the hostname back to just the node name, cmscancl completed on all nodes, and now the cluster runs.
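
For reference, the comparison I did by hand can be sketched as a pair of small shell helpers. The function names is_fqdn and short_name are made up for illustration, and the only assumption is that a fully qualified name is one that contains a dot.

```shell
#!/bin/sh
# Succeeds (returns 0) when the given name contains a dot, i.e. has a domain part.
# (is_fqdn and short_name are hypothetical helper names.)
is_fqdn() {
    case $1 in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Print the name with everything from the first dot onward removed.
short_name() {
    echo "${1%%.*}"
}
```

On each node, something like `is_fqdn "$(hostname)" && echo "hostname is fully qualified; short form is $(short_name "$(hostname)")"` flags the condition that broke my cluster.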

Thank you Sri for your patience and for leading me down the right trail to get this resolved.
Stephen Doud
Honored Contributor

Re: Could not read messages from /usr/lbin/cmcld: Software caused connection abort

Too late for me to add value other than to refer you to this document should you see other "permission denied" problems:

DOCUMENT ID: UMCSGKBRC00008185
TITLE: Cluster Configuration Commands Fail with "permission denied"

In there, this:

CAUSE 7: Hostname resolution services (whether local /etc/hosts or
DNS) may be supplying a mix of fully qualified domain names (FQDN)
with simple hostnames.
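
A minimal sketch for spotting such a mix in a hosts file is shown here. It assumes the usual "address canonical-name aliases" layout with '#' comments and simply labels the first name on each line as fqdn or short; the function name classify_hosts is hypothetical.

```shell
#!/bin/sh
# For each entry in a hosts-style file, print the address, the first
# (canonical) name, and whether that name is "fqdn" or "short".
# (classify_hosts is a hypothetical helper name.)
classify_hosts() {
    awk '
        { sub(/#.*/, "") }   # drop comments; blank results are skipped below
        NF >= 2 {
            kind = ($2 ~ /\./) ? "fqdn" : "short"
            print $1, $2, kind
        }
    ' "$1"
}
```

Running `classify_hosts /etc/hosts` on every node and comparing the output makes it easy to see whether cluster nodes are listed with a mix of both forms.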

-StephenD.