Operating System - HP-UX

Itanium2_1
Occasional Contributor

Unable to communicate to node in SG 11.13

Hi Gurus,

I've successfully created a 3-node HP cluster on three RP8400 systems, using the same .rhosts and /etc/hosts.equiv files and SG 11.13 on HP-UX 11.11.

However, when I try to add a 4th node to the cluster, after modifying the cluster configuration ASCII file and running cmcheckconf -v -C rac92.asc, I get the error message:
"unable to communicate to node"

I can rlogin successfully to that node from all the other nodes in the cluster, and from it to the other nodes, but rlogin to that node takes about 2-4 seconds longer. I have the same .rhosts, /etc/hosts and /etc/hosts.equiv on each node, and I'm merely trying to add a node to the existing 3-node configuration.

Suggestions and comments welcome.
Andrew R.
Advisor

Re: Unable to communicate to node in SG 11.13

Saty,
Check your entries in /etc/inetd.conf.
There should be two or three entries starting with hacl-cfg.
Update them if needed, and then run inetd -c.

If that doesn't work, check your routing in /etc/rc.config.d/netconf and check /etc/resolv.conf.

Good Luck
Live for the infinity life
Stephen Doud
Honored Contributor

Re: Unable to communicate to node in SG 11.13

Hello Saty,

The actual error messages might help us more clearly formulate a response. The messages indicate a lack of communication. The source could be lack of physical connection, misconfigured network adapters, misconfigured name services, or as Andrew stated, misloaded ServiceGuard code resulting in inability to complete the SG commands.

1- verify hardware connectivity
# linkloop -i <PPA> <station_address>
where PPA is the number in the PPA column of the lanscan output, and station_address is the station (hardware) address of another NIC, either on the same node or on another node connected to the same network.
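A minimal Python sketch of this step, assuming you have already copied each local PPA and its station address from lanscan output by hand. It only builds the linkloop argument lists (it does not run linkloop), and the function name and addresses are hypothetical.

```python
from typing import Dict, List


def linkloop_commands(local_ppas: Dict[int, str], peer_addresses: List[str]) -> List[List[str]]:
    """Build one `linkloop -i PPA station_address` argument list per pair.

    local_ppas maps each local PPA to that NIC's own station address;
    peer_addresses are station addresses of other NICs on the same network
    (on this node or on another node).
    """
    commands = []
    for ppa, own_address in sorted(local_ppas.items()):
        for peer in peer_addresses:
            if peer.lower() == own_address.lower():
                continue  # skip the NIC's own address; the test should reach a different NIC
            commands.append(["linkloop", "-i", str(ppa), peer])
    return commands


# Hypothetical PPAs and station addresses, for illustration only.
local = {0: "0x00306E0A0001", 1: "0x00306E0A0002"}
peers = ["0x00306E0A0002", "0x00306E0B0001"]
for argv in linkloop_commands(local, peers):
    print(" ".join(argv))
```

Testing every local PPA against every peer address makes it easy to see which specific NIC or cable fails, rather than only whether the node is reachable at all.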

2- verify name services
On each node, verify that you can remsh not only to every partner node but also to the node itself. ALL SG commands use network ports, so all of them need networking permissions.
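The remsh check is an all-pairs test that includes each node talking to itself. A minimal sketch that lists the checks to run; the host names are hypothetical placeholders.

```python
from itertools import product
from typing import List, Tuple


def remsh_checks(nodes: List[str]) -> List[Tuple[str, str]]:
    """Return every (source, target) pair to test, including each node to itself."""
    return list(product(nodes, repeat=2))


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical host names
for source, target in remsh_checks(nodes):
    # Run on `source`: `remsh <target> hostname` should print the target's host name.
    print(f"on {source}: remsh {target} hostname")
```

With four nodes this gives 16 checks, four of which are self-checks that are easy to forget.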

You may want to consider creating an /etc/cmcluster/cmclnodelist file on each host; it uses the same format as .rhosts. Ensure EVERY node in the cluster (even the node itself) is listed and given root privileges.
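A sketch of generating identical cmclnodelist contents for every host, assuming the .rhosts-style "hostname user" layout described above. The node names and the extra user are hypothetical placeholders.

```python
from typing import Iterable, List


def cmclnodelist_lines(nodes: Iterable[str], extra_users: Iterable[str] = ()) -> List[str]:
    """Return `hostname user` lines: root on every node first, then each extra user."""
    node_list = list(nodes)
    users = ["root", *extra_users]
    return [f"{node} {user}" for user in users for node in node_list]


# Hypothetical names; list every cluster node, including the one being added.
cluster = ["node1", "node2", "node3", "node4"]
print("\n".join(cmclnodelist_lines(cluster, ["oracle_user"])))
```

Generating the file once and copying the same output to every node keeps the copies from drifting apart.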

3- verify the node's ability to listen for HA commands:

# netstat -a | grep hacl | grep LISTEN
tcp 0 0 *.hacl-probe *.* LISTEN
tcp 0 0 *.hacl-cfg *.* LISTEN
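A sketch of checking that netstat output programmatically. It assumes the six-column layout shown above (protocol, two queue counts, local address, foreign address, state), with the service name after the last dot of the local address. The function name is hypothetical.

```python
from typing import Set


def listening_hacl_services(netstat_output: str) -> Set[str]:
    """Return hacl-* service names that `netstat -a` reports in LISTEN state."""
    services = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected layout: proto recv-q send-q local-address foreign-address state
        if len(fields) >= 6 and fields[-1] == "LISTEN":
            service = fields[3].rsplit(".", 1)[-1]  # "*.hacl-cfg" -> "hacl-cfg"
            if service.startswith("hacl"):
                services.add(service)
    return services


sample = """\
tcp 0 0 *.hacl-probe *.* LISTEN
tcp 0 0 *.hacl-cfg *.* LISTEN
"""
expected = {"hacl-cfg", "hacl-probe"}  # the two services shown in the reply above
missing = expected - listening_hacl_services(sample)
print("missing:", sorted(missing) or "none")
```

UDP lines have no state column, so they are ignored by the LISTEN filter.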

Ensure that nine "hacl" lines are in /etc/services, and that the cmclconfd entries are in /etc/inetd.conf:

# grep cmclconfd /etc/inetd.conf
hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
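The two file checks can be sketched in Python as below. This assumes the file contents have already been read into strings; the helper names are hypothetical, and the expected count of nine comes from this post, not from the code.

```python
from typing import List


def hacl_service_entries(services_text: str) -> List[str]:
    """Return non-comment /etc/services lines whose service name starts with 'hacl'."""
    entries = []
    for line in services_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line and line.split()[0].startswith("hacl"):
            entries.append(line)
    return entries


def has_cmclconfd_entries(inetd_text: str) -> bool:
    """True if inetd.conf has both the UDP and the TCP hacl-cfg lines for cmclconfd."""
    found = set()
    for line in inetd_text.splitlines():
        fields = line.split()
        # A commented-out line starts with "#hacl-cfg" and is therefore not counted.
        if len(fields) >= 6 and fields[0] == "hacl-cfg" and fields[5].endswith("/cmclconfd"):
            found.add((fields[1], fields[2]))  # (socket type, protocol)
    return {("dgram", "udp"), ("stream", "tcp")} <= found


inetd_sample = """\
hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
"""
print("cmclconfd entries present:", has_cmclconfd_entries(inetd_sample))
```

For /etc/services, compare len(hacl_service_entries(text)) against the expected count on each node.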

If they are in there, but the LISTEN isn't happening, as Andrew stated, perform "inetd -c" to have inetd re-read the files and add services.

Good luck.

-s.
SuperDome_1
Advisor

Re: Unable to communicate to node in SG 11.13

Thx Andrew R.

I checked my /etc/inetd.conf and have the following entries on all 4 nodes:

hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd /opt/cmom/lbin/cmomd -f /var/opt/cmom/cmomd.log

I also checked my /etc/rc.config.d/netconf and /etc/resolv.conf files, and they are identical on all nodes too.
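A rough way to confirm that copies of a file really are identical across nodes, ignoring blank lines, comments and whitespace differences. It assumes you have collected each node's copy into a dictionary (for example with remsh and cat); the function names, node names and addresses are hypothetical.

```python
from typing import Dict, List


def normalize(text: str) -> List[str]:
    """Collapse whitespace and drop blank and comment lines so cosmetic differences are ignored."""
    lines = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped and not stripped.startswith("#"):
            lines.append(" ".join(stripped.split()))
    return lines


def nodes_that_differ(copies: Dict[str, str], reference: str) -> List[str]:
    """Return the nodes whose copy of a file differs from the reference node's copy."""
    expected = normalize(copies[reference])
    return sorted(node for node, text in copies.items() if normalize(text) != expected)


copies = {  # hypothetical /etc/resolv.conf contents
    "node1": "domain example.com\nnameserver 192.0.2.1\n",
    "node2": "domain  example.com\n\nnameserver 192.0.2.1\n",
    "node4": "domain example.com\nnameserver 192.0.2.9\n",
}
print(nodes_that_differ(copies, "node1"))
```

/etc/rc.config.d/netconf normally holds host-specific values such as HOSTNAME and the interface IP addresses, so for that file expect those lines to differ and compare the routing entries instead.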

I think the key to this may be that rlogin takes longer to that node than to all the other nodes in the cluster.
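If the extra 2-4 seconds comes from name resolution (a common cause of slow rlogin/remsh logins, since the remote daemon looks up the connecting client's name, though nothing in this thread confirms it), timing the lookups on the slow node can show it. A minimal sketch using Python's standard socket module; the helper names are hypothetical, and the placeholder list should be replaced with the other nodes' names.

```python
import socket
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(lookup: Callable[[str], T], name: str) -> Tuple[float, T]:
    """Run one resolver call and return (seconds taken, result)."""
    start = time.monotonic()
    result = lookup(name)
    return time.monotonic() - start, result


def check_host(name: str) -> None:
    """Time forward (name to address) and reverse (address to name) resolution."""
    try:
        seconds, address = timed(socket.gethostbyname, name)
    except OSError as err:  # socket.gaierror is a subclass of OSError
        print(f"{name}: forward lookup failed ({err})")
        return
    print(f"{name} -> {address}: {seconds:.2f}s")
    try:
        seconds, (reverse_name, _aliases, _addrs) = timed(socket.gethostbyaddr, address)
    except OSError as err:  # socket.herror is a subclass of OSError
        print(f"{address}: reverse lookup failed ({err})")
        return
    print(f"{address} -> {reverse_name}: {seconds:.2f}s")


if __name__ == "__main__":
    for node in ["localhost"]:  # placeholder: use the other cluster nodes' names
        check_host(node)
```

A lookup that takes seconds, or a reverse lookup that fails, points toward /etc/resolv.conf, /etc/hosts or the name-service switch order on that node.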

Thx again


SuperDome_1
Advisor

Re: Unable to communicate to node in SG 11.13

Thx.

I did create an /etc/cmcluster/cmclnodelist file on each node, in this format:

hostname1 root
hostname2 root
hostname3 root
hostname1 oracle_user
hostname2 oracle_user
hostname3 oracle_user

and so on.
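Since the list above ends with "and so on", a quick way to confirm that the fourth node really has a root line in every copy is sketched below. It assumes the same form of host name (short or fully qualified) is used everywhere, because this exact-match check treats "host" and "host.domain" as different; the function names and hostname4 are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def parse_nodelist(text: str) -> Set[Tuple[str, str]]:
    """Parse .rhosts-style lines into (hostname, user) pairs, ignoring comments."""
    pairs = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def nodes_missing_root(text: str, cluster_nodes: Iterable[str]) -> List[str]:
    """Return the cluster nodes that do not have a `<node> root` line."""
    pairs = parse_nodelist(text)
    return [node for node in cluster_nodes if (node, "root") not in pairs]


listed = """\
hostname1 root
hostname2 root
hostname3 root
hostname1 oracle_user
hostname2 oracle_user
hostname3 oracle_user
"""
# Only the first three hosts appear, as in the excerpt above; hostname4 is hypothetical.
print(nodes_missing_root(listed, ["hostname1", "hostname2", "hostname3", "hostname4"]))
```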

I can remsh from all nodes to the others, but it takes longer to reach the node that ServiceGuard can't communicate with.

I verified each node's capability to listen for HA commands and received the same output on all 4 nodes.

There are 9 "hacl" entries in /etc/services on each node.

I also verified that the /etc/inetd.conf files are the same, and had inetd re-read them using inetd -c.

For linkloop -i, the man pages don't say whether to use the hardware address or the IP address for the linkaddress argument..
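On the linkaddr question: the earlier reply describes the argument as the station address of another NIC, which is the hardware (MAC) address shown in lanscan's Station Address column, not an IP address. A small hypothetical validator, assuming 48-bit Ethernet addresses in the 0x-prefixed hexadecimal form that lanscan prints:

```python
import re

# 12 hex digits (48 bits), optionally preceded by 0x.
_HEX_ADDR = re.compile(r"(?:0x)?([0-9a-f]{12})", re.IGNORECASE)


def station_address(value: str) -> str:
    """Return a 0x-prefixed 12-hex-digit station address, or raise ValueError.

    An IPv4 address such as 192.168.1.10 is rejected: linkloop works at the
    link level and takes the NIC's hardware address, not its IP address.
    """
    match = _HEX_ADDR.fullmatch(value.strip())
    if not match:
        raise ValueError(f"not a 48-bit station address: {value!r}")
    return "0x" + match.group(1).upper()


for candidate in ["0x00306e0a0001", "192.168.1.10"]:  # hypothetical inputs
    try:
        print(candidate, "->", station_address(candidate))
    except ValueError as err:
        print(err)
```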

Thx


Sat