Serviceguard

Node xxxx is refusing Serviceguard communication

Erick Arturo Perez
Frequent Advisor

Hi all,
Another "refusing" error. Here are my details.

RHEL 5
SG A.11.18.01-0.rhel5
xinetd installed
identd installed

cmquerycl -v -L /dev/dm-0 -n sgj-bd1 -n sgj-bd2 -C mysqlcl.conf
Looking for other clusters... done
Node sgj-bd2 is refusing Serviceguard communication
Please make sure....(etc etc)

my /etc/hosts (on both nodes)
127.0.0.1 localhost.localdomain localhost
192.168.248.5 sgj-bd1.dom1.com sgj-bd1
192.168.248.7 sgj-bd2.dom1.com sgj-bd2
10.10.10.10 sgj-bd1.hbone sgj-bd1
10.10.10.11 sgj-bd2.hbone sgj-bd2


Telnet to localhost port 5302 works on both nodes.
No firewall of any kind is enabled.
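
A telnet to localhost only exercises the loopback path, whereas cmquerycl reaches the peer over its network addresses. Below is a minimal Python sketch of the same reachability check that can be pointed at each peer address. `port_open` is a hypothetical helper name, not part of Serviceguard; port 5302 and the addresses in the usage note come from this post. It only shows that something accepts TCP connections on the port, not that Serviceguard authorizes the caller.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, running `port_open("192.168.248.7", 5302)` and `port_open("10.10.10.11", 5302)` from sgj-bd1 would test both addresses of sgj-bd2 rather than the local loopback.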

/var/log/messages (on sgj-bd1)
xinetd[4206]: START: hacl-cfgupd pid=4211 from=127.0.0.1
xinetd[4206]: EXIT: hacl-cfgupd status=0 pid=4211 duration=15(sec)

/var/log/messages (on sgj-bd2)
xinetd[4267]: START: hacl-cfgupd pid=4271 from=192.168.248.5
xinetd[4267]: EXIT: hacl-cfgupd status=0 pid=4271 duration=15(sec)

So it seems the first node is reaching the second, but is refused.
I have tried restarting the servers, xinetd, and identd, and still have no clue.


I am installing two SG clusters, and I had a similar error in the Apache cluster. It was fixed by simply restarting identd on one node. On this cluster, however, I am tired of rebooting and restarting.

Any additional comments on this one?

Thanks in advance,

Erick Perez
Panama.
13 REPLIES
Steven E. Protter
Exalted Contributor

Re: Node xxxx is refusing Serviceguard communication

Shalom,

Not to be a pain, but it's RTFM time. This is caused by security not being set up correctly on the problem node. Probably not a firewall; maybe a cmclnodelist or hostname/DNS resolution issue.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Erick Arturo Perez
Frequent Advisor

Re: Node xxxx is refusing Serviceguard communication

Hi Steven,
The cmclnodelist files are exactly the same on both nodes.

No firewall was or is enabled on the nodes.

Ping on each interface is OK.

What other security settings should I check?
John Bigg
Esteemed Contributor

Re: Node xxxx is refusing Serviceguard communication

Your config looks fine, since you included host aliases for all IP addresses, which is what is usually missing.

Therefore I suggest you install the latest 11.18 patch SGLX_00222 (or SGLX_00223, SGLX_00224 depending on your architecture) since this fixes a defect where aliases were not recognised:

32. Defect: QXCR1000747462
If the primary names the IP addresses on a cluster node
resolve to do not match the hostname of the node then
Serviceguard commands fail. i.e. it is not possible to
configure the hostname as an alias as described in
the "Configuring IP Address Resolution" section of the
manual.

That is certainly the next step I'd try.
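
To see whether a hosts file matches the situation described in that defect text, here is a rough Python sketch. It lists every IP address whose primary (first) name differs from the node's hostname while the hostname appears only as an alias. The helper names `parse_hosts` and `primary_name_mismatches` are made up for illustration, and the sketch assumes the node's hostname is the short name `sgj-bd1`; the hosts data is copied from the first post. It says nothing about how Serviceguard itself performs the check.

```python
def parse_hosts(text):
    """Yield (ip, [names]) for each non-comment line of hosts-file text."""
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            yield fields[0], fields[1:]


def primary_name_mismatches(text, hostname):
    """Return (ip, primary_name) pairs where hostname is only an alias."""
    mismatches = []
    for ip, names in parse_hosts(text):
        # names[0] is the primary name; the rest of the line holds aliases.
        if hostname in names[1:] and names[0] != hostname:
            mismatches.append((ip, names[0]))
    return mismatches


# Hosts file as posted at the start of the thread.
HOSTS = """\
127.0.0.1 localhost.localdomain localhost
192.168.248.5 sgj-bd1.dom1.com sgj-bd1
192.168.248.7 sgj-bd2.dom1.com sgj-bd2
10.10.10.10 sgj-bd1.hbone sgj-bd1
10.10.10.11 sgj-bd2.hbone sgj-bd2
"""

print(primary_name_mismatches(HOSTS, "sgj-bd1"))
# [('192.168.248.5', 'sgj-bd1.dom1.com'), ('10.10.10.10', 'sgj-bd1.hbone')]
```

Both addresses of sgj-bd1 resolve to a primary name other than the short hostname, which is the alias arrangement the defect description refers to.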
Erick Arturo Perez
Frequent Advisor

Re: Node xxxx is refusing Serviceguard communication

Would you be so kind as to direct me to where the Serviceguard patches can be selected?
I cannot seem to find it.
Erick Arturo Perez
Frequent Advisor

Re: Node xxxx is refusing Serviceguard communication

I was able to complete a correct cluster configuration after changing /etc/hosts. I have no explanation for this behavior. If someone can explain it, please do so.
******before******
my /etc/hosts (on both nodes)
127.0.0.1 localhost.localdomain localhost
192.168.248.5 sgj-bd1.dom1.com sgj-bd1
192.168.248.7 sgj-bd2.dom1.com sgj-bd2
10.10.10.10 sgj-bd1.hbone sgj-bd1
10.10.10.11 sgj-bd2.hbone sgj-bd2

******AFTER and working******
my /etc/hosts (on both nodes)
127.0.0.1 localhost.localdomain localhost
10.10.10.10 sgj-bd1.hbone sgj-bd1
10.10.10.11 sgj-bd2.hbone sgj-bd2
192.168.248.5 sgj-bd1.dom1.com sgj-bd1
192.168.248.7 sgj-bd2.dom1.com sgj-bd2

Please note that the only thing I did was to move the 192. hosts from the top of the file to the bottom. At first I thought it was a DNS issue, but the DNS servers resolve perfectly.
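
One visible effect of the reordering can be sketched in Python. The snippet assumes the resolver returns the first hosts line that lists a name, which is an assumption about the lookup configuration on these nodes and not something confirmed in this thread. `first_match` is a hypothetical helper. It shows which address the short name would map to before and after the change; it does not show why Serviceguard accepts one and refuses the other.

```python
def first_match(hosts_text, name):
    """Return the IP on the first hosts line that lists name, or None."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the IP; everything after it is a name or alias.
        if name in fields[1:]:
            return fields[0]
    return None


BEFORE = """\
127.0.0.1 localhost.localdomain localhost
192.168.248.5 sgj-bd1.dom1.com sgj-bd1
192.168.248.7 sgj-bd2.dom1.com sgj-bd2
10.10.10.10 sgj-bd1.hbone sgj-bd1
10.10.10.11 sgj-bd2.hbone sgj-bd2
"""

AFTER = """\
127.0.0.1 localhost.localdomain localhost
10.10.10.10 sgj-bd1.hbone sgj-bd1
10.10.10.11 sgj-bd2.hbone sgj-bd2
192.168.248.5 sgj-bd1.dom1.com sgj-bd1
192.168.248.7 sgj-bd2.dom1.com sgj-bd2
"""

print(first_match(BEFORE, "sgj-bd2"))  # 192.168.248.7
print(first_match(AFTER, "sgj-bd2"))   # 10.10.10.11
```

Under that first-match assumption, the short names move from the 192.168.248.x addresses to the 10.10.10.x heartbeat addresses once the lines are swapped.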

Also, it was mentioned that there are patches for SG. Where? I cannot find a download area for such patches (Linux).

Thanks in advance for your comments.

Erick.


emha_1
Valued Contributor

Re: Node xxxx is refusing Serviceguard communication

The problem is that you use the same short name for two different IPs. The system can resolve it to only one of these IPs.


emha.
John Bigg
Esteemed Contributor

Re: Node xxxx is refusing Serviceguard communication

Please ignore the comment by emha.

You can find the patches by clicking on the patch database link at http://www11.itrc.hp.com/service/patch/mainPage.do from the ITRC home page and entering the patch names in the search field at the top, which is titled "find a specific patch". You should then find the patches.

The link for SGLX_00222 is http://www12.itrc.hp.com/service/patch/patchDetail.do?admit=109447627+1204538815436+28353475&patchid=SGLX_00222&sel=%7Blinux%3Aredhat%3A5ap%2C%7D&BC=main%7Csearch%7C (assuming this works from your account).

I think the most likely reason your change allowed things to work is that timings are affected, although without seeing your exact configuration it is hard to say. It is certainly unusual to see a system be a member of multiple domains like this.
skt_skt
Honored Contributor

Re: Node xxxx is refusing Serviceguard communication

Check whether iptables is running; it could be acting as a kind of firewall rule. I observed that when it was up on my RHEL AS 3 servers, it prevented a server from making an FTP connection to itself.
Is this RHEL 5 Update 1?
Erick Arturo Perez
Frequent Advisor

Re: Node xxxx is refusing Serviceguard communication

Kumar: no iptables in place.

John: to help the forums, what complete information do I need to post here so it can be of help to others?

So far, what I can tell is this:
FIRST HOST
hostname: sgj-bd1
OS: RHEL 5 (stock, not updated yet)
SG A.11.18 RHEL 5
xinetd installed, running
identd installed, running
Ethernet interfaces:
lo/127.0.0.1 localhost.localdomain localhost
bond0/192.168.248.5 sgj-bd1.dom1.com sgj-bd1
eth2 / 10.10.10.10 sgj-bd1.hbone sgj-bd1
Netmask: 255.255.255.0
gateway: 192.168.248.1
dns: 192.168.248.2 / 192.168.248.3
dom1.com is a valid internal domain (replaced for security reasons)
eth2 and eth3 are heartbeats only, crossover, not routed.
eth0 and eth1 are the bonding interfaces.

The second host is exactly the same except for the hostname and IP addresses.
hostname: sgj-bd2
OS: RHEL 5 (stock, not updated yet)
SG A.11.18 RHEL 5
xinetd installed, running
identd installed, running
Ethernet interfaces:
lo/127.0.0.1 localhost.localdomain localhost
bond0/192.168.248.7 sgj-bd2.dom1.com sgj-bd2
eth2/10.10.10.11 sgj-bd2.hbone sgj-bd2

content of cmclnodelist on both nodes
sgj-bd1 root
sgj-bd2 root

Thanks to all for your kind help.

skt_skt
Honored Contributor

Re: Node xxxx is refusing Serviceguard communication

John Bigg
Esteemed Contributor

Re: Node xxxx is refusing Serviceguard communication

That is the sort of information I meant. You mention eth3 as a heartbeat, but you do not include its IP address or what is in /etc/hosts for it. Are there any other interfaces you have not mentioned? Maybe provide ifconfig or ip addr show output.
john123
Trusted Contributor

Re: Node xxxx is refusing Serviceguard communication

Can you please check the Serviceguard entries in the xinetd configuration file?
We once came across a similar problem, and it was solved by changing the entries in the inetd.conf file.
Erick Arturo Perez
Frequent Advisor

Re: Node xxxx is refusing Serviceguard communication

Sorry, I was very sick and was unable to continue this thread,
but I'm back now.

I will repost /etc/hosts configs and some more information.

Thanks,