member does not join cluster

Vladimir Fabecic
Honored Contributor

member does not join cluster

Hello
I am having a very strange problem:
It is a four-member cluster running V5.1B PK5.
Member 3 had to be rebooted because of a kernel parameter change.
The member was shut down properly, but when booted it stops at "attempting to form or join the cluster" and keeps repeating this message.
The other three nodes are working OK.
On the other nodes there are the following messages:
WARNING: ics_socket_event: error 60 on channel 0, assume node 3 is down
CNX MGR: communication error detected for node 3
CNX MGR: delay 121 sec 0 usecs
There are two cluster interconnect switches (with an interlink). I checked both of them and they are OK.
During boot, all network interfaces come up.
Watching network traffic from another node with tcpdump shows that member 3 is sending broadcasts to the subnet (port 900, as it should be).
But it looks like it gets no answer.
There were no error messages in the log files; everything worked OK before.
It worked for years without a single problem.
I spent a few hours checking the hardware, and it looks OK.
I even removed the member from the cluster and recreated it. The error message is the same and the member still does not join the cluster.
Before removing the member I even tried to "break" netrain for the cluster interconnect and use single NICs.
Any suggestions?
In vino veritas, in VMS cluster
4 REPLIES
Kapil Jha
Honored Contributor

Re: member does not join cluster

Which kernel parameter did you change?

From your post it seems that you removed member 3 from the cluster (how?), changed the kernel parameter, rebooted, and are now asking it to join back.

Is my understanding correct?

What about netrain now?

BR,
Kapil+
I am in this small bowl, I wanna see the real world......
Vladimir Fabecic
Honored Contributor

Re: member does not join cluster

The kernel parameters that were changed are the ones required by Oracle 10.
Nothing relevant.
I also tried reverting them, but no luck.
After a few hours of testing and trying to put the member back in the cluster, I removed the member from the cluster and re-added it.
During its first boot, the re-added member 3 stops at the same point.
In vino veritas, in VMS cluster
Pieter 't Hart
Honored Contributor

Re: member does not join cluster

Do I understand correctly:
The cluster interconnect is using NICs, not MC?
The NICs are also using netrain for ICS?
There are two switches, each with one netrain member connected?
These switches are interconnected?
The cluster ICS packets are received by the other hosts, which you checked using tcpdump?

Can you also use tcpdump to check whether any answer is sent back?
With this member down, what is the response to a ping of this node's address?
(Maybe another node on the network is using the wrong address?)
What do the switches' log files say about this link?
Vladimir Fabecic
Honored Contributor

Re: member does not join cluster

Pieter,
You understood correctly.
No other node on the network is using a wrong address.
No problem with the switches.
The problem was caused by Samba SWAT (port 901).
In vino veritas, in VMS cluster
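
Since the cause turned out to be a service on a port next to the cluster interconnect port, a quick check for unexpected listeners can save time. The following is a minimal sketch only, assuming Python is available on a host that can reach the member; the function name tcp_port_open and the 127.0.0.1 target are hypothetical and not from the thread. It only reports whether something accepts TCP connections on a port, not which program owns it, and the thread does not say whether the cluster traffic on port 900 is TCP or UDP.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if something accepts a TCP connection on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports mentioned in the thread: 900 (cluster interconnect), 901 (SWAT).
    # The host is a placeholder; replace it with the address to be checked.
    for port in (900, 901):
        state = "open" if tcp_port_open("127.0.0.1", port) else "closed"
        print(f"TCP port {port}: {state}")
```

If port 901 shows as open on a cluster member, the next step would be to find out which service is bound to it (for example via the inetd configuration) before rebooting the member.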