Re: Serviceguard second node fails to join cluster after switch package

Frank de Vries · ‎05-24-2007

We have a production cluster with 2 nodes running HP-UX 10.20 (it is old, yes, I know).
It was working fine, but we recently had a SCSI reset and a panic on one node, called oradb2. We moved the package to run on the second node, because it is more stable.
That works fine; however, the first (original) node now refuses to join the cluster.
We get information in syslog, but it is hard to see what it means.

Please see the syslog extract below, from the point where I launch cmrunnode -v oradb2 (first node):

May 24 18:20:29 oradb2 CM-CMD[1111]: cmhaltnode -f -v oradb2
May 24 18:20:49 oradb2 CM-CMD[1112]: cmrunnode -v oradb2
May 24 18:20:49 oradb2 cmclconfd[1113]: Command execution message
May 24 18:20:49 oradb2 cmcld[1114]: SNMPsubagent is not up. Calculate standard number of threads needed.
May 24 18:20:49 oradb2 cmcld[1114]: Reserving 83 Kbytes of memory and 17 threads
May 24 18:20:49 oradb2 cmcld[1114]: Network interface lan2 has a different link level address
May 24 18:20:49 oradb2 cmcld[1114]: than the one configured. Proceeding.
May 24 18:20:49 oradb2 cmcld[1114]: Network interface lan1 has a different link level address
May 24 18:20:49 oradb2 cmcld[1114]: than the one configured. Proceeding.
May 24 18:20:49 oradb2 cmcld[1114]: Network interface lan0 has a different link level address
May 24 18:20:49 oradb2 cmcld[1114]: than the one configured. Proceeding.
May 24 18:20:49 oradb2 cmcld[1114]: SNMP subagent_fifo does not exist.
May 24 18:20:49 oradb2 cmcld[1114]: Starting cluster management protocols.
May 24 18:20:49 oradb2 cmcld[1114]: Attempting to form a new cluster
May 24 18:22:52 oradb2 last message repeated 2 times
May 24 18:23:53 oradb2 cmcld[1114]: Attempting to form a new cluster
May 24 18:24:43 oradb2 xntpd[676]: offset 0.005370 freq 0.49660 comp 2
May 24 18:24:55 oradb2 cmcld[1114]: Attempting to form a new cluster
May 24 18:25:56 oradb2 cmcld[1114]: Attempting to form a new cluster
May 24 18:26:58 oradb2 cmcld[1114]: Attempting to form a new cluster
May 24 18:29:01 oradb2 last message repeated 2 times
May 24 18:30:02 oradb2 cmcld[1114]: Attempting to form a new cluster
May 24 18:30:49 oradb2 cmcld[1114]: Cluster formation failed
May 24 18:30:49 oradb2 cmcld[1114]: Reason: Ran out of time for automatically joining a cluster
May 24 18:30:49 oradb2 cmcld[1114]: This node (oradb2) has ceased cluster activities.
May 24 18:30:49 oradb2 cmcld[1114]: Daemon exiting
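
A minimal sketch of how the extract above could be summarized, assuming the syslog text is available as a string. It collects the interfaces that cmcld flags with "has a different link level address" and measures the time from the cmrunnode command to "Cluster formation failed". The function name summarize and the year default are illustrative assumptions (syslog timestamps carry no year); the sample lines are copied from the extract.

```python
import re
from datetime import datetime

# A few lines copied from the syslog extract in this post.
SYSLOG = """\
May 24 18:20:49 oradb2 CM-CMD[1112]: cmrunnode -v oradb2
May 24 18:20:49 oradb2 cmcld[1114]: Network interface lan2 has a different link level address
May 24 18:20:49 oradb2 cmcld[1114]: Network interface lan1 has a different link level address
May 24 18:20:49 oradb2 cmcld[1114]: Network interface lan0 has a different link level address
May 24 18:20:49 oradb2 cmcld[1114]: Attempting to form a new cluster
May 24 18:30:49 oradb2 cmcld[1114]: Cluster formation failed
"""

MAC_WARNING = re.compile(r"Network interface (\S+) has a different link level address")


def timestamp(line, year):
    # Syslog lines start with "Mon DD HH:MM:SS" (15 characters) and no year,
    # so the year is supplied by the caller.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def summarize(log_text, year=2007):
    """Return (sorted flagged interfaces, time from cmrunnode to failure)."""
    flagged, started, failed = [], None, None
    for line in log_text.splitlines():
        match = MAC_WARNING.search(line)
        if match:
            flagged.append(match.group(1))
        elif "cmrunnode" in line:
            started = timestamp(line, year)
        elif "Cluster formation failed" in line:
            failed = timestamp(line, year)
    elapsed = failed - started if started and failed else None
    return sorted(flagged), elapsed


if __name__ == "__main__":
    interfaces, elapsed = summarize(SYSLOG)
    print("Interfaces with a MAC mismatch warning:", interfaces)
    print("Time from cmrunnode to failure:", elapsed)
```

On these lines, all three interfaces (lan0, lan1, lan2) are flagged, and the node gives up ten minutes after cmrunnode was issued.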

Luckily, the package is running on the one node.
It is not ideal, so we would like to get it resolved.
I cannot test moving the packages back until I have a planned window for an outage, so I will investigate first and await your answers. Thanks.

Look before you leap

melvyn burnard · ‎05-24-2007

Well, if you look at the log, it is telling you that all of the LANs in the configuration on that node seem to have had a change of MAC address compared to what is in the cluster binary.

Use lanscan to check the LAN MAC addresses, then use cmviewconf to check the MAC addresses that Serviceguard thinks they should be.
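
The comparison suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes each lanscan data line contains a station address written as 0x followed by 12 hex digits and an interface name such as lan0, and it takes the configured addresses as a plain dictionary that you fill in by hand from the cluster configuration. The sample lanscan text, the addresses, and the function names parse_lanscan and mismatches are all made up for the example.

```python
import re

# Assumption: a lanscan data line carries "0x" + 12 hex digits and a lanN name.
STATION = re.compile(r"\b0x([0-9A-Fa-f]{12})\b")
IFNAME = re.compile(r"\blan\d+\b")

# Hypothetical lanscan-style lines; the addresses are invented.
SAMPLE_LANSCAN = """\
8/16/6   0x0060B0AAAA01 0   UP   lan0   UP   4   ETHER   Yes   52
8/4/1/0  0x0060B0AAAA02 1   UP   lan1   UP   5   ETHER   Yes   52
"""

# Hypothetical configured addresses, copied by hand from the cluster config.
CONFIGURED = {"lan0": "0x0060B0AAAA01", "lan1": "0x0060B0BBBB02"}


def normalize(mac):
    """Strip an optional 0x prefix and upper-case the hex digits."""
    mac = mac.strip()
    if mac.lower().startswith("0x"):
        mac = mac[2:]
    return mac.upper()


def parse_lanscan(text):
    """Map interface name -> MAC for every line that contains both."""
    macs = {}
    for line in text.splitlines():
        mac = STATION.search(line)
        name = IFNAME.search(line)
        if mac and name:
            macs[name.group(0)] = normalize(mac.group(1))
    return macs


def mismatches(actual, configured):
    """Return {interface: (configured MAC, actual MAC or None)} where they differ."""
    result = {}
    for name, mac in configured.items():
        wanted = normalize(mac)
        found = actual.get(name)
        if found != wanted:
            result[name] = (wanted, found)
    return result


if __name__ == "__main__":
    for name, (wanted, found) in mismatches(parse_lanscan(SAMPLE_LANSCAN), CONFIGURED).items():
        print(f"{name}: configured {wanted}, hardware reports {found}")
```

Any interface listed by mismatches is one where the hardware address no longer matches what was recorded, which is what the "different link level address" messages in the log point at.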

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!

Srikanth Arunachalam · ‎05-24-2007

Hi,

It's very evident that the problem is with the error
"Network interface lan1 has a different link level address".

(1) Please check the entry for lan1 in /etc/rc.config.d/netconf.

(2) Try to linkloop the MAC address of lan1; you will get the address from the output of netstat -rn.

(3) If the result is not "OK", then unplumb the lan1 network interface and try to run cmrunnode.

Hope this helps.

Thanks,
Srikanth

Frank de Vries · ‎05-28-2007

The responses were not quite on the level, but then this is a complex topic.
We cannot use cmviewconf; maybe it is our version. We use cmquerycl.
To my mind, the MAC addresses are not declared to MC/ServiceGuard at all.
Otherwise it would defeat the idea of being able to change a faulty network card; the cluster should still form.
Hopefully better next time, guys :)

Look before you leap
