
Dmitry Skutin
Occasional Advisor

Package goes down on the first node if the second node has been shutdown

Hi all,

This seems strange to me: when I shut down the second node, all packages on the first node were stopped.

What I did:

1. Shut down the second node with the command
reboot -h -s

2. On the first node, I saw the following messages in syslog.log:

cmcld: Communication with node rumsla32 has been interrupted
cmcld: Node rumsla32 may have died
cmcld: Attempting to form a new cluster
cmcld: Beginning standard election
cmclconfd[19217]: Updated file /var/adm/cmcluster/frdump.cmcld.3 for node rumsla31
cmcld: Obtaining Cluster Lock
cmcld: Turning off safety time protection since the cluster
cmcld: may now consist of a single node. If Serviceguard
cmcld: fails, this node will not automatically halt
cmcld: This will not affect the behavior of Package Failfast
cmcld: or Service Failfast. If such a package or service fails,
cmcld: safety timer will be re-enabled and this node will
cmcld: automatically halt.
cmcld: lan900 failed
cmcld: Subnet 192.168.0.0 down
cmcld: Subnet 192.168.0.0 in package adsdb1 is down.
cmcld: Executing '/etc/cmcluster/adsdb1/adsdb1.sdf.sh stop' for package adsd1
cmcld: Subnet 192.168.0.0 in package condb1 is down.
cmcld: Executing '/etc/cmcluster/condb1/condb1.sdf.sh stop' for package cond1
cmcld: Subnet 192.168.0.0 in package odsdb1 is down.
.....
cmcld: All cluster monitoring LAN interfaces have failed

Does anybody know what is wrong with my configuration?

Thanks a lot.
Mark McDonald_2
Trusted Contributor

Re: Package goes down on the first node if the second node has been shutdown

Did you run cmhaltnode first to safely remove it from the Cluster?
Dmitry Skutin
Occasional Advisor

Re: Package goes down on the first node if the second node has been shutdown

No. Only reboot.
Matti_Kurkela
Honored Contributor

Re: Package goes down on the first node if the second node has been shutdown

The first node is turning off the safety time protection, so it seems to be successfully transitioning to single-node operation.

But then lan900 fails for some reason, and I'm guessing it carries the subnet 192.168.0.0, which is monitored by the packages mentioned in the listing (adsdb1, condb1, odsdb1). So the package shutdown is probably caused by the failure of lan900.

The big question is, why did lan900 fail at that time?

Is the monitoring of lan900 on the first node somehow dependent on the availability of the second node? If so, that's the problem.
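To check the order of events described above (interface failure, then subnet down, then packages affected), the cmcld messages can be reduced to a short timeline. This is only a rough sketch: the function name `cmcld_timeline` is made up for illustration, and the patterns assume the message wording shown in the syslog excerpt of the original post.

```shell
#!/bin/sh
# Reduce cmcld syslog messages to a timeline of interface, subnet and
# package events, in log order. Reads syslog text on standard input.
# The function name is hypothetical; the patterns assume the message
# wording quoted in the original post.
cmcld_timeline() {
    awk '
        # e.g. "cmcld: lan900 failed"
        /cmcld: lan[0-9]+ failed/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^lan[0-9]+$/) print "interface_failed " $i
        }
        # e.g. "cmcld: Subnet 192.168.0.0 down"
        /cmcld: Subnet [0-9.]+ down/ {
            for (i = 1; i < NF; i++)
                if ($i == "Subnet") print "subnet_down " $(i + 1)
        }
        # e.g. "cmcld: Subnet 192.168.0.0 in package adsdb1 is down."
        /cmcld: Subnet [0-9.]+ in package .* is down/ {
            for (i = 1; i < NF; i++)
                if ($i == "package") print "package_affected " $(i + 1)
        }
    '
}
```

On HP-UX it could be fed from the system log, for example `cmcld_timeline < /var/adm/syslog/syslog.log`. If `interface_failed` shows up right after the other node disappears, the link state of lan900 depends on that node in some way.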

MK
Eric SAUBIGNAC
Honored Contributor

Re: Package goes down on the first node if the second node has been shutdown

Bonjour,

First, although it is not directly relevant: if you want to shut down a node with "reboot" rather than "shutdown", halt the cluster layer first (cmhaltpkg, cmhaltnode). It is safer, because the cluster is then notified that the node is leaving.
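The order suggested above could be wrapped in a small script like the following. It is a sketch only: the helper names `run` and `leave_cluster_and_halt` and the `DRY_RUN` variable are invented for illustration, and by default the commands are only printed, not executed. `cmviewcl`, `cmhaltnode -f` and `shutdown -h -y 0` are the standard Serviceguard and HP-UX commands; the OS halt is meant to be run on the node that is leaving.

```shell
#!/bin/sh
# Halt Serviceguard on a node before taking the node down.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: leave the cluster cleanly, then halt the OS.
leave_cluster_and_halt() {
    node=$1
    run cmviewcl -v             # record cluster and package state first
    run cmhaltnode -f "$node"   # halt cluster services; -f also halts packages running on the node
    run shutdown -h -y 0        # halt the operating system on this node
}
```

Calling `leave_cluster_and_halt rumsla32` with the default settings prints the three commands in order, so the sequence can be reviewed before anything is actually halted.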

Now, something that could explain why all packages on the surviving node went down: is subnet 192.168.0.0 based on a cross-over Ethernet cable? If so, when one node goes down, the Ethernet link goes down as well.

Another question: you have configured subnet 192.168.0.0 in the packages to be monitored by MC/SG. Is that really necessary? For example, if this subnet is only used for heartbeat, there is no need to monitor it. Just configure your cluster to have heartbeat on all subnets ...

Eric
Eric SAUBIGNAC
Honored Contributor

Re: Package goes down on the first node if the second node has been shutdown

Oops ... I did not read your initial post carefully: lan900 means APA. So I would be very surprised if you had configured it with cross-over cables. Sorry :-(

Do you meet the APA requirements for use with MC/SG? See http://docs.hp.com/en/J4240-90035/J4240-90035.pdf chapter 7

Eric
Dmitry Skutin
Occasional Advisor

Re: Package goes down on the first node if the second node has been shutdown

Yes, we meet Serviceguard's requirements for APA, as lan900 is configured as FEC_AUTO.

Unfortunately, I did not configure this cluster myself, so I don't know some of the details.

I would rather not check right now how subnet 192.168.0.0 is wired (cross-over or not), because that would require turning off ports, which is not possible at the moment.

So I will remove this subnet from the monitored list and observe how the cluster behaves in the future.

Eric, thanks a lot.
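Before removing the subnet from the monitored list, it may help to see which package configuration files actually reference it. The sketch below assumes the package ASCII files have already been dumped (for example with `cmgetconf -p adsdb1 /tmp/adsdb1.conf`; the output path is just an example) and that the subnet is declared with a `SUBNET` line (legacy packages) or a `monitored_subnet` line (modular packages). The function name `packages_monitoring_subnet` is hypothetical.

```shell
#!/bin/sh
# Print the package configuration files that monitor a given subnet.
# Usage: packages_monitoring_subnet SUBNET FILE...
# Assumes "SUBNET <addr>" (legacy) or "monitored_subnet <addr>" (modular)
# lines; commented-out lines are ignored because their first field differs.
packages_monitoring_subnet() {
    subnet=$1
    shift
    for f in "$@"; do
        if awk -v s="$subnet" '
            (toupper($1) == "SUBNET" || tolower($1) == "monitored_subnet") && $2 == s { found = 1 }
            END { exit !found }
        ' "$f"; then
            echo "$f"
        fi
    done
}
```

For example, `packages_monitoring_subnet 192.168.0.0 /tmp/*.conf` lists the dumped files that would need editing. An edited file would then normally be verified with `cmcheckconf -P` and applied with `cmapplyconf -P`.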
Stephen Doud
Honored Contributor

Re: Package goes down on the first node if the second node has been shutdown

Although you should use cmhaltnode before running shutdown on rumsla32, whatever the 192.168.0.0 network is used for, the network on rumsla31 should not go down when rumsla32 is halted. Although removing the LAN monitor for subnet 192.168.0.0 is a workaround to prevent the package from halting, you should not have to do that. Best to investigate just how the lan900 network is physically wired etc and correct the real problem, particularly if that subnet is truly a dependency of the package.