Serviceguard

cmrunnode : Cluster is already running message

 
Giuseppe Gattiglio
Contributor

cmrunnode : Cluster is already running message

MC/ServiceGuard A.10.12 with patch PHSS_26182

If I restart a node with "reboot" (not a regular shutdown),
it comes up with the following message in /etc/rc.log:

cmrunnode : Cluster is already running on ..

and cmviewcl shows the node as failed.
If I then start the node manually
with cmrunnode, it comes up and joins the cluster.

Does anyone have an idea what causes this?
5 REPLIES
Rajeev Shukla
Honored Contributor

Re: cmrunnode : Cluster is already running message

That's because a reboot is an abrupt shutdown. The cluster does not go down in the normal way, so the cluster manager assumes that a panic or TOC has happened and marks the node as failed in the cluster. When the machine comes back up, you need to make the node join the running cluster manually.
So it is always advisable to do a shutdown, or at least to halt the node with cmhaltnode before you reboot.
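
The "halt the node, then reboot" advice can be sketched as a small wrapper. This is only an illustration: the function names `run` and `graceful_reboot` and the `DRY_RUN` variable are made up here, and the sketch assumes that `cmhaltnode -f` and `shutdown -r -y 0` behave on your release as their man pages describe, so check those before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: leave the cluster cleanly, then reboot.
# With DRY_RUN=1 the commands are only printed, never executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

graceful_reboot() {
    # Halt Serviceguard on this node; -f also halts packages running here.
    run cmhaltnode -f || return 1
    # Reboot through the normal shutdown scripts instead of a bare "reboot".
    run shutdown -r -y 0
}
```

Calling `graceful_reboot` with `DRY_RUN=1` set only prints the two commands, which is a safe way to review the sequence first.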

Cheers
rajeev
Dietmar Konermann
Honored Contributor

Re: cmrunnode : Cluster is already running message

Hmm... usually cmrunnode should work fine even after a "hard" reboot or reset. What is your configured node timeout? Maybe the cluster is still re-forming when your node comes up again?

Best regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Giuseppe Gattiglio
Contributor

Re: cmrunnode : Cluster is already running message

Thanks for all the answers.

My problem began with the patch installation (NODE_TIMEOUT had already been set to 300 sec for weeks!)

I don't think it is a problem of cluster re-formation:
as soon as I issue "reboot" on node1, I see this message in the syslog on node2:
"cmcld: Turning off safety time protection since the cluster now consists of a single node. If ServiceGuard fails, this node will not automatically halt
1 nodes have formed a new cluster, sequence #26"
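
To see when the surviving node finished re-forming relative to the reboot, one could pull just these messages out of the syslog. A minimal sketch: the function name `reformation_events` is made up, the search strings are taken from the message quoted above, and the default path assumes the usual HP-UX syslog location.

```shell
#!/bin/sh
# Hypothetical helper: list cluster re-formation messages from a syslog file.
# The default path is an assumption; pass another file as the first argument.

reformation_events() {
    log=${1:-/var/adm/syslog/syslog.log}
    # Match the two cmcld messages quoted in this post.
    grep -E 'formed a new cluster|Turning off safety time protection' "$log"
}
```

Comparing the timestamps of the matching lines with the boot time of the restarted node shows whether re-formation had completed before cmrunnode ran.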

I also tried putting a "sleep 5" into the /sbin/init.d/cmcluster file and then running "reboot", but it didn't help.

Note that if I restart node1 with "shutdown -r", it joins the cluster.

Any help will be appreciated.

Dietmar Konermann
Honored Contributor

Re: cmrunnode : Cluster is already running message

Indeed, I believe your 300 sec node timeout is responsible for the problem.

With that setting, the calculated failover time, including the complete re-formation, comes to more than 3300 sec on your cluster!

The setting does not make sense and should be corrected. The maximum recommended value is 30 sec; we usually recommend 5-8 sec, btw.
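
A quick way to check the value against that limit is sketched below. It assumes the cluster ASCII configuration file (for example one written by cmgetconf) contains a line such as `NODE_TIMEOUT 300000000` with the value in microseconds; that unit, the function name `check_node_timeout`, and the file you pass in are assumptions to verify on your system.

```shell
#!/bin/sh
# Hypothetical check: report NODE_TIMEOUT from a cluster ASCII file in seconds
# and flag values above the 30 sec maximum mentioned above.
# Assumes the value in the file is given in microseconds.

check_node_timeout() {
    conf=$1
    # Use the first NODE_TIMEOUT line that is not commented out.
    usec=$(awk '$1 == "NODE_TIMEOUT" { print $2; exit }' "$conf")
    if [ -z "$usec" ]; then
        echo "NODE_TIMEOUT not found in $conf"
        return 2
    fi
    secs=$((usec / 1000000))
    if [ "$secs" -gt 30 ]; then
        echo "NODE_TIMEOUT is ${secs}s: above the 30s maximum"
        return 1
    fi
    echo "NODE_TIMEOUT is ${secs}s: ok"
}
```

The function only reads the file; changing the value still means editing the configuration and re-applying it with the usual Serviceguard tools.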

Best regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
avsrini
Trusted Contributor

Re: cmrunnode : Cluster is already running message

Hi Giuseppe,
From what you describe, when you issue
reboot, the processes are not given proper time to shut down. They are killed,
including the cmcluster daemon.
This is like a TOC: the cluster assumes there is a problem with the node and prevents it from joining the cluster.
When you reboot, the cluster daemon may not be removing the
/var/adm/cmcluster/.cm_start_tie file, so you get the "Cluster is already running" error in the syslog file.

When you use shutdown, the daemon goes down properly and tells the other node that
this node is going down. So when this node comes up, it joins the cluster.

So use shutdown (the preferred method) rather than reboot to restart the servers.

Hope this answers your query.
Also, please assign some points to the people who helped you solve the problem. This will encourage others to help you more. Hope you got the point.

Let us know if the problem persists. We can put the cluster in debug mode and do some more troubleshooting.

Happy clustering.
Srini.




Be on top.