CM Cluster Error

 
Alberto Hurtado
Frequent Advisor

Please send me information about these errors. I have two HP 9000 K570 servers in an MC/ServiceGuard cluster. The systems did not go down, and neither one was shut down or rebooted.
Thank you for your help.
______________________________________________
Sep 11 13:47:08 mvi902 cmcld[2438]: Communication to node mvi903 has been interrupted

Sep 11 13:47:08 mvi902 cmcld[2438]: Node mvi903 may have died

Sep 11 13:47:08 mvi902 cmcld[2438]: Attempting to form a new cluster

Sep 11 13:47:13 mvi902 cmcld[2438]: 2 nodes have formed a new cluster, sequence#101

Sep 11 13:47:13 mvi902 cmcld[2438]: The new active cluster membership is: mvi903(id=2), mvi902(id=1)

Sep 11 13:47:15 mvi902 vmunix: mpc_bindlwp: Overriding conflicting mandatory binding!

Sep 11 13:47:15 mvi902 vmunix: mpc_bindlwp: Migrating process 491 from processor 1 to processor 0!

Sep 11 13:47:39 mvi902 vmunix: mpc_bindlwp: Migrating process 499 from processor 2 to processor 0!

Sep 11 13:47:39 mvi902 vmunix: mpc_bindlwp: Overriding conflicting mandatory binding!

Sep 11 13:47:40 mvi902 above message repeats 2 times

Sep 11 13:47:39 mvi902 vmunix: mpc_bindlwp: Migrating process 499 from processor 2 to processor 0!

Sep 11 13:47:49 mvi902 vmunix: mpc_bindlwp: Overriding conflicting mandatory binding!

Sep 11 13:47:49 mvi902 vmunix: mpc_bindlwp: Migrating process 567 from processor 3 to processor 0!

Sep 11 16:20:25 mvi902 cmcld[2438]: Communication to node mvi903 has been interrupted

Sep 11 16:20:25 mvi902 cmcld[2438]: Node mvi903 may have died

Sep 11 16:20:25 mvi902 cmcld[2438]: Attempting to form a new cluster

Sep 11 16:20:29 mvi902 cmcld[2438]: Obtaining Cluster Lock

Sep 11 16:20:30 mvi902 cmcld[2438]: Turning off safety time protection since the cluster

Sep 11 16:20:30 mvi902 cmcld[2438]: now consists of a single node. If ServiceGuard

Sep 11 16:20:30 mvi902 cmcld[2438]: fails, this node will not automatically halt

Sep 11 16:20:32 mvi902 cmcld[2438]: Attempting to adjust cluster membership

Sep 11 16:20:35 mvi902 cmcld[2438]: Enabling safety time protection

Sep 11 16:20:35 mvi902 cmcld[2438]: Clearing Cluster Lock

Sep 11 16:20:37 mvi902 cmcld[2438]: Timed out node mvi903.

Sep 11 16:20:37 mvi902 cmcld[2438]: Attempting to adjust cluster membership

Sep 11 16:20:41 mvi902 cmcld[2438]: Clearing Cluster Lock

Sep 11 16:20:46 mvi902 cmcld[2438]: 2 nodes have formed a new cluster, sequence #104

Sep 11 16:20:46 mvi902 cmcld[2438]: The new active cluster membership is: mvi902 (id=1), mvi903(id=2)
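To see how long each reformation took, the cmcld messages can be paired up mechanically. The following is only a rough sketch, not a ServiceGuard tool: it assumes the standard syslog line layout shown above, uses a few lines copied from the excerpt as sample input, and picks an arbitrary year because syslog timestamps do not carry one. The function name `reformation_events` is made up for this illustration.

```python
import re
from datetime import datetime

# Sample lines copied from the syslog excerpt above.
SAMPLE = """\
Sep 11 13:47:08 mvi902 cmcld[2438]: Communication to node mvi903 has been interrupted
Sep 11 13:47:13 mvi902 cmcld[2438]: 2 nodes have formed a new cluster, sequence#101
Sep 11 16:20:25 mvi902 cmcld[2438]: Communication to node mvi903 has been interrupted
Sep 11 16:20:29 mvi902 cmcld[2438]: Obtaining Cluster Lock
Sep 11 16:20:46 mvi902 cmcld[2438]: 2 nodes have formed a new cluster, sequence #104
"""

# timestamp, hostname, "cmcld[pid]:", then the message text
LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ cmcld\[\d+\]: (.*)$")


def reformation_events(text, year=2000):
    """Pair each 'interrupted' message with the next 'formed a new cluster'.

    The year is an assumption: syslog lines do not include one.
    Returns a list of (start_time, seconds_until_reformed) tuples.
    """
    events = []
    start = None
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip blank lines and non-cmcld messages
        when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        message = match.group(2)
        if "has been interrupted" in message and start is None:
            start = when
        elif "formed a new cluster" in message and start is not None:
            events.append((start, (when - start).total_seconds()))
            start = None
    return events


for started, seconds in reformation_events(SAMPLE):
    print(f"{started:%b %d %H:%M:%S}: cluster reformed after {seconds:.0f} s")
```

On the excerpt above this reports two events, at 13:47:08 and 16:20:25. The second took longer because the node went through the cluster lock path before mvi903 rejoined.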
Geoff Wild
Honored Contributor

Re: CM Cluster Error

Looks like you lost your heartbeat....

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
melvyn burnard
Honored Contributor

Re: CM Cluster Error

Your nodes are losing contact with each other through their heartbeat networks, possibly due to network overload or other networking issues.
You would be advised to locate these network problems and fix them.
As an aside, what are the configuration settings for HEARTBEAT_INTERVAL and NODE_TIMEOUT?

These may have to be increased.
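
As an illustration of checking those two settings, here is a small sketch that reads them from the text of a cluster ASCII configuration file. It rests on assumptions: that the values are given in microseconds (as in ServiceGuard releases of this era, as far as I recall), and that NODE_TIMEOUT should be at least twice HEARTBEAT_INTERVAL, which is a rule of thumb from memory and should be verified against the manual for your release. The sample text and the function names are hypothetical, not taken from the poster's cluster.

```python
# Hypothetical excerpt of a cluster ASCII file; the values are illustrative only.
SAMPLE_TIMING_CONF = """\
# timing parameters (assumed to be in microseconds)
HEARTBEAT_INTERVAL   1000000
NODE_TIMEOUT         2000000
"""


def read_timing(conf_text):
    """Return the HEARTBEAT_INTERVAL and NODE_TIMEOUT values found in the text."""
    wanted = {"HEARTBEAT_INTERVAL", "NODE_TIMEOUT"}
    values = {}
    for line in conf_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(parts) == 2 and parts[0] in wanted:
            values[parts[0]] = int(parts[1])
    return values


def check_timing(values):
    """Convert to seconds and apply the assumed 'timeout >= 2 x interval' check."""
    heartbeat = values["HEARTBEAT_INTERVAL"]
    timeout = values["NODE_TIMEOUT"]
    return {
        "heartbeat_s": heartbeat / 1_000_000,
        "node_timeout_s": timeout / 1_000_000,
        "timeout_at_least_twice_interval": timeout >= 2 * heartbeat,
    }


print(check_timing(read_timing(SAMPLE_TIMING_CONF)))
```

A short NODE_TIMEOUT makes the cluster react quickly to a dead node but also makes it reform on brief network stalls, which is the trade-off behind the suggestion to increase it.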
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Zigor Buruaga
Esteemed Contributor

Re: CM Cluster Error

Alberto Hurtado
Frequent Advisor

Re: CM Cluster Error

I have a question: why does ServiceGuard reform the cluster when a LAN card fails if I have two LAN cards?
mvi902> lanscan
10/4/8.1 0x080009BA3C7C 4 UP lan4 snap4 2 ETHER Yes 119

10/12/6 0x0060B0838AEB 1 UP lan1 snap1 4 ETHER Yes 119

I would expect it to fail over to the secondary card.
Geoff Wild
Honored Contributor

Re: CM Cluster Error

You may have a second card - but if it isn't configured as a STANDBY - then it won't be used.

Can you post your /etc/cmcluster/"cluster".conf file?
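
For anyone checking their own file, the sketch below shows one way to spot which interfaces are standbys. It assumes the usual layout of the cluster ASCII file, where a NETWORK_INTERFACE entry followed by a HEARTBEAT_IP or STATIONARY_IP line carries an address, and an entry with no IP line after it is treated as a standby. The sample text is hypothetical: the interface names come from the lanscan output above, but which one is the primary and the 192.0.2.10 address are invented for illustration.

```python
# Hypothetical node section of a cluster ASCII file (address is a placeholder).
SAMPLE_NODE_CONF = """\
NODE_NAME               mvi902
  NETWORK_INTERFACE     lan4
    HEARTBEAT_IP        192.0.2.10
  NETWORK_INTERFACE     lan1
"""


def classify_interfaces(conf_text):
    """Map (node, interface) to 'heartbeat', 'stationary', or 'standby'.

    An interface with no IP line following it is assumed to be a standby.
    """
    roles = {}
    node = None
    current = None
    for line in conf_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1]
        if key == "NODE_NAME":
            node, current = value, None
        elif key == "NETWORK_INTERFACE" and node is not None:
            current = (node, value)
            roles[current] = "standby"  # until an IP line says otherwise
        elif key == "HEARTBEAT_IP" and current is not None:
            roles[current] = "heartbeat"
        elif key == "STATIONARY_IP" and current is not None:
            roles[current] = "stationary"
    return roles


for (node_name, interface), role in classify_interfaces(SAMPLE_NODE_CONF).items():
    print(f"{node_name} {interface}: {role}")
```

If both cards show up with their own IP lines, neither is acting as a standby for the other, which would match the behaviour described in this thread.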

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Zigor Buruaga
Esteemed Contributor
Solution

Re: CM Cluster Error

Hi,

In your cluster config file you define which networks are "critical". We have 4 different IPs for each node, and only two of them (used by clients at different sites) are "critical". When one of these critical networks becomes unavailable (HEARTBEAT fails), the package "jumps" to the other node so that it remains available to all clients.

Hope this helps, and sorry for my initial confusion.
Kind regards,
Zigor
Sritharan
Valued Contributor

Re: CM Cluster Error

Hi,

Please send us your cmcluster configuration file so we can check whether the heartbeat has been configured.

Please also attach the cmscancl -v output.

Thanks & Regards
Sri
Known is a drop...unknown is an ocean -> quote from a movie
Sridhar Bhaskarla
Honored Contributor

Re: CM Cluster Error

Hi,

A couple of years back, we experienced the same issue: "identd", as used by sendmail, seemed to be causing the network cards to freeze briefly and intermittently. We remedied it by commenting out ident in /etc/inetd.conf (inetd -c) and increasing the HEARTBEAT_INTERVAL and NODE_TIMEOUT values in the cluster configuration file to 12 seconds.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Alberto Hurtado
Frequent Advisor

Re: CM Cluster Error

Here is the document with the output of the command:
mvi902 /-> cmscancl -n mvi902
Sridhar Bhaskarla
Honored Contributor

Re: CM Cluster Error

Hi Alberto,

You already have .2 defined as the "default" gateway on the box. Do you know why you have so many static routes using the same gateway?

You may want to get rid of them, though they may not necessarily be the problem.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try