SG problem?

Boonchu Ngampairoijpibu_1 · ‎06-08-2001

I get a bunch of messages like these in syslog every 30 minutes:

Jun 8 08:12:49 cmcld[2549]: Attempting to adjust cluster membership
Jun 8 08:12:52 cmcld[2549]: Obtaining Cluster Lock
Jun 8 08:12:53 cmcld[2549]: Turning off safety time protection since the cluster
Jun 8 08:12:53 cmcld[2549]: now consists of a single node. If ServiceGuard
Jun 8 08:12:53 cmcld[2549]: fails, this node will not automatically halt

This is my configuration. I have one primary LAN, one standby LAN, and a third LAN that is a crossover link between the two nodes. The primary LAN also carries the heartbeat. I have not configured anything on the crossover LAN.

flags: 12 (single cluster lock)
heartbeat interval: 1.00 (seconds)
node timeout: 2.00 (seconds)
heartbeat connection timeout: 4.00 (seconds)
auto start timeout: 600.00 (seconds)
network polling interval: 2.00 (seconds)

Someone told me to raise the node timeout from 2 seconds to 5 seconds or more. I agree that this would eliminate the syslog messages. However, I would like to configure the crossover LAN to carry a second heartbeat. Is that possible? If so, and I keep the node timeout at 2 seconds and add the crossover LAN as a second heartbeat, does that eliminate the problem?

SG experts, please let me know.

Boonchu Ngampairoijpibul


James R. Ferguson · ‎06-08-2001

Hi:

The general guideline is to keep NODE_TIMEOUT at, or slightly above, 5 seconds. This gives the 'cmcld' daemon reasonable assurance of getting processor cycles to accommodate its needs.

...JRF...

Carsten Krege · ‎06-08-2001

The widely agreed opinion is that NODE_TIMEOUT should be in the range of 5 to 8 seconds. This is a general statement that is independent of your SG configuration. We recommend this because we feel that this setting will effectively avoid cluster reformations triggered by short hiccups of the system (system hangs that starve SG's main process, cmcld, of CPU), while still guaranteeing a short failover time.

We do not recommend increasing NODE_TIMEOUT beyond 8 seconds, even if the cluster still runs into reformations. In that case it is definitely necessary to identify the root cause of the problem.

From the messages you get, we cannot deduce the cause of the cluster reformation. Basically, we deal with two categories of problems:

1) network problems: cmcld doesn't receive heartbeats and therefore cannot update the safety timer
2) system hangs and related: cmcld doesn't get CPU time to update the safety timer

Adding a private heartbeat network (yes, crossover cables ARE supported!) helps with category 1 problems only.

If cmcld is not getting CPU time (category 2), the new heartbeat network will not help.

My recommendation is to do both: add the crossover LAN as a heartbeat and increase NODE_TIMEOUT.

Carsten

-------------------------------------------------------------------------------------------------
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- HhGttG
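
To make the two suggestions in this thread concrete, here is a rough sketch of how they might look in the cluster ASCII configuration file. This is an illustration under assumptions, not the poster's actual file: the cluster name, node names, interface names (lan0, lan1, lan2) and IP addresses are all hypothetical placeholders, and the second node's section is abbreviated. HEARTBEAT_INTERVAL and NODE_TIMEOUT are written in microseconds in this file, so 5 seconds is 5000000.

```text
# Hypothetical cluster ASCII file excerpt (names and addresses are placeholders)
CLUSTER_NAME            cluster1

NODE_NAME               nodeA
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        10.0.0.1        # primary LAN, already carries heartbeat
  NETWORK_INTERFACE     lan1            # standby LAN, no IP address assigned
  NETWORK_INTERFACE     lan2
    HEARTBEAT_IP        192.168.1.1     # crossover LAN added as second heartbeat

NODE_NAME               nodeB
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        10.0.0.2
  NETWORK_INTERFACE     lan1
  NETWORK_INTERFACE     lan2
    HEARTBEAT_IP        192.168.1.2

HEARTBEAT_INTERVAL      1000000         # 1 second, unchanged
NODE_TIMEOUT            5000000         # raised from 2 seconds to 5 seconds
```

The usual workflow would be to dump the current configuration with cmgetconf, edit the file, verify it with cmcheckconf -C and apply it with cmapplyconf -C. Depending on the Serviceguard release, changing heartbeat networks or timing parameters may require the cluster to be halted first, so check the manual for your version before applying.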
