Re: NODE_TIMEOUT

Jonathan H. · 12-01-2004

This weekend I had one side of my 2-node cluster TOC. Analysis in q4 suggested that the cause might be a NODE_TIMEOUT period that is too low, and it suggested setting it to 8 seconds. I currently have four 875 CPUs in the same cell on one side and four 650 CPUs in the same cell on the other side.

Is the above setting correct for my system configuration? Also, should I increase the HEARTBEAT_INTERVAL?

John Poff · 12-01-2004

Hi,

We have our NODE_TIMEOUT set for 8 seconds and our HEARTBEAT_INTERVAL set for 2 seconds. Those values seem to work well and we haven't had any random TOCs when the network was busy.

JP

A. Clay Stephenson · 12-01-2004

Well, unless I use The Force I have no way of knowing what your current settings are so that makes it a little difficult to make intelligent comments.

I can say that I use a HEARTBEAT_INTERVAL of 1000000 (1 s) and a NODE_TIMEOUT of 8000000 (8 s) and have never had a TOC; of course, I've never had an MC/SG failover in over 5 years that was not manually (and intentionally) triggered.
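Both values are written to the cluster ASCII file in microseconds, which is why 1000000 corresponds to 1 s. A minimal Python sketch for reading them back as seconds, assuming the file's usual `KEYWORD value` layout with `#` comments; the function name `read_timing` and the sample text are hypothetical.

```python
def read_timing(ascii_text: str) -> dict:
    """Return HEARTBEAT_INTERVAL and NODE_TIMEOUT from cluster ASCII text, in seconds."""
    timing = {}
    for line in ascii_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(parts) == 2 and parts[0] in ("HEARTBEAT_INTERVAL", "NODE_TIMEOUT"):
            timing[parts[0]] = int(parts[1]) / 1_000_000  # microseconds -> seconds
    return timing

# Hypothetical excerpt using the values quoted above
sample = """
# Cluster timing parameters (values in microseconds)
HEARTBEAT_INTERVAL      1000000
NODE_TIMEOUT            8000000
"""
print(read_timing(sample))  # {'HEARTBEAT_INTERVAL': 1.0, 'NODE_TIMEOUT': 8.0}
```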

If you are using the default NODE_TIMEOUT of 2 s, you are really asking for incidents like yours. I do assume you have multiple HEARTBEAT_IPs defined.

If it ain't broke, I can fix that.

Jonathan H. · 12-01-2004

I have the HEARTBEAT_INTERVAL set at 3000000
and the NODE_TIMEOUT set at 6000000

We are currently running several clusters throughout the country and have never had this problem until we upgraded the CPUs on one side. Do your systems have the same size CPUs?

John Poff · 12-01-2004

I don't think it is so much a function of how fast your CPUs are as of the combination of your settings. With HB at 3 seconds and TO at 6 seconds, you only have to miss two heartbeats and it is TOC time. With our settings of HB at 2 and TO at 8, you have to miss 4 heartbeats. With Clay's settings you have to miss 8 heartbeats.
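The arithmetic above is just NODE_TIMEOUT divided by HEARTBEAT_INTERVAL. A small Python sketch reproducing the three cases, using the microsecond values quoted in this thread; the function name `tolerated_misses` is hypothetical, and this mirrors the rule of thumb being discussed rather than modelling ServiceGuard's internal timeout logic.

```python
def tolerated_misses(heartbeat_us: int, timeout_us: int) -> int:
    # Whole heartbeat intervals that fit inside NODE_TIMEOUT (both in microseconds)
    return timeout_us // heartbeat_us

configs = {
    "Jonathan (HB 3 s, TO 6 s)": (3_000_000, 6_000_000),
    "John (HB 2 s, TO 8 s)": (2_000_000, 8_000_000),
    "Clay (HB 1 s, TO 8 s)": (1_000_000, 8_000_000),
}
for name, (hb, to) in configs.items():
    print(f"{name}: {tolerated_misses(hb, to)} missed heartbeats")
# Jonathan (HB 3 s, TO 6 s): 2 missed heartbeats
# John (HB 2 s, TO 8 s): 4 missed heartbeats
# Clay (HB 1 s, TO 8 s): 8 missed heartbeats
```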

JP

A. Clay Stephenson · 12-01-2004

In your case, you are running the minimum allowed value for NODE_TIMEOUT of 2 x HEARTBEAT_INTERVAL, which puts you on the hairy edge even though your total timeout (6 seconds) seems reasonable. You are essentially as vulnerable as someone running the absolute minimum of NODE_TIMEOUT = 2 s and HEARTBEAT_INTERVAL = 1 s. The speed of the CPUs should have little to do with this; indeed, it is quite common in MC/SG land to have very asymmetrical servers making up a cluster, especially if old klunkers are used for failover.

My rule (and it's just mine) is never to go below 3 heartbeat misses; obviously, I prefer more frequent heartbeats combined with tolerance for more misses.
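The two constraints above can be checked together: the hard minimum of NODE_TIMEOUT >= 2 x HEARTBEAT_INTERVAL, and Clay's personal rule of tolerating at least 3 misses. A minimal Python sketch; the function name `check_timing` and its `min_misses` parameter are hypothetical, and the 3-miss default is one poster's preference, not a product requirement.

```python
def check_timing(heartbeat_us: int, timeout_us: int, min_misses: int = 3) -> list:
    """Return a list of warnings for a HEARTBEAT_INTERVAL/NODE_TIMEOUT pair (microseconds)."""
    problems = []
    if timeout_us < 2 * heartbeat_us:
        # Below the minimum allowed value discussed in this thread
        problems.append("NODE_TIMEOUT is below 2 x HEARTBEAT_INTERVAL")
    elif timeout_us // heartbeat_us < min_misses:
        # Allowed, but on the "hairy edge"
        problems.append(f"tolerates fewer than {min_misses} missed heartbeats")
    return problems

print(check_timing(3_000_000, 6_000_000))  # ['tolerates fewer than 3 missed heartbeats']
print(check_timing(1_000_000, 8_000_000))  # []
```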

Finally, just because you (and q4) think this is the reason for the TOC doesn't mean that it is. For example, an operator might have pushed the little button.

If it ain't broke, I can fix that.

Stephen Doud · 12-02-2004

Suggest:
NODE_TIMEOUT = 8 seconds
HEARTBEAT_INTERVAL = 1 second
(sends up to 8 heartbeat packets before NODE_TIMEOUT expires)

Consider:
Create redundant heartbeat paths:
Review the cluster configuration file and look for STATIONARY_IP. If this keyword is associated with an Ethernet NIC, change it to HEARTBEAT_IP.
Then, with the cluster down, run:
# cmapplyconf -C
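Before editing, it can help to list which interfaces on each node carry HEARTBEAT_IP versus STATIONARY_IP. A minimal Python sketch, assuming the ASCII file's usual NODE_NAME / NETWORK_INTERFACE / HEARTBEAT_IP / STATIONARY_IP nesting; the function name `heartbeat_paths`, node name, interfaces and addresses are hypothetical. It only reports; it cannot tell whether a NIC is Ethernet and it does not modify the file.

```python
from collections import defaultdict

def heartbeat_paths(ascii_text: str) -> dict:
    """Map each NODE_NAME to (interface, keyword, ip) tuples for HEARTBEAT_IP/STATIONARY_IP."""
    nodes = defaultdict(list)
    node = iface = None
    for raw in ascii_text.splitlines():
        parts = raw.split("#", 1)[0].split()  # drop comments
        if len(parts) != 2:
            continue
        key, value = parts
        if key == "NODE_NAME":
            node = value
        elif key == "NETWORK_INTERFACE":
            iface = value
        elif key in ("HEARTBEAT_IP", "STATIONARY_IP") and node:
            nodes[node].append((iface, key, value))
    return nodes

sample = """
NODE_NAME nodeA
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 10.1.1.1
  NETWORK_INTERFACE lan1
    STATIONARY_IP 192.168.1.1
"""
for node, entries in heartbeat_paths(sample).items():
    hb = [e for e in entries if e[1] == "HEARTBEAT_IP"]
    if len(hb) < 2:  # only one heartbeat path: candidate for redundancy
        stationary = [e[0] for e in entries if e[1] == "STATIONARY_IP"]
        print(f"{node}: only {len(hb)} heartbeat path(s); STATIONARY_IP on {stationary}")
# nodeA: only 1 heartbeat path(s); STATIONARY_IP on ['lan1']
```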

-StephenD.
