Operating System - OpenVMS

Cluster Time out Question.

VMS Support
Frequent Advisor

Cluster Time out Question.

We currently have our cluster timeouts set to 360 seconds.

This means (as you are aware) that if a node exits, the other nodes in the cluster will hang for 3 minutes (360 seconds).

We are looking at reducing this value.
We have a reliable network. A link between sites should never be down for more than 60 seconds.

Our cluster consists of three nodes.
Two at one site (Production) and a third at the DR site.
This is a DTCS cluster.

What values do other people run with?
Does anyone have an opinion on lowering these values?
9 REPLIES
Wim Van den Wyngaert
Honored Contributor

Re: Cluster Time out Question.

We have 2 inter-building clusters of 2 servers (and 1 q station). We use FDDI as the interconnect; it has a redundant fiber, so we can survive one fiber failure.

RECNX is at 120 seconds.

If you have shadowing, make sure that shadow_*_TMO is higher than the recnx value.
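As a quick check of the relationship Wim describes, something like this works (a sketch in DCL; the SYSGEN utility and parameter names as on recent OpenVMS versions):

```dcl
$! Compare the active cluster reconnection interval with the shadow
$! member timeout -- the shadow timeout should be the larger of the two
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW RECNXINTERVAL      ! seconds before a lost node is removed
SYSGEN> SHOW SHADOW_MBR_TMO     ! seconds before a shadow member is expelled
SYSGEN> EXIT
```

For a permanent change, the usual route is to put the new values in MODPARAMS.DAT and run AUTOGEN rather than setting them directly.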

Wim
Jan van den Ende
Honored Contributor

Re: Cluster Time out Question.


Oh my!
Where ARE you living?
Any place I know of, 3 minutes only lasts 180 seconds! :-)

We also have FDDI between our two sites (7 km apart).
To reduce cluster reconfiguration time, we have RECNXINTERVAL at 5 seconds.

We also have 100 Mb Ethernet.
"Networks" wants to upgrade to Gb ... by REPLACING the FDDI.
But the (Cisco) Ethernet is configured using Spanning Tree, with failover times of up to 45 seconds...

That is why we are fighting to keep the FDDI, if only as a fallback interconnect during tree re-builds.

So yes, you can shorten your cluster timeout period, as long as you make VERY sure it is longer than any potential network connection interruption. Redundant (and preferably very different) network connections are helpful.

hth

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Richard Brodie_1
Honored Contributor

Re: Cluster Time out Question.

I can't think of a good reason to have RECNXINTERVAL higher than your ~90% worst case network downtime. I thought 'last gasp' datagrams meant that it was relatively unusual to hit this timer though.

Jan: Old fashioned spanning tree is so last year: hold out for RSTP at the very least.
Volker Halle
Honored Contributor

Re: Cluster Time out Question.

re: last gasp datagram

Richard,

the 'last gasp datagram' is sent from the departing node, if it crashes (or shuts down), so the other nodes in the cluster won't have to wait RECNXINTERVAL before timing out the departed node. This does not help in case the network connection breaks.


The 'long' (= RECNXINTERVAL) hang should ONLY be seen if a node is HALTed without a crash or shutdown, is simply powered down, or the network connection really breaks. If you see a long state transition during a normal shutdown, then something is wrong and needs to be diagnosed.

Volker.
Peter Zeiszler
Trusted Contributor

Re: Cluster Time out Question.

In our environments where we have only one NIC for IP and DECnet, we have RECNXINTERVAL at 120. In the environments where we have two (or more) NICs, we have RECNXINTERVAL set to 20.

The multiple NICs give us redundant connectivity for cluster communications. We normally have one NIC for IP, one NIC for DECnet, and one NIC for the backup network.

Our environments all have all 4 nodes at the same site, but in 3 different rooms. We have redundant network paths.
VMS Support
Frequent Advisor

Re: Cluster Time out Question.

Thanks for all the feedback.

Sorry about the bit about 360 seconds = 3 minutes. I was entering data in imperial rather than metric ;-)

We have the timeout set to 180 seconds = 3 minutes. The high timeout is a throwback to some earlier network kit we used to run.
I'm told the longest we should now be out is less than 60 seconds. Having read some of the feedback, I think we should look at going for 90 seconds (half of what we have).

Regards
Kevin
Keith Parris
Trusted Contributor
Solution

Re: Cluster Time out Question.

I've operated multi-site disaster-tolerant clusters successfully with RECNXINTERVAL set at the default value of 20 seconds.

In my case, I was able to run with lower RECNXINTERVAL values because I had two completely separate and independent extended LANs connecting the systems at the two sites, so that a Spanning Tree reconfiguration, which seemed to typically take about 35-40 seconds in those days (which is greater than the default 20-second value for RECNXINTERVAL), would be unlikely to affect both LANs at once.

In the old days, there was a recommendation of 180 seconds for RECNXINTERVAL in disaster-tolerant clusters of the day because that was the time required for a GIGAswitch/FDDI (or one of its line cards) to reboot, as it would have to do after a firmware upgrade. (But even that figure became out-of-date, as with the latest firmware revisions, that time in practice actually increased to 210 seconds for the 4-port FDDI line card to reboot after a firmware upgrade.) Perhaps your figure came from a conservative doubling of this old recommendation after the old figure of 180 seconds proved insufficient at some point in the past.

As another poster pointed out, in addition to the original IEEE 802.1d Spanning Tree Protocol there is now the Rapid Spanning Tree Protocol, IEEE 802.1w, which aims for much shorter reconfiguration times -- on the order of seconds or less rather than 10s of seconds.

I highly recommend that anyone for whom cluster interconnect reliability is critical, and where the LAN is used as a cluster interconnect, implement LAVC$FAILURE_ANALYSIS. This feature has been in VMS since 6.0 and generates OPCOM messages whenever a piece of the cluster interconnect LAN breaks (or when it is repaired). The EDIT_LAVC.COM tool from the V6 Freeware directory [KP_CLUSTERTOOLS] will help you set this up with minimal effort. LAVC$FAILURE_ANALYSIS is documented in the appendices of the OpenVMS Cluster Systems manual, and I had an article on the topic in the OpenVMS Technical Journal, V2 - see http://h71000.www7.hp.com/openvms/journal/v2/index.html
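The setup Keith describes is largely automated by the tool he mentions; roughly, it looks like this (a sketch only - the exact file names and build steps come from the Freeware kit itself, so check the README in [KP_CLUSTERTOOLS]):

```dcl
$! Run EDIT_LAVC.COM from the Freeware [KP_CLUSTERTOOLS] directory to
$! generate a site-specific LAVC$FAILURE_ANALYSIS program describing
$! your LAN adapters and paths, then build and run it on each node
$ @EDIT_LAVC.COM
$! Once running, LAN component failures and repairs appear as OPCOM
$! messages on the console and in SYS$MANAGER:OPERATOR.LOG
```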

Once LAVC$FAILURE_ANALYSIS is in place, you will have a written record in console output and in the OPERATOR.LOG file, with timestamps, of all LAN outages, and thus their durations. If you run with this enabled for a while and see that your expectation of a maximum outage length of 60 seconds is what you're really seeing in practice (and you might even consider inducing some of the likely failure types during off hours, while you still have RECNXINTERVAL set to 360, to see how the network really behaves), then you could lower RECNXINTERVAL to a bit longer than your maximum outage times with relative safety.

I once worked with a stock exchange which needed to run with RECNXINTERVAL=10 seconds and we used this technique to identify a LAN outage problem which turned out to be lasting 11 seconds (and simultaneously across three supposedly-independent LANs) and thus causing much grief. Armed with the timestamped info, the proof was available with which to, uh, enlighten the understanding of the network folks.
Wim Van den Wyngaert
Honored Contributor

Re: Cluster Time out Question.

Don't forget to lower the SHADOW_MBR_TMO value if you lower RECNXINTERVAL. Otherwise your cluster will continue, but your shadow sets will not, and thus neither will your applications.
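SHADOW_MBR_TMO is a dynamic SYSGEN parameter, so it can be adjusted on a running system; a sketch (the value 120 is an example only, and the change should also go into MODPARAMS.DAT so it survives the next AUTOGEN):

```dcl
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET SHADOW_MBR_TMO 120   ! example value, in seconds
SYSGEN> WRITE ACTIVE             ! apply to the running system
SYSGEN> EXIT
```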

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Cluster Time out Question.