Operating System - OpenVMS

SOLVED
EWONG
Advisor

Cluster reconnection interval (RECNXINTERVAL)

Hi,

We will be implementing a DWDM network to replace the existing FDDI, so could anyone suggest the best value for RECNXINTERVAL (the current setting is 60)? With the DWDM, network failover can be trimmed down to less than 2 seconds.

Also, we are using remote volume shadowing, so can SHADOW_MBR_TMO be trimmed down as well, and what is the best value?
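
For reference, this is how we read the current settings (a sketch using the standard SYSGEN SHOW commands; both parameters are dynamic):

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> USE ACTIVE              ! look at the active, in-memory values
    SYSGEN> SHOW RECNXINTERVAL
    SYSGEN> SHOW SHADOW_MBR_TMO
    SYSGEN> EXIT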

Many thanks.
8 REPLIES
Ian Miller.
Honored Contributor

Re: Cluster reconnection interval (RECNXINTERVAL)

Read the wise words of Keith Parris in his presentations at
http://www2.openvms.org/kparris/
____________________
Purely Personal Opinion
comarow
Trusted Contributor

Re: Cluster reconnection interval (RECNXINTERVAL)

It's a trade-off. When you decrease the number, you are more likely to see a CLUEXIT.

If increased, cluster reconfigurations will take longer.

I would not immediately reduce it until you have seen how stable the new connection is. If the cluster timeouts are not bothering you now, don't mess with it.

Even in a CI cluster, the default is 10.
Thomas Ritter
Respected Contributor

Re: Cluster reconnection interval (RECNXINTERVAL)

We run a disaster-tolerant (DT) cluster with DWDM.
We have RECNXINTERVAL at 180 and SHADOW_MBR_TMO at 240. We use host-based shadowing and have about 70 disks mounted.
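
If you want values like these to survive future AUTOGEN runs, the usual place for them is SYS$SYSTEM:MODPARAMS.DAT (a sketch with our values; substitute your own):

    ! In SYS$SYSTEM:MODPARAMS.DAT:
    RECNXINTERVAL = 180     ! cluster reconnection interval, in seconds
    SHADOW_MBR_TMO = 240    ! shadow member timeout, in seconds

    $ ! Then apply with AUTOGEN:
    $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK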

Keith Parris
Trusted Contributor
Solution

Re: Cluster reconnection interval (RECNXINTERVAL)

Before modifying RECNXINTERVAL, I'd measure the actual duration of interruptions due to failures or faults in your own environment. Although the DWDM equipment may be able to fail over in 2 seconds, bridges or switches involved in the LAN you use as a cluster interconnect may have Spanning Tree reconfigurations which take much longer than that. In my experience, if the original Spanning Tree protocol (IEEE 802.1d) is in use, a reboot of a switch or bridge can cause a pause of 35-40 seconds in forwarding packets. With the new IEEE 802.1w Rapid Spanning Tree protocol, in theory you can reduce that to the level of a few seconds or even sub-second times.

So you would need to measure the actual length of traffic disruption as you trigger various failure events (such as rebooting a switch or disconnecting a link to the DWDM).
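
One convenient way to watch the cluster's view of the interconnect while you trigger those events is SHOW CLUSTER in continuous mode (a sketch; CIRCUITS and CONNECTIONS are standard display classes):

    $ SHOW CLUSTER/CONTINUOUS
    Command > ADD CIRCUITS          ! SCS circuits over each LAN path
    Command > ADD CONNECTIONS       ! the SCS connections riding on them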

In order to be able to accurately measure the duration of communications disruptions, I'd enable LAVC$FAILURE_ANALYSIS if it's not already in place (see my article in the VTJ Volume 2 at http://h71000.www7.hp.com/openvms/journal/v2/articles/lavc.html). That will generate OPCOM messages both when any piece of the LAN configuration you're using as a cluster interconnect fails, and again when it starts working again. With this, you'll be able to get accurate timestamps of how long a disruption appears from the VMS cluster's viewpoint.
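
The enabling steps, roughly as described in that article (a sketch; the template and build procedure ship in SYS$EXAMPLES:, and the edited network description is site-specific):

    $ ! Copy the template and edit it to describe your LAN hardware:
    $ COPY SYS$EXAMPLES:LAVC$FAILURE_ANALYSIS.MAR SYS$MANAGER:
    $ EDIT SYS$MANAGER:LAVC$FAILURE_ANALYSIS.MAR
    $ ! Assemble and link with the supplied build procedure, then run it on each node:
    $ @SYS$EXAMPLES:LAVC$BUILD.COM SYS$MANAGER:LAVC$FAILURE_ANALYSIS.MAR
    $ RUN SYS$MANAGER:LAVC$FAILURE_ANALYSIS.EXE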

Once you know how long the disruptions generated by various real failures last in practice, it's a simple matter to choose a value for RECNXINTERVAL which is larger than the longest of those periods.

The recommendation that SHADOW_MBR_TMO be at least 10 seconds larger than RECNXINTERVAL included the underlying assumption that a VMS node at the remote site is MSCP-serving the disks, and so you don't want to throw a disk out of the shadowset before you would make a decision about whether to throw out the VMS node serving that disk. If you have Fibre Channel links between sites and either don't use MSCP serving or, better yet, have it enabled but only used as a backup path, then your choice of SHADOW_MBR_TMO would be independent of RECNXINTERVAL, and more dependent on the duration of a potential outage on the SAN rather than the LAN.
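
To make the arithmetic concrete: with MSCP serving between sites, that guideline works out to SHADOW_MBR_TMO being at least RECNXINTERVAL + 10, for example (illustrative values only):

    ! In SYS$SYSTEM:MODPARAMS.DAT (illustrative values only):
    RECNXINTERVAL = 20      ! the default
    SHADOW_MBR_TMO = 30     ! at least RECNXINTERVAL + 10 while disks are MSCP-served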

> Even in a CI cluster, the default is 10.

The default value for RECNXINTERVAL is 20 seconds.

The recommendation of 180 seconds for RECNXINTERVAL that one often sees in disaster-tolerant VMS clusters was originally based on the time required to reboot a GIGAswitch/FDDI. (In real life, after firmware updates and the introduction of newer linecards, the actual need grew to 210 seconds, the time required to reboot a 4-port FDDI line card.)

With dual inter-site LAN links (completely independent, not connected together, so that both won't undergo a Spanning Tree reconfiguration at the same time) and dual LAN adapters in each VMS node, it is possible to run a disaster-tolerant cluster at the default RECNXINTERVAL value of 20 seconds. Some even run at 10 seconds.
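
To verify that both LAN paths are actually being used by PEDRIVER, on V7.3 and later the SCACP utility can display them (a sketch; exact output formats vary by version):

    $ MC SCACP
    SCACP> SHOW CHANNEL             ! one channel per usable LAN path to each node
    SCACP> SHOW VC                  ! the virtual circuits built on those channels
    SCACP> EXIT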

There is some additional detail in my user-group presentation entitled "OpenVMS Connection Manager and the Quorum Scheme" at http://www2.openvms.org/ as well as more detail in the older ones entitled "VMS Cluster State Transitions in Action" and "Understanding VAXcluster State Transitions" at http://www.geocities.com/keithparris/
comarow
Trusted Contributor

Re: Cluster reconnection interval (RECNXINTERVAL)

One thing to remember: when using a SAN-based cluster, the systems can lose connectivity with each other while the SAN disk maintains connectivity.

This is all too common.

In a short network bump, this can cause a system that holds the quorum disk to decide "I am the cluster", and ALL the other nodes will CLUEXIT.
Nic Clews
Occasional Advisor

Re: Cluster reconnection interval (RECNXINTERVAL)

In reply to the case where systems have a SAN / quorum disk and a potentially transient network...

Which system will CLUEXIT depends on a number of factors. Of course the voting is central, and after RECNXINTERVAL the majority will reconfigure with whatever member(s) are visible to each other, and may or may not have quorum. If a set of systems has quorum, then sure, when the "lesser" removed members reconnect they will voluntarily leave the cluster. Defined behaviour.

If no one has quorum, then processing halts until it is regained. In the case of a quorum disk, if both halves have access, the continual updating of the quorum disk file will prevent its votes from being counted. If one of the systems stops reattempting configuration (and updating the SAN-based file), then the remaining member will validate that quorum disk file, take its votes, and complete the reconfiguration without the other member. No matter what happens, the other can only rejoin as a new member, not in its "removed" state.

Complex, yes.

This, of course, is the point at which you would halt one of the systems. Just remember that a system with a vote is effectively given equal rights to be a member of, or to be, the cluster as any other member; but other factors, in some cases race conditions, in others ID precedence, possibly access to quorum devices, determine how a reconfiguration is ultimately resolved. The process is multi-layered.

There are rare situations in which non-voting nodes can prevent voting systems from properly configuring, but this is when multiple interconnects with complex permutations of failure are involved.

Overall the advice is good. Yes, it's a trade-off, as most things are, but good practice is to understand what you need from the service your VMS systems provide, draw out the possible interconnect failures on paper along with how you'd expect, or want, your systems to recover in each situation, and then set the parameters, voting, etc. accordingly.

It is also wise to factor in private interconnects that avoid switches, and to remember that if a network card that fails as a cluster interconnect has also failed as the access to the outside world, the decisions about what should survive which type of failure can be complex.
EWONG
Advisor

Re: Cluster reconnection interval (RECNXINTERVAL)

Thanks for all of your valuable information.

In our configuration, it is a 4-member cluster connected to RA8000 disks with remote volume shadowing across two sites. There is no quorum disk. According to our network side, all the network components have resilience, and any component failover would take milliseconds, or at most 2 seconds. Therefore, we just want to check whether RECNXINTERVAL can be trimmed down from 60 to 15 seconds on the cluster members in order to enhance system availability.
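
If we do trim it, the plan would be something like this on each member (a sketch; RECNXINTERVAL is dynamic, so WRITE ACTIVE takes effect immediately, with a MODPARAMS.DAT entry to make it permanent):

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> USE ACTIVE
    SYSGEN> SET RECNXINTERVAL 15
    SYSGEN> WRITE ACTIVE            ! immediate, but lost at reboot
    SYSGEN> EXIT
    $ ! and in SYS$SYSTEM:MODPARAMS.DAT:
    $ !     RECNXINTERVAL = 15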

Many thanks.
Volker Halle
Honored Contributor

Re: Cluster reconnection interval (RECNXINTERVAL)

Edmond,

consider the situations when RECNXINTERVAL is actually used:

Only if one of the systems abruptly halts without sending the "last gasp" message do the other nodes have to wait RECNXINTERVAL seconds before removing that node from the cluster.

If your network recovers or fails over to alternate paths within 2 seconds, the connection manager probably won't even notice.

If a node crashes or shuts down, a 'last gasp' message is sent, causing the node to be removed immediately.

Note that there could be extreme cases in OpenVMS, like deleting an extremely large lock/resource tree at elevated IPL, where the system may not be able to send/receive SCS hello messages for some time. You wouldn't want that to cause CLUEXIT crashes (I have seen this with V7.2-1, 30000 locks and RECNXINTERVAL=20).

Volker.