VMS Cluster and IP Subnetting

Hi,

We have two sites, each on a different IP subnet.

We are now going to implement an OpenVMS cluster between the two sites.

The first site will be in one subnet and the second site in another. Will this cause problems for the VMS cluster, or additional
routing overhead? We do have an IP alias. Are there any general rules or best practices for configuring TCP/IP in a cluster
environment?

With regards
Andrew
5 REPLIES
Bojan Nemec
Honored Contributor

Re: VMS Cluster and IP Subnetting

Hi,

The cluster protocol is a totally independent protocol. But when running over Ethernet, it must run within a local area network, so no routers are allowed between two cluster nodes!

Every cluster node (or better, every network card) has its own IP address, so the nodes can be in different subnets. The cluster IP alias is a single IP address, so it can be in only one subnet.
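
The point about the alias living in only one subnet can be checked with a few lines of Python. This is a minimal sketch, not anything TCP/IP Services itself runs: the function name and all of the addresses are hypothetical, chosen only to show one site inside the alias subnet and one outside it.

```python
import ipaddress

def alias_covers_all_nodes(alias_subnet, node_addresses):
    """Return True when every node address lies inside the alias subnet."""
    network = ipaddress.ip_network(alias_subnet)
    return all(ipaddress.ip_address(addr) in network for addr in node_addresses)

# Hypothetical addressing: site A in 10.1.1.0/24, site B in 10.1.2.0/24.
site_a = ["10.1.1.11", "10.1.1.12"]
site_b = ["10.1.2.11", "10.1.2.12"]

print(alias_covers_all_nodes("10.1.1.0/24", site_a))           # True
print(alias_covers_all_nodes("10.1.1.0/24", site_a + site_b))  # False
```

With the two sites in different subnets, an alias taken from site A's subnet cannot cover the nodes at site B.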

Bojan
Wim Van den Wyngaert
Honored Contributor

Re: VMS Cluster and IP Subnetting

Andrew,

If the two sites share the routing database, you need to define both default gateways.
Each node will select its own out of the two when starting up.

Wim
Keith Parris
Trusted Contributor

Re: VMS Cluster and IP Subnetting

As noted, for the cluster members to communicate, you'll need to have a LAN (or emulated LAN) connection between the sites. (The SCS protocol uses multicast Hello messages for the nodes to discover each other and also to track the status of communications paths on an ongoing basis.) This connection can be achieved by either enabling bridging between sites, or by setting up a VLAN on the routers that emulates a LAN.

Since you'll have to provide a bridged (or VLAN) connection between sites for the cluster communications anyway, it should be fairly easy to put all cluster nodes in the same IP subnet.

This is quite non-intuitive to the networking folks, but they need to understand that the cluster is really a single entity, even though it may be spread across multiple sites.

As noted by another poster, the ordinary IP alias mechanism uses a fixed IP address, so you'd need all nodes to be in the same subnet for that to cover nodes at both sites.

The DNS alias based on Load Broker / Metric Server works by responding to a DNS request with a list of IP addresses (with the least-busy node in the cluster listed first), so I would think it might work for the IP addresses returned to be in different subnets.

But if you use failSAFE IP, where an IP address can fail over between nodes, you would need all the nodes to be in the same subnet if you wanted to use failSAFE IP to fail over between nodes at different sites.
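
As a rough illustration of the "least-busy node listed first" behaviour, here is a minimal Python sketch. It is not how Load Broker or Metric Server actually computes its ratings: the function name, the load figures, and the addresses are all hypothetical, and "lower number means less busy" is an assumption made for the example.

```python
def order_by_load(node_loads):
    """Return node addresses sorted from least busy to most busy.

    node_loads maps an IP address to a load figure (lower means less busy).
    """
    ranked = sorted(node_loads.items(), key=lambda item: item[1])
    return [addr for addr, _load in ranked]

# Hypothetical load figures; the addresses deliberately span two subnets.
loads = {"10.1.1.11": 0.72, "10.1.2.11": 0.15, "10.1.1.12": 0.40}

print(order_by_load(loads))
# ['10.1.2.11', '10.1.1.12', '10.1.1.11']
```

Nothing in the ordering step cares which subnet an address belongs to, which is why this kind of alias looks workable across subnets.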

For more info, see the article entitled "Configuring TCP/IP for High Availability" at http://h71000.www7.hp.com/openvms/journal/v2/index.html
Anton van Ruitenbeek
Trusted Contributor

Re: VMS Cluster and IP Subnetting

Andrew,

We have a multisite cluster using multiple subnets.

First, you need multiple TCPIP$ROUTE files, one for each node, because the default gateway must differ per site. This is done with a SYSTEM logical TCPIP$ROUTE pointing at the local file, not at the same file clusterwide! (e.g. SYS$SYSROOT:[SYSEXE]TCPIP$ROUTE.DAT). If you can arrange this logical per site, it can be done as SYS$SYSTEM:_TCPIP$ROUTE.DAT etc.
Secondly (as Keith mentioned), you need at least one connection between the sites that is fast enough, and it must be a LAN, so no routing. If you are using brouters (bridge/routers), you need to set them up with bridging enabled. SCS and MSCP need to pass across the whole network, and these protocols aren't routable!
Third, as I pointed out under the second point, the latency must not exceed several milliseconds. I don't know the exact number, but HP can provide it for you. On dark fiber the maximum distance is about 500 miles (1000 miles round trip). RECNXINTERVAL (SYSGEN) must be calculated to suit. Note: THIS IS NOT THE LATENCY VALUE, but a value, some multiple of it, that must be increased to allow for the latency.
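
To put the 500-mile figure next to "several milliseconds", here is a back-of-the-envelope Python sketch. It assumes light travels through fiber at roughly 200,000 km/s (about two thirds of its speed in vacuum) and counts propagation delay only, ignoring switches, bridges, and protocol overhead. The function name is hypothetical, and this is not HP's supported limit.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_SECOND = 200_000.0  # assumption: roughly 2/3 of c in vacuum

def one_way_latency_ms(distance_miles):
    """Propagation delay over fiber in milliseconds (equipment delay ignored)."""
    return distance_miles * KM_PER_MILE / FIBER_KM_PER_SECOND * 1000.0

one_way = one_way_latency_ms(500)
print(f"{one_way:.2f} ms one way")         # 4.02 ms one way
print(f"{2 * one_way:.2f} ms round trip")  # 8.05 ms round trip
```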

AvR
NL: Meten is weten, maar je moet weten hoe te meten! - UK: Measurement is knowledge, but you need to know how to measure!
Keith Parris
Trusted Contributor

Re: VMS Cluster and IP Subnetting

A VMS cluster configuration needs to meet the Rule of Total Connectivity, which says that each and every VMS node in the cluster must be able to talk to each and every other VMS node directly, without having to go through another node. (Bridged LAN connections are OK here, as they appear to be "direct".)
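
A small Python sketch of what the rule demands: every pair of nodes must have a direct working path. The node names and link lists are hypothetical, and this only tests the condition. It does not model how the cluster chooses which subset of nodes survives.

```python
from itertools import combinations

def meets_total_connectivity(nodes, links):
    """True when every pair of nodes has a direct working path."""
    working = {frozenset(link) for link in links}
    return all(frozenset(pair) in working for pair in combinations(nodes, 2))

# Hypothetical three-node cluster: A1 and A2 at one site, B1 at the other.
nodes = ["A1", "A2", "B1"]
all_paths_up = [("A1", "A2"), ("A1", "B1"), ("A2", "B1")]
one_path_down = [("A1", "A2"), ("A1", "B1")]  # A2 reaches B1 only via A1

print(meets_total_connectivity(nodes, all_paths_up))   # True
print(meets_total_connectivity(nodes, one_path_down))  # False
```

In the second case A2 could still reach B1 through A1, but that is exactly the indirect path the rule does not accept.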

If the Rule of Total Connectivity is broken by a communications failure, unless the failure is repaired fairly quickly, a cluster state transition will need to occur to pick a subset of nodes to continue which will again meet the Rule of Total Connectivity.

RECNXINTERVAL controls how long (in units of seconds) the nodes will wait after detecting such a failure, in hopes the network will start working again, before the cluster initiates a state transition.

In many clusters which use a Local Area Network as a cluster interconnect, a lower bound on the practical value for RECNXINTERVAL is how long it takes for the bridges' Spanning Tree protocol to reach resolution so that the bridges can again begin forwarding packets. In my experience, this tends to be somewhere around 35-40 seconds with default Spanning Tree parameters in the bridges (and the default value of RECNXINTERVAL, which is 20 seconds, is thus too low for this case.)
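
The mismatch can be sketched with the classic 802.1D timers. The timer defaults used here (max age 20 seconds, forward delay 15 seconds) come from the 802.1D standard rather than from this thread, the function names are hypothetical, and real bridges may be tuned differently, so treat the numbers as a rough bound rather than a prediction.

```python
def stp_outage_bounds(max_age=20, forward_delay=15):
    """Rough (best, worst) seconds before a bridge port forwards again.

    A port spends forward_delay seconds in listening and again in learning;
    an indirect failure may first have to wait for max_age to expire.
    """
    listening_and_learning = 2 * forward_delay
    return listening_and_learning, max_age + listening_and_learning

def recnxinterval_rides_through(recnxinterval, outage_seconds):
    """True when the cluster would wait longer than the outage lasts."""
    return recnxinterval > outage_seconds

print(stp_outage_bounds())                  # (30, 50)
print(recnxinterval_rides_through(20, 40))  # False: the default is too low
print(recnxinterval_rides_through(60, 40))  # True
```

The observed 35-40 seconds falls inside that 30 to 50 second window, and the 60 in the last line is only an example value, not a recommendation.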

Folks sometimes get around this lower limit by setting up multiple independent (not bridged together) LANs, arranged so that a Spanning Tree reconfiguration is unlikely to occur on both LANs at once. Another way around it is to lower the Spanning Tree timers in the bridges. Or one could use bridges which implement the newer IEEE 802.1w (Rapid Reconfiguration) algorithm for the Spanning Tree, which supplements the older 802.1d standard.

(I'm sorry, but I don't see how RECNXINTERVAL relates in any way to the inter-site latency. By the way, a handy way to measure the inter-site latency is the LOCKTIME tool from the [KP_LOCKTOOLS] directory within the V6 Freeware.)