Operating System - OpenVMS

Cluster over different networks

Occasional Collector




Currently have 5 VAX systems clustered on one network segment. The main VAX V1 is a disk server and the other 4 are satellites from this node (V2 to V5). V1, V2 and V3 have two network interfaces, one connected to S1 and the other to S2.

What I would like to do is have V4 connected to S2 (rather than S1 as currently). However, when I try this the VAX boots to the stage of 'Waiting to create or join vax cluster', then nothing more.

Is this configuration possible and do I need to do something to enable cluster traffic on S2?

Richard Brodie_1
Honored Contributor

Re: Cluster over different networks

It would help if you specified the VMS version; the rules (and tools) change quite a bit.


The thing that concerns me is the box that you are not talking about. Where is V5 in all this?


After 'Waiting to create or join vax cluster', you might get some more information after a delay of two minutes or so (just in case you didn't wait around to see).

Steven Schweda
Honored Contributor

Re: Cluster over different networks

> [...] one connected to S1 and the other to S2.

   My psychic powers are too weak to tell me what "S1" and "S2" are, or
what they have to do with each other.  Perhaps you could provide some
useful information.

   Hint: If these are different network segments connected by an IP
router, then the non-IP cluster traffic may have some difficulty getting
between "S1" and "S2".

Honored Contributor

Re: Cluster over different networks

A cluster requires total connectivity among its members.  


Each member node must have a direct communications path with every other node in the cluster.  


If a node cannot directly communicate with every other node, then it will not be permitted into the cluster.
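As a rough illustration of that rule applied to the layout described in this thread, here is a minimal Python sketch that lists node pairs sharing no network segment. The segment assignments are taken from the original post, with one assumption: V5 is taken to be attached to S1 only, since only V1, V2 and V3 were described as dual-homed. The function name `unreachable_pairs` is made up for this example.

```python
from itertools import combinations


def unreachable_pairs(attachments):
    """Return the node pairs that have no network segment in common."""
    return [
        (a, b)
        for a, b in combinations(sorted(attachments), 2)
        if not attachments[a] & attachments[b]
    ]


# Layout as described in the thread (V5 on S1 only is an assumption).
current = {
    "V1": {"S1", "S2"},
    "V2": {"S1", "S2"},
    "V3": {"S1", "S2"},
    "V4": {"S1"},
    "V5": {"S1"},
}

# Proposed change: move V4 from S1 to S2.
proposed = dict(current, V4={"S2"})

print(unreachable_pairs(current))
print(unreachable_pairs(proposed))
```

Under those assumptions the current layout has no isolated pairs, while the proposed one leaves V4 and V5 without a shared segment, which would break the total-connectivity requirement.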


Meeting this cluster connectivity requirement is trivial with broadcast networks such as Ethernet; otherwise it involves an arithmetic progression in the wiring closet, as every added node needs a link to each existing node.
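To put numbers on the wiring-closet remark: with point-to-point links only, each new node needs one link to every node already present, so the total is 1 + 2 + ... + (n - 1) = n(n - 1)/2. A small Python sketch of that arithmetic follows (the function name is hypothetical, purely for illustration):

```python
def point_to_point_links(nodes):
    """Links needed so that every node has a direct path to every other node."""
    return nodes * (nodes - 1) // 2


for n in range(2, 7):
    print(n, "nodes need", point_to_point_links(n), "links")
```

For the five-node cluster in this thread that comes to ten dedicated links, against a single shared segment on a broadcast network.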


And yes, SCS cluster traffic must be enabled on each path where cluster communications occur.


Depending on the particular VAX model or capabilities of the particular VAX emulator, there can be faster Ethernet network controllers available, too.  (Here guessing that the reason you're looking at this shuffling involves bandwidth or contention issues, and VAX boxes often tend to have ten megabit, though some can have or be upgraded to hundred megabit connections.  VAX boxes are limited in this area.)

Occasional Collector

Re: Cluster over different networks

Thanks for the replies.


The version of VMS is 5.5-2H4. I had suspected that all nodes need to communicate with each other. Just wondered if the router nodes would route the cluster traffic.


S1 and S2 are separate network segments. No connectivity other than via the VAXes exists.


The main reason for trying to move one VAX to the second segment is to help with fault finding. For some reason, from time to time I get a 'response timeout' error in the NCP counters on this VAX to the main VAX. It happens about 15 times a day. I am trying to eliminate other network traffic (although this is very small; the switch reports about 5 - 10% usage at 10M/Half). I have swapped as much of the hardware as possible other than the physical VAXes (cables, switch, etc.).

This VAX runs an application that sends data every second to the main VAX (task to task, no disk I/O). When data is lost I see the timeout value clock up. Everything else continues to work okay: the cluster remains up and remote logins stay connected. The data stream continues without intervention, just missing a second or two of data.


Honored Contributor

Re: Cluster over different networks

SCS is not IP, and there are no SCS routers available. 


SCS can be bridged, and can be bridged over IP; that's been possible since bridging was first possible with Ethernet.  Since before clustering existed, in other words.


In V8.4 and later, SCS can also operate over IP.  Clustering over IP requires all hosts at V8.4 (if there's any cluster member or cluster lobe that's only accessible via IP), which means that this is not an option available for clusters with VAX systems, nor for any versions of OpenVMS Alpha or OpenVMS I64 prior to V8.4.


The requirement for total connectivity applies to all OpenVMS cluster configurations and all versions.




As for the communications problem lurking here, and not the clustering issues...


Please post the specific application timeout error text for the DECnet NCP diagnostic you are receiving.


Please zero and then check the NCP counters for errors being logged, as well.


A network load average usually won't spot spiky network traffic, either.  This can sometimes be spotted with switch mirroring, though your 10MbE network load is probably low enough that the receiver won't see a huge load of traffic during the sniffing activity.


In addition to the network load, what's the load on the two systems involved?  That's a common trigger for bugs and errors.  


Are any hardware errors logged?  Check the OpenVMS error log for details.


While failing hardware is certainly a potential culprit here, the usual trigger for these sorts of errors involves latent programming bugs.  Errant $qio and $qiow calls are very common in code of this vintage.  Missing event flags, missing IOSBs, incorrectly allocated IOSBs, etc.  (I've seen these sorts of timing and coding bugs lurking for a decade or more, too, until something perturbs the code and exposes the latent bug.  Code that appears to work is not necessarily correct code.)  Here's a list of common coding errors.


And FWIW, it helps if you lead with the actual problem and related background, and not with a discussion of a proposed workaround or an error with the proposed workaround.  You might well receive the right answer to your proposed workaround but, if it's not the right solution for the problem, with little or no forward progress on the problem.


Bob Blunt
Respected Contributor

Re: Cluster over different networks

Given your VMS version I'd say that your cluster will be better served if all nodes are on the same segment and, if possible, on the same switch or (depending on how your switch is designed) on the same "line card" or "blade."  As Hoff said, you can bridge, but all the nodes have to be able to talk directly with each other on the same network.  I've never had a problem as long as I followed those rules, but I have been warned before that cascading switches or connecting to different line cards/blades in the same chassis can introduce enough timing issues to make problems more prevalent.