Operating System - OpenVMS

Using EoMPLS for multi-site cluster interconnect?

Dana Conroy

Greetings all,

We are working on the design of a new multi-site cluster which will replace our existing single-site cluster.

From our network folks, we requested a dedicated, redundant GbE inter-site link. However, they are now asking us to consider use of Ethernet over MPLS (EoMPLS) for SCS and DECnet communication rather than the dedicated links.

I've reviewed earlier multi-site clustering discussions here in the ITRC forum and in Usenet, as well as the HP cluster manuals, the cluster SPD and Keith Parris' presentations - all good stuff.

Based on what I've read thus far, I would much rather proceed with the dedicated link, but I'm interested in any feedback (pro and con) regarding EoMPLS.

Thank you,
Colin Butcher
Esteemed Contributor

Re: Using EoMPLS for multi-site cluster interconnect?

You might want to think about also separating DECnet traffic and SCS traffic. SCS as a layer 2 LAN protocol requires low latency, low jitter (low variation in latency) and high bandwidth (depending on overall workload). DECnet is designed as a WAN protocol, so will better cope with variations in network behaviour. Split them over different adapters if you can. Two for SCS and two for DECnet (Phase V will do load balancing over all available paths for the price of an end-node licence). Ensure that the paths are physically separate. You could also use DECnet over IP with TCP/IP as the pseudo-transport layer.

The "best solution" for SCS, DECnet, TCP/IP (I assume you're also using that) and Fibre Channel (I also assume that you're using FC-based storage) depends on the scale and workload of your proposed cluster, but in general you want to get the lowest possible end-to-end latency and a minimum guaranteed bandwidth between the sites. I'm all for minimising complexity, so a direct fibre path with nothing else adding latency is usually a good starting point. Using DWDM to multiplex GigE Ethernet links over two physically separate fibres between sites can be a useful technique.

EoMPLS implementations vary depending on the vendor's equipment and the available physical paths - what are the guaranteed absolute maximum end to end latency and the minimum end to end guaranteed bandwidth? Are there any single points of failure in the design?

Remember to design for worst-case, not steady-state running. What happens when a node or a site fails and the network traffic peaks dramatically? What happens when a network path fails during a period of high inter-site traffic? I've yet to meet a really interesting systems problem that isn't a mix of peak load and latency issues.
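
As a rough illustration of judging a link by its worst case rather than its average, here is a minimal Python sketch. The sample values and the budget figures are made up for the example (they are not measurements or HP-recommended limits), and the function name is hypothetical. Jitter is taken here as the peak-to-peak spread of the samples.

```python
from statistics import mean


def summarize_latency(samples_ms, max_latency_ms, max_jitter_ms):
    """Summarize round-trip samples and compare the worst case to a budget.

    samples_ms     -- round-trip times in milliseconds (illustrative input)
    max_latency_ms -- assumed worst-case latency budget
    max_jitter_ms  -- assumed budget for peak-to-peak latency variation
    """
    worst = max(samples_ms)
    best = min(samples_ms)
    jitter = worst - best  # peak-to-peak variation, one simple jitter measure
    return {
        "mean_ms": mean(samples_ms),
        "worst_ms": worst,
        "jitter_ms": jitter,
        # Judge the link on its worst sample, not on the mean.
        "within_budget": worst <= max_latency_ms and jitter <= max_jitter_ms,
    }


# Invented samples: a quiet period, and a period with one spike such as
# might be seen while traffic peaks after a path failure.
steady = [1.0, 1.5, 1.0, 1.5]
with_spike = [1.0, 1.5, 9.0, 1.5]

print(summarize_latency(steady, max_latency_ms=5.0, max_jitter_ms=2.0))
print(summarize_latency(with_spike, max_latency_ms=5.0, max_jitter_ms=2.0))
```

The mean of the second sample set still looks modest, which is exactly why the check is made against the worst sample and the spread.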

If you attend the bootcamp (see http://www.hp.com/go/openvms/bootcamp) then you'll be able to discuss this with a lot of good people. Maybe see you there.

You might also want to talk with HP's DTCS group who specialise in building this kind of thing for customers.

Hope this helps.

Cheers, Colin.
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
Robert Gezelter
Honored Contributor

Re: Using EoMPLS for multi-site cluster interconnect?


In addition to what Colin mentioned, there is a need to both monitor the redundant connections, and ensure that the "physically diverse routings" remain diverse over time.

I have seen many cases of redundant networks (and systems) fail, in a way, as a direct result of their redundancy.

The redundancy led to operational complacency, which led to failure. An unmonitored redundant system or network becomes non-redundant when the number of failures is one short of it becoming a system with a single point of failure.

The most common case of the preceding is the "dual redundant network" with one cable unplugged. It keeps on running, but it is no longer redundant. I also encountered a client whose system was running with a failed system disk -- luckily one member of a two-member shadow set. The local operator (there was not really a system manager) thought everything was fine because the system was running. There was just this noise every time he restarted the system. (Yes, that was the "Disk Failed" alarm from the RAID system restarting.) Luckily, the remaining disk did not fail before I was able to obtain a replacement drive.

The moral of this somewhat long-winded tale is simple: Design for redundancy, but ALWAYS remember that ongoing monitoring by EVERYONE who manages part of the path is critical.
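
To make the point concrete, here is a minimal Python sketch of the kind of check that catches the "one cable unplugged" case. It assumes some other mechanism has already determined whether each path is up; the path names and the function are hypothetical, and this is not any particular monitoring product's API.

```python
def redundancy_status(path_up, min_redundant=2):
    """Classify a set of inter-site paths by how much redundancy remains.

    path_up       -- dict mapping a path name to True (up) or False (down)
    min_redundant -- number of working paths needed to count as redundant
    """
    up = sorted(name for name, ok in path_up.items() if ok)
    down = sorted(name for name, ok in path_up.items() if not ok)
    if not up:
        state = "DOWN"
    elif len(up) < min_redundant:
        # Still running, but the next failure is an outage: raise an alarm.
        state = "DEGRADED"
    else:
        state = "REDUNDANT"
    return state, up, down


# Hypothetical example: one of two paths has quietly failed.
state, up, down = redundancy_status({"path_a": True, "path_b": False})
print(state, "up:", up, "down:", down)
```

The important design choice is that "DEGRADED" is reported as loudly as an outage would be, because from the users' point of view nothing has visibly changed.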

- Bob Gezelter, http://www.rlgsc.com
Honored Contributor

Re: Using EoMPLS for multi-site cluster interconnect?

Hi Dana,
I'm speaking without direct experience of clusters over MPLS; I use MPLS technology for other kinds of applications.

MPLS may be interesting uncharted territory for clustering.
As far as I know, the most important element in the chain becomes the MPLS provider. Here in Italy some MPLS providers can offer a service with bounded latency, and I guess this is also possible in the USA. Don't forget that with MPLS technology you can reserve a guaranteed minimum bandwidth, and you can assign a higher priority to the cluster link traffic than to ordinary DECnet traffic.
However, no provider can promise uninterrupted 24x7 service. You must design the system for an interruptible connection.

Antonio Maria Vigliotti
Dana Conroy

Re: Using EoMPLS for multi-site cluster interconnect?

Thanks to all for the replies. We will begin testing the EoMPLS configuration soon. If our testing efforts yield undesirable results, we will implement a dedicated link.
Wim Van den Wyngaert
Honored Contributor

Re: Using EoMPLS for multi-site cluster interconnect?

For those who wonder what MPLS is :

I did.