The Brit
Honored Contributor

Clustering between Data Centers.

Hi folks,
This is just in the way of a request for comments. We are investigating extending our OpenVMS cluster from a single data center to dual data centers. The separation is ~3 miles (~5 km) and our connection is an OC-48 with a private wavelength, so bandwidth will be ~2.5 Gb/s. We intend to use Ciena CN2000 units at each end (as SAN extenders), with 6x compression engaged.
We are currently running OpenVMS 7.3-2 (fully patched up to 15 Dec 2007) with the TCPWare IP stack. The Alphas are ES40s/ES45s, and the storage subsystems are three EVA8000s and one XP10000.
Our initial setup will involve placing a new "warm" Alpha and an EVA8000 in DC2. The Alpha will be a cluster member and will have all of the production shadow sets mounted. The EVA8000 will host one member of each production shadow set.
Initially, our current three-node cluster and our main application will continue to run in DC1. For the moment, the Alpha in DC2 will be used only for disaster recovery, i.e. it will not run any of the production applications and will serve only as a failover node in the event of a disaster (at which time the applications would be started).

We would be interested in hearing from anyone who is doing anything similar, i.e. node separation ~ campus < 5 km < metro. In particular, we are interested in any observed latency/performance issues, or "gotchas".

Appreciate it.

Dave.

7 REPLIES
Hoff
Honored Contributor

Re: Clustering between Data Centers.

At 5 km, you won't see appreciable round-trip latency unless your provider has extremely circuitous OC-48 routing. I'm familiar with a three-lobed cluster on SONET OC-3 over a rather longer distance, and all three lobes ran as if they were local.

What I would look at is how long it would take to replicate however much data you are shadowing over that OC-48 link. The available bandwidth (you won't get the whole of the OC-48 here, given HBVS overhead and other activity, offset by whatever benefit data compression provides) and the quantity of data will give you the time for the full HBVS member copy. (And I'm assuming HBVS here when you point to shadowing, and not controller-level mirroring.)
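As a rough illustration of that back-of-the-envelope estimate (the data volume and effective throughput below are assumptions for the example, not Dave's actual figures), a DCL sketch along these lines gives the order of magnitude:

$! Sketch only: substitute your own shadowed data volume and measured throughput.
$ data_gb = 2000                     ! total shadowed data to copy, in GB (assumed)
$ eff_mbit = 1200                    ! usable Mbit/s after HBVS and other overhead (assumed)
$ seconds = (data_gb * 8 * 1024) / eff_mbit
$ hours = seconds / 3600
$ minutes = (seconds - (hours * 3600)) / 60
$ write sys$output "Estimated full HBVS member copy: ''hours' hours ''minutes' minutes"

With those assumed numbers a full member copy takes a few hours; scale to your own figures, and remember the copy competes with production I/O on the same link.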

You'll want to look at fail-over processing and procedures and communications, and at how you're going to manage IP networking and such in addition to the bridged SCS traffic. Most of what I've seen go wrong here has had to do with problems secondary to the failure of a data center and/or of communications links: process failures, untested procedures, and human errors.

Some of the usual Keith Parris presentation pointers:

http://www2.openvms.org/kparris/hptf2005_LongDistanceVMSclusters.ppt
http://www2.openvms.org/kparris/bootcamp_cluster_internals.ppt

Stephen Hoffman
HoffmanLabs LLC
Robert Gezelter
Honored Contributor
Solution

Re: Clustering between Data Centers.

Dave,

I agree with Hoff, and will amplify some additional points.

First, when working with clients in similar situations, I always recommend full, pre-planned contingency configurations. When the [fur, feathers, scales, or leaves; depending on your genus] start flying, it is not the time to be making edits to command files and parameter files.

I try to pre-configure boot roots on the system disks for contingencies, including alter-egos for production nodes. Think role, not hardware. This way, reconfiguring to deal with a casualty is a matter of selecting an entry in a pre-defined matrix, not altering things on the fly.

If you are using a quorum disk, I recommend pre-configured roots for using an alternate quorum disk on the other site.

The sum total of the above is that you will choose which root to boot from, and it is far easier to specify that over the phone than to do a conversational boot.

Needless to say, parameterizing things in terms of logical names is also a very beneficial activity. My paper in the February 2004 OpenVMS Technical Journal, "Inheritance Based Environments in Stand-alone OpenVMS Systems and OpenVMS Clusters" (see http://www.rlgsc.com/publications/vmstechjournal/inheritance.html ), was inspired by a client situation similar to the one described in this thread.
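As a hedged sketch of the role-not-hardware idea (the logical names and procedure names below are invented for illustration, not taken from Bob's paper), each pre-configured boot root can define a couple of site/role logicals that everything else keys off:

$! Sketch only: logical names and file names here are invented examples.
$! In SYS$MANAGER:SYLOGICALS.COM, each contingency root sets its own values:
$ define /system /exec SITE$LOCATION "DC1"      ! DC2 in the roots at the other site
$ define /system /exec SITE$ROLE "PRODUCTION"   ! DR_STANDBY in the warm standby roots
$!
$! Startup and failover procedures then branch on the role, never on a node name:
$ if f$trnlnm("SITE$ROLE") .eqs. "PRODUCTION" then @SYS$STARTUP:START_APPS.COM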

I would also recommend serious consideration of a triplet of DS-class systems split between the two sites for testing and experimentation.

I hope that the above is helpful.

- Bob Gezelter, http://www.rlgsc.com
Wim Van den Wyngaert
Honored Contributor

Re: Clustering between Data Centers.

Also check SET DEVICE DSAx /SITE= (to be set differently on each cluster node).

This is to avoid reads being serviced from the other building.

We do it at mount time. And we have a 5 km FDDI cluster (two SANs), but performance is not a problem. There are, however, many application issues (testing node names in scripts, connection opening being slow, double paths to external partners, ...).
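For completeness, a hedged sketch of that approach (device names, site numbers and the volume label are made-up examples; it assumes a shadowing kit that provides SET DEVICE /SITE):

$! Sketch only: device names, site values and label are examples.
$! Run on each node with that node's own site number, at/around mount time.
$ set device $1$DGA101: /site=1      ! shadow member physically in DC1
$ set device $1$DGA201: /site=2      ! shadow member physically in DC2
$ mount /system DSA1: /shadow=($1$DGA101:,$1$DGA201:) PRODDATA
$ set device DSA1: /site=1           ! 1 on DC1 nodes, 2 on DC2 nodes, to prefer local reads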

Wim
Jan van den Ende
Honored Contributor

Re: Clustering between Data Centers.

Dave,

I am with Hoff on this.

Compare 5 km distance with "at one site".
An I/O might normally involve some four extra inter-site trips (for locking purposes).
So, roughly 20 km of extra path. Taking the refractive index of glass as approximately 1.5, the speed of light in fibre becomes about 200,000 km/s.
Added latency: about 0.1 milliseconds.
In other words, negligible compared to true local I/O.
(Wim: that is why the SITE setting becomes relevant only at greater distances, or on very low-throughput I/O connections.)
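The same arithmetic as a trivial DCL sketch (integer microseconds, numbers straight from the post):

$! Back-of-the-envelope: extra fibre path divided by the speed of light in glass.
$ extra_km = 20                      ! roughly four extra 5 km inter-site trips per I/O
$ speed_km_s = 200000                ! c divided by a refractive index of about 1.5
$ latency_us = (extra_km * 1000000) / speed_km_s
$ write sys$output "Added latency: ''latency_us' microseconds"   ! 100 us = 0.1 ms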

So, except for the obvious concern about intersite connections, this might be considered as "nearly" one site....

About that concern: (as per the teachings of Tom Speake), try to convince your management that the intersite link effectively _IS_ an extended _SYSTEM BUS_.
Therefore, it should be under _YOUR_ control, or at least YOU should have a heavy voice in configuring and managing it.
(The way of looking at availability-related issues tends to be rather more strict for VMS managers and more lenient for, e.g., Windows managers or network managers.)

>>>
Initially, the new site will be a failover site
<<<

Depending on your application, that might be prudent, or it might just complicate things.

_IF_ your app(s) is/are cluster aware or cluster transparent (as are RMS, Rdb, DBMS applications) THEN it is easiest (and safest) to just start the app on all nodes.
If the app relies on a Unix-style database engine, i.e. ONE database engine per cluster which interfaces with all front-end processes and funnels ALL I/O to the database, then you already have some failover scheme, and you can just extend it to include more nodes.

In my (VMS colored?) view, a failover configuration is just a poor-man's (poor-OS's, poor-DB's) substitute for full (VMS-style) clustering... :-)

So, probably your biggest concern here is managing end-user connectivity!
Both in the case of a failover and in the case of a balanced, distributed workload.

And, whatever solution you will implement:
_THINK_ 3 times before you act.
Work out every thinkable (since "every possible" is unattainable) non-perfect, up to disastrous, mode of operation, and work out, beforehand, a recovery scenario.

---and DO have regular error-situation drills!!

Success.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: Clustering between Data Centers.

Jan,

We have a 100 Mbit inter-building link. This limits the throughput to about 10 Mbytes/sec. Thus avoiding inter-building reads is a plus for throughput (and thus for speed when approaching the maximum throughput).

But indeed, the speed itself for both operations is about the same (I just did a test and got 1% better wall time for local reads; I had expected a higher percentage).

Wim
Colin Butcher
Esteemed Contributor

Re: Clustering between Data Centers.

Hello Dave,

Lots of good stuff already, including in other threads (e.g. Geni's question on Disaster Tolerance).

I'll add one thing to consider carefully:

Compression - beware of the effect of compressing already compressed data (ZIP files, PCSI$COMPRESSED files, etc.) as often the compression algorithms will actually increase the amount of data to be transferred.

Also note that as of V8.3, PEDRIVER is capable of data compression, enabled via SCACP (see the V8.3 release notes). It might be worth your while upgrading, also for some of the performance improvements in V8.3 on multi-processor machines and with Fast Path I/O devices.
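If memory serves (treat the exact commands as an assumption and verify with SCACP HELP and the V8.3 release notes before relying on them), enabling PEdriver compression to a given remote node (NODEB is a placeholder name) is along these lines:

$! Assumption from memory: verify the exact qualifiers with SCACP HELP on V8.3.
$ mcr scacp
SCACP> show vc
SCACP> set vc NODEB /compression
SCACP> exit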

Think about minimising the inter-site link traffic (local booting, for example) and how you'll handle the detection and automation of failure recovery under different scenarios. You probably don't want to automate any of the decision making unless you can guarantee that you can think of, and test, every possible scenario.

Go with your own fibre if you can - layer 2 low (and consistent) latency is what you need. You probably don't want to be vulnerable to someone else's network routing and the consequences of their routes changing. You also want to make sure that you have genuinely dual paths between the sites, and that your suppliers don't buy bandwidth from each other so that it all ends up on the same physical path for most of the way!

Cheers, Colin (http://www.xdelta.co.uk).
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
The Brit
Honored Contributor

Re: Clustering between Data Centers.

Hi Guys,
Thanks very much for your comments; they were pretty much in line with what we were already thinking. It is always useful to get independent opinions: they often trigger a forgotten memory, or raise a flag that needs investigating.
As a further reassurance, we talked the configuration over with the "King" of clustering (no need for any other identification), and everything seems to be pretty positive.

Again thanks for allowing me access to your thought processes.

Dave.