Operating System - OpenVMS
Split Cluster Design

 
Peter Quodling
Trusted Contributor


We have a fairly critical 24x7 system that has been rather unhealthy, and which I have been nursing back to better health. At the same time, sales have committed uptime and DR capabilities to clients (banks), so now there is a push on to implement DR.

Phase I of the DR plan is to back up to an STK L700 with LTO2 drives at the remote site, and then restore to the DR machine. Apart from the fact that the "new backups" using Legato NetWorker a) don't create bootable images, and b) take around twice as long as local backups to our TZ88s, this is halfway workable.

Phase II is to attach the primary system to a SAN (HDS) and the remote DR system to a SAN (HDS), and use some HDS smoke and mirrors (TrueCopy or some such) to keep the two in sync. Having had some previous experience with EMC SRDF systems that claimed to keep things synced, but in fact were battling I/O backlogs, I am nervous/cautious about such things.
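To make the backlog worry concrete, here is a minimal sketch of a toy model, not a description of how SRDF or TrueCopy actually behave. It assumes writes arrive at a steady rate and an asynchronous inter-site link drains them at a fixed rate; the function name and the figures are hypothetical.

```python
def replication_backlog(write_mb_per_s, link_mb_per_s, seconds):
    """Toy model: MB still waiting to be replicated at the end of each second."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog += write_mb_per_s               # new writes queued this second
        backlog -= min(backlog, link_mb_per_s)  # the link drains what it can
        history.append(backlog)
    return history


# Hypothetical figures: 30 MB/s of writes against a 20 MB/s inter-site link.
print(replication_backlog(30, 20, 5))
# Hypothetical figures: 10 MB/s of writes against the same link.
print(replication_backlog(10, 20, 5))
```

In this model the backlog grows without bound whenever the sustained write rate exceeds the link rate, and everything in the backlog is data the remote copy does not yet have if the primary site is lost.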

Well, imagine my surprise reading the HDS VMS documentation last night, when I noticed that they recommend, "for high availability requirements", VMS Cluster software and VMS path failover and management. I am digging further into the HDS documentation, but this raises the question:

If I have two remote Alphas, each connected to its own (non-HP) SAN, do I

1. Establish them as a cluster, build the local disk subsystems into the local HDS SAN, and then mount shadow sets in the remote SAN, or

2. Leave it to the HDS weenies to work out how to keep my data replicated?

I know that if I was talking local HSG to remote HSG there would be a difference relative to just building a "large fabric", covering both sites.

My personal preference is clustering: it takes us from a failover situation to a fault-tolerant situation, keeps real management in one place, load balances, and gives a performance boost (adding the DS25 DR machine to our production 4100 (2) 5/533).

Interested in thoughts of my erstwhile colleagues (that would be y'all) on this...

q
Leave the Money on the Fridge.
7 REPLIES
John Gillings
Honored Contributor

Re: Split Cluster Design

q,

Obvious. Use clusters. Forget transporting tapes; use shadowing to replicate the data. Have one shadow set member at each end of the wire. Use Gigabit Ethernet as the cluster interconnect and shadow over the Fibre Channel.

This gives you LOTS of interesting capabilities. For example, you could, in future, build a third site, shadow your data across, then shut down one of the old sites, thereby physically moving your data centre without downtime.

Make sure you're running at least V7.3-2 and that all your volumes have been initialised with INITIALIZE/LIMIT or modified with SET VOLUME/LIMIT. That enables dynamic volume expansion, which allows you to upgrade your disks without downtime. Also make sure you have mini copy and host-based mini merge on all shadow sets.
A crucible of informative mistakes
Jan van den Ende
Honored Contributor

Re: Split Cluster Design

Peter,

the single multi-site cluster solution is BY FAR superior.
I understand the plan is to have a multi-site SAN anyway? That would imply fibre connection, and then it must be relatively easy to reserve some fibers for the "SYSTEM BUS". (And have those under VMS system management control, functionally they are NOT to be considered network components, even if they are hardware-wise!)

I am not really fond of the idea of only one node at a site, although I consider three 1-node sites superior to two multi-node sites.
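The reasoning behind preferring three sites comes down to quorum arithmetic. A minimal sketch follows, assuming the standard OpenVMS rule that quorum is (EXPECTED_VOTES + 2) / 2 with integer division; the helper names, site names, and vote assignments are hypothetical, and a quorum disk is left out of the picture.

```python
def quorum(expected_votes):
    # OpenVMS rule: (EXPECTED_VOTES + 2) / 2, truncated to an integer
    return (expected_votes + 2) // 2


def survives_site_loss(votes_per_site, lost_site):
    """True if the remaining sites still hold quorum without manual intervention."""
    total = sum(votes_per_site.values())
    remaining = total - votes_per_site[lost_site]
    return remaining >= quorum(total)


# Hypothetical vote assignments, one entry per site.
two_sites = {"A": 2, "B": 2}            # two multi-node sites, 2 votes each
three_sites = {"A": 1, "B": 1, "C": 1}  # three 1-node sites, 1 vote each

print(survives_site_loss(two_sites, "A"))    # False: 2 votes left, quorum is 3
print(survives_site_loss(three_sites, "A"))  # True: 2 votes left, quorum is 2
```

With two evenly weighted sites, losing either one leaves the survivor below quorum until someone adjusts it by hand. With three sites, any single site can drop out and the other two carry on.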

If you need a reference to an existing example, feel free to use
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=855602
and the various threads it points to.
If it helps you convince them to make the (in my view) "right" choice, then I will be proud of my part therein.

Perhaps the following will also be useful:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=724351
as a very short 'overview' of the basic idea.

... and there are the very VMS-favorable TCO white papers as well. I seem to have no pointers right here, but I will get them and add them (or some of the others will beat me to that, I am afraid :-)

hth.

I wish you (and even more: your management) much wisdom in your decisions.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: Split Cluster Design

Ian Miller.
Honored Contributor

Re: Split Cluster Design

http://h71000.www7.hp.com/openvms/whitepapers/index.html

See OpenVMS high availability/disaster tolerance

and OpenVMS total cost of ownership

____________________
Purely Personal Opinion
Andy Bustamante
Honored Contributor

Re: Split Cluster Design

OpenVMS shadowing with clustering is the solution of choice for your requirements. Allowing VMS to manage shadowing takes you from disaster recovery to disaster tolerance. With a properly configured cluster you could drop a site and still have your users serviced.

For some useful information see: http://www2.openvms.org/kparris/ and http://www.geocities.com/keithparris/

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
comarow
Trusted Contributor

Re: Split Cluster Design

Wide area clustering does have distance limitations. It can run over ATM.

Lose a site, and with VMS Clustering and Availability Manager you reset quorum and away you go. If you don't have Availability Manager (you should: it's free and has lots of goodies), you can reset quorum in other ways.

Heck, you can manage the system at one site, and the changes are replicated at the other site.

The Guide to Cluster Management has a section on Wide Area Networks.

Make sure you use Mini Merge for shadowing so that a merge only covers what's necessary rather than the whole volume.
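The general idea of only touching what's necessary can be sketched with a write bitmap. This is a toy illustration and not how OpenVMS implements mini copy or mini merge; the class name, segment size, and block numbers are all hypothetical.

```python
class WriteBitmap:
    """Toy model: remember which segments of a volume have been written to."""

    def __init__(self, volume_blocks, blocks_per_segment):
        self.blocks_per_segment = blocks_per_segment
        segments = -(-volume_blocks // blocks_per_segment)  # ceiling division
        self.dirty = [False] * segments

    def record_write(self, start_block, count=1):
        first = start_block // self.blocks_per_segment
        last = (start_block + count - 1) // self.blocks_per_segment
        for segment in range(first, last + 1):
            self.dirty[segment] = True

    def segments_to_process(self):
        # Only these segments need attention; the rest of the volume is skipped.
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]


bitmap = WriteBitmap(volume_blocks=1000, blocks_per_segment=100)
bitmap.record_write(150, 10)   # falls entirely inside segment 1
bitmap.record_write(395, 10)   # straddles segments 3 and 4
print(bitmap.segments_to_process())
```

In this example only three of the ten segments would need to be compared or copied, which is the kind of saving that matters when the other member sits at the far end of an inter-site link.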
Jan van den Ende
Honored Contributor

Re: Split Cluster Design

Re Bob Comarow:


Heck, you can manage the system at one site, and the changes are replicated at the other site.


From many years of experience: this is an UNDERstatement!

Most of the time I am not even aware of WHICH specific node I happen to be logged in to, and on some functions that require specific node activity, it is usually irrelevant at which site a node resides.

I vividly remember at one time we needed to work at a specific node (temporarily withdrawn from normal service) when I realised rather late that to touch the hardware I still had some cross-town drive to do, which at that time was an unpleasant realisation.

You REALLY forget that this "one" system actually _IS_ geographically dispersed!
(until you find out that there is an electrical or network failure in some quarters, while service remains available from the "other" site).

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.