Cloning a node environment

Regular Advisor

We have a DR server on a private network. It boots from a SAN disk that began as a backup copy of the production server's system disk, so the same applications are installed. This DR server can mount a copy of the production server's disks through EMC's SRDF technology.

Our strategy is that come DR time, after switching this server to the production network, we'll just reboot this server using an alternate mount disk procedure and become the production server.

What I would like to do now is create a job on this DR server that periodically copies some system files from the mirrored production disk, so that when DR time comes the production environment is retained. My specific concern is the user and queue records.

So far, I have identified the following files that I will need to copy:
1. sysuaf.dat
2. rightslist.dat
3. vmsmail_profile.data
4. vms$password_history.data
5. qman$master.dat (should I also include the SYS$QUEUE_MANAGER.QMAN$JOURNAL?)

Are there other files that I need to copy aside from these? Also, will a direct backup/copy of these files work, or do I need to run a convert procedure before I can use them at DR time?
Karl Rohwedder
Honored Contributor

To copy the live SYSUAF, you may use CONVERT/SHARE instead of COPY.

I have never tried to copy the 'queue database' to another node, but as an alternative you may recreate it via a DCL procedure. There are procedures floating around (perhaps on the Freeware CDs) which dump the queue setup with all jobs, forms, etc. and create a DCL procedure for recreation.
Another file may be relevant too: SYS$QUEUE_MANAGER.QMAN$QUEUES.

regards Kalle
John Gillings
Honored Contributor

There's a lot more to the system "personality" than just the files you've mentioned. Look at the SYLOGICALS.TEMPLATE file for a list of cluster common files. There are LOTS of files, including the security and audit databases, proxy databases, SYSMAN startup database, LMF database, etc. You should also consider which of your local startup procedures need to be copied.

Are you going to worry about checking for updates, or just blindly copy everything every time?

Forget about copying the queue manager files, they won't make any sense at all on a new node (files are referenced by FID, which won't match). If you really want to try to replicate batch jobs, you're better off using the output of SHOW QUEUE/FULL and reconstructing the entries from the listing, but think carefully about the sense of running random jobs on the new node. You can't verify idempotence by simple inspection.
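As a rough illustration of capturing that listing, a periodic job on the production node could write the full queue display to a file on a replicated disk. This is a sketch only: DR$STAGE is a hypothetical logical name for a directory on one of the SRDF-mirrored data disks, and the listing still has to be read and turned into SUBMIT commands by hand or by your own procedure.

```dcl
$! Sketch: save a full listing of all batch queues and their jobs.
$! DR$STAGE is a hypothetical logical name; define it to suit your site.
$ SHOW QUEUE/BATCH/FULL/ALL_JOBS/OUTPUT=DR$STAGE:BATCH_QUEUES.LIS *
```

The job needs enough privilege to see other users' entries, otherwise the listing will be incomplete.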

Most of the files you need to copy will be open (otherwise you wouldn't need to copy them more than once, right?) so you'll need to use CONVERT/SHARE to get a clean copy at the RMS level.
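A minimal sketch of such a copy job, run on the production node, might look like the following. It assumes the files live in SYS$SYSTEM (check whether logical names such as SYSUAF or RIGHTSLIST redirect them elsewhere on your system), and DR$STAGE is a hypothetical logical name for the target directory. The file list is only the subset discussed in this thread, not a complete inventory.

```dcl
$! Sketch: take RMS-level copies of open indexed files with CONVERT/SHARE.
$! DR$STAGE is a hypothetical logical name for the destination directory.
$ files = "SYSUAF.DAT,RIGHTSLIST.DAT,VMSMAIL_PROFILE.DATA,VMS$PASSWORD_HISTORY.DATA"
$ i = 0
$loop:
$ name = F$ELEMENT(i, ",", files)
$! F$ELEMENT returns the delimiter once the index runs past the last item
$ IF name .EQS. "," THEN GOTO done
$ CONVERT/SHARE SYS$SYSTEM:'name' DR$STAGE:'name'
$ i = i + 1
$ GOTO loop
$done:
$ EXIT
```

Each run creates a new version of every output file, so a periodic PURGE/KEEP=n on the destination keeps a few generations without filling the disk.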

Note that timing might still give you inconsistencies between associated records in different files. Consider the password in SYSUAF and VMS$PASSWORD_HISTORY, or a UIC identifier in RIGHTSLIST and the corresponding UAF entry.

My preference is to make frequent local copies of these files to maintain a "hot" backup, then copy the backups to other locations as required. Typically they only take up a few MB, so it's not a problem to keep multiple copies.
A crucible of informative mistakes
Honored Contributor

I think you should regularly switch all your users to your DR server, to really check it is functional.

A disaster tolerant Cluster is a much better solution, IMHO.

All the Disaster Recovery solutions I have heard about, in real life, have more or less failed, because a few things were forgotten, or some hardware was not available, not functional, or not at the correct software level.

With a Disaster Tolerant Cluster, you really check that the other nodes are available.
And there is no delay.

You should have a look at the case of the fire that destroyed the Credit Lyonnais Paris site, while the other nodes, at Suresnes, in the suburbs of Paris, were still OK, of course.

Also look at the other cases of VMS clusters surviving the September 11 attacks (except for the site that had all of its VMS nodes in the two towers).
Jan van den Ende
Honored Contributor

I have to second Gerard Labadie:
A disaster tolerant Cluster is a much better solution, IMHO.

Just look up the various reports about the Amsterdam Police cluster.


Have one on me.

Don't rust yours pelled jacker to fine doll missed aches.
Robert Gezelter
Honored Contributor

I would agree with the comment about managing this as a single cluster.

While I often configure alternate bootstrap roots to deal with different contingencies, my first choice for a DR situation would not be such an environment. There is too large a potential for error, and too much potential for problems when the inevitable situation occurs.

- Bob Gezelter, http://www.rlgsc.com
Hein van den Heuvel
Honored Contributor

As suggested, going to a real DR cluster setup may be the best alternative.

If the main page & swap files are not on the system disk, then that system disk can be relatively small, and will have a low change rate. Why not create a second copy and have EMC SRDF/Async maintain it 'best effort'? I'm suggesting a second copy to make 100% sure there is a bootable disk out there. Just imagine that the main site went down due to a catastrophic data overwrite/delete on the system disk. 'Bad day'.

You can get a clean copy for a lot of the files you mention with CONVERT/SHARE as Kalle suggests.

But moreover, most of those can be moved away from the system disk with (system) logical names. So just place them on one of the remote-cloned data disks and be happy?
You could still add a daily $ CONVERT/SHARE to the system disk 'just in case'.
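For illustration, the relocation could be sketched as a fragment of SYS$MANAGER:SYLOGICALS.COM along these lines. The logical names on the left are the ones listed in SYLOGICALS.TEMPLATE; the device and directory DATA1:[VMS$COMMON_FILES] is a hypothetical location on one of the replicated data disks, and that disk must already be mounted at this point in startup.

```dcl
$! Sketch: point the system at authorization files kept off the system disk.
$! DATA1:[VMS$COMMON_FILES] is a hypothetical, already-mounted location.
$ DEFINE/SYSTEM/EXECUTIVE_MODE SYSUAF               DATA1:[VMS$COMMON_FILES]SYSUAF.DAT
$ DEFINE/SYSTEM/EXECUTIVE_MODE RIGHTSLIST           DATA1:[VMS$COMMON_FILES]RIGHTSLIST.DAT
$ DEFINE/SYSTEM/EXECUTIVE_MODE VMSMAIL_PROFILE      DATA1:[VMS$COMMON_FILES]VMSMAIL_PROFILE.DATA
$ DEFINE/SYSTEM/EXECUTIVE_MODE VMS$PASSWORD_HISTORY DATA1:[VMS$COMMON_FILES]VMS$PASSWORD_HISTORY.DATA
```

With the same definitions in place on the DR system disk, the DR node picks up whatever state the replicated disk holds when it boots, which removes the need for a separate copy job for these particular files.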

Honored Contributor

Not counting your data, your database, and the like, the cluster-shared files for OpenVMS itself are listed in the file SYLOGICALS.TEMPLATE on OpenVMS V7.2 and later.

Most of the layered products (the TCP/IP Services database files come to mind here, but many other products have similar requirements) would also need their context replicated as and when changes are made.

There are also "out-board" issues that can come into play, such as the occasional changes made to MODPARAMS.DAT. These too would need to be shadowed.

I'm professionally rather skeptical of any controller-level active-active data replication product, from any vendor. This is based on knowledge of how much engineering effort went into clustering and shadowing to get that to work as well as it does within OpenVMS itself. Of those storage controllers I've worked with over the years that have claimed an ability to replicate active-active clustering configurations at the level of a storage controller, all such products have shared a single common feature. They failed.

Active-passive is far easier to manage, but there are still the usual issues akin to splitting a shadowset or performing an on-line BACKUP. Put simply, is the resulting data snapshot entirely consistent?

A full DTCS cluster is "somewhat easier" to manage, and you can use some or all of the processing power in some or all of the lobes as part of your production.

In any event, do fully and then regularly test your fail-over. An untested DR environment is often analogous to no DR.
Regular Advisor

Folks, as always, thank you for your valuable insights and feedback!

We will ponder these suggestions and present them to our client, if needed.

Again, many thanks to you all.