Operating System - OpenVMS

A Cluster that isn't?

Peter Quodling
Trusted Contributor

I have an interesting quandary. I am prepping a DR machine for our production VMS box. The folks of infinite wisdom have decided that it is better to do things like "have all of the storage handled by an HDS-backed SAN", but I have at least managed to keep the system disks for the DR machine locally attached.

Until we get production-to-DR-site replication of the HDS storage (don't hold your breath), I need to use Legato Networker for backup and restore (another strategic decision) from production to DR, via an intervening tape robot (STK L700). As such, I really need a barebones VMS 7.3-2 (with appropriate IP setup, SAN bits, and Legato bits) in order to be able to trigger the restores.

Data disks are not an issue; the challenge comes when restoring the system disk, as Networker does file backups only, not image backups, and so will not create a bootable disk.

What I am mulling over is setting up the above in one cluster system root (say SYS0) and then restoring the production system disk (Legato) backup to SYS1.

Then reboot from SYS1 instead of SYS0, meaning that we are booting from the restored (non-image) copy of the production system rather than the minimal OS.

Questions.

1. Do I then need to run WRITEBOOT to allow booting from SYS1? My gut feeling is no, as WRITEBOOT deals with [VMS$COMMON.SYSEXE]APB.EXE.

2. Do I need to turn on clustering to do this?

3. TCPIP files are all over the place between SYS$SYSROOT: and SYS$COMMON:. I guess I can work that out. (This place also insists on hard-coded IPs rather than DNS for failover...)

What I plan to end up with is a single-node cluster, where one machine can boot off more than one root. Does this make sense?

Q
Leave the Money on the Fridge.
9 REPLIES
Willem Grooters
Honored Contributor

Re: A Cluster that isn't?

I'm not too experienced here, but from what I understand from courses and my own system:

1. Indeed, no need for your intended configuration.
2. Probably, to be able to create [SYS1] (using Cluster_Config.com), but you may be able to do without. Still, I think it's a better idea to create a one-node cluster. Startup may be slower (since it won't find another member within the timeout).
3. Most services have their own directory, located in either SYS$SYSDEVICE or SYS$SYSROOT. I expect they all allow a logical TCPIP$_ROOT to be defined to point to any location. That would make it possible to locate these service directories anywhere, even outside the system disk.
For the TCPIP*.DAT files, I found some in [SYS0.SYS$STARTUP] and [SYS0.SYSCOMMON.SYSEXE], making them node-specific (which makes sense). On a non-clustered system I found the latter in [VMS$COMMON] as well, but I expect that on a clustered system it may be different (I have no access to a clustered machine at the moment).
Using hard-coded IPs isn't a bad idea; you could consider sharing TCPIP$HOSTS.DAT (on the VMS boxes).

If you succeed in setting up this one-node cluster, it would indeed make sense.

Just another consideration: is it possible to add a second (smaller) disk to that node and shadow it to the normal system disk? Then just boot from the other disk.

(Weird. Can't you create an image backup using Legato?)
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: A Cluster that isn't?

Peter,

I would expect most problems in this approach to come from the OpenVMS system disk structure (rooted directories). A file backup probably does not understand this and will get confused when restoring the directory trees.

Answers:

1. My gut feel is the same as yours ;-)

2. No, you can boot a standalone system from any valid root on the system disk. It does not need VAXCLUSTER .NE. 0.

3. The problems may start with the TCPIP files in SYS$COMMON. The configuration file is in SYS$COMMON:[SYSEXE], valid for all roots on that disk, and SCSNODE is the key (e.g. if you change the SCSNODE of your root, you lose most of your TCPIP configuration information). Would the restore of the backup then overwrite that file (with the version from your production node) or skip it, as it already exists? In either case, your TCPIP configuration is no longer consistent.
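To make the two failure modes concrete, here is a toy Python model of one shared configuration file keyed by SCSNODE. It is not the real record layout of the TCPIP configuration file, and it assumes that the minimal DR root (SYS0) and the restored production root (SYS1) run with different SCSNODE values; the node names DRNODE and PRODNODE and the addresses are hypothetical.

```python
# Toy model only: the real TCPIP configuration file in SYS$COMMON:[SYSEXE] is an
# indexed file with its own layout. Node names and addresses are hypothetical.

DR_COPY = {"DRNODE": {"address": "192.0.2.20"}}         # file already on the DR disk
PROD_COPY = {"PRODNODE": {"address": "198.51.100.10"}}  # copy inside the Legato backup


def restore_shared_file(existing, incoming, overwrite):
    """A file-level restore either replaces the one shared file or skips it."""
    return dict(incoming) if overwrite else dict(existing)


def settings_for(config, scsnode):
    """Look up a root's TCPIP settings by its SCSNODE; None if no record matches."""
    return config.get(scsnode)


# Renaming the DR root's SCSNODE leaves its record behind, unmatched.
renamed = settings_for(DR_COPY, "DRNODE2")

for overwrite in (True, False):
    config = restore_shared_file(DR_COPY, PROD_COPY, overwrite)
    dr = settings_for(config, "DRNODE")      # minimal DR root (SYS0)
    prod = settings_for(config, "PRODNODE")  # restored production root (SYS1)
    # Both roots read the same file, so one of them has no matching record.
    print(f"overwrite={overwrite}: DRNODE={dr}, PRODNODE={prod}")
```

Whether the restore overwrites or skips the file, one of the two roots ends up without a matching record, which is the inconsistency described above.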

Volker.
Robert Gezelter
Honored Contributor

Re: A Cluster that isn't?

Peter,

An OpenVMS system disk, clustered or standalone, contains several directories that are aliased (they appear at multiple points in the directory structure).

Booting from an alternate system root does not bypass this structure, so booting from the same system device with a different root does little to avoid using the other root's files, since the vast majority of files are shared between the roots. The exception is classic Standalone BACKUP. Files that are indexed by node name, such as the previously mentioned TCPIP configuration files, are another problem.

Restoring these files out from under a running system can produce unpredictable results.
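The sharing described above can be illustrated with a small Python model of how a lookup through SYS$SYSROOT resolves on a system disk with two roots. It assumes the usual layout: root SYSn searches its own [SYSn.] tree first, then [SYSn.SYSCOMMON.], which is an alias of [VMS$COMMON]. The file list is illustrative: a root-specific parameter file in each root, plus APB.EXE and a TCPIP configuration file in the common tree.

```python
# Toy model of SYS$SYSROOT lookups on a system disk with two roots.
# Assumptions: root SYSn searches [SYSn.] first and then [SYSn.SYSCOMMON.],
# and every [SYSn.SYSCOMMON] is an alias of [VMS$COMMON]. Files are illustrative.

DISK = {
    "[SYS0.SYSEXE]ALPHAVMSSYS.PAR",                # root-specific parameters
    "[SYS1.SYSEXE]ALPHAVMSSYS.PAR",
    "[VMS$COMMON.SYSEXE]APB.EXE",                  # one copy, shared by all roots
    "[VMS$COMMON.SYSEXE]TCPIP$CONFIGURATION.DAT",
}


def resolve(root, subdir, filename):
    """Return the file that SYS$SYSROOT:[subdir]filename finds when booted from root."""
    search_list = [
        f"[{root}.{subdir}]{filename}",      # the root's own tree
        f"[VMS$COMMON.{subdir}]{filename}",  # [root.SYSCOMMON.subdir] via the alias
    ]
    for path in search_list:
        if path in DISK:
            return path
    return None  # not found in either part of the search list


for root in ("SYS0", "SYS1"):
    for name in ("ALPHAVMSSYS.PAR", "TCPIP$CONFIGURATION.DAT"):
        print(f"{root}: {name} -> {resolve(root, 'SYSEXE', name)}")
```

Each root gets its own parameter file, but both land on the same common files, so restoring production's copy of a common file replaces it for the running root as well.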

However, you do have the germ of a usable idea here. You can accomplish what you want by using an alternate system disk, doing the base restores, and then rebooting from the restored system disk. There are also other alternatives, some of which I have implemented for clients.

I hope that the above is helpful.

- Bob Gezelter, http://www.rlgsc.com
Jan van den Ende
Honored Contributor

Re: A Cluster that isn't?

Peter,

Well, in my view your problem still raises more questions than it answers.

Let me start by summing up my _ASSUMPTIONS_ from your text.
Firstly, since you ARE building a DR site, my guess is that cost may be an interesting aspect, but not the biggest bottleneck, right?
Then, you do have ANOTHER, equivalent system at your DR site?
Can your users access your DR location as well as your production location from their work location? Or do you have a DR site for 'vital' workers as well?

Quite important: what is the distance between your Production site and your DR site? What connection do you have / can you get between those sites?
What is your exact method of replication? Legato backup to an intervening robot, and then? Does the same robot also access your DR site (how?), or are the tapes carried there?
If any of these answers are not yet fully fixed (which I hope is true for some!), then please indicate the 'room to move'.

From these answers, I will know whether it can be bent in a direction where we can pass our experiences on to you.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Willem Grooters
Honored Contributor

Re: A Cluster that isn't?

AFAIK this network configuration data (be it DECnet or TCPIP) should be node-specific, not generic for all nodes. Otherwise you would have severe trouble when booting satellite nodes. These share the same physical system disk (SYS$SYSDEVICE) but boot from different locations (SYS$SYSROOT = SYS$SYSDEVICE:[SYSx]). If this data resided in SYS$COMMON, each node would have the same SCSNODE, DECnet and TCPIP name, and address. That doesn't seem too healthy for your network, I guess?

Willem
Peter Quodling
Trusted Contributor

Re: A Cluster that isn't?

re Willem.

2. Startup shouldn't be slower because of a timeout: expected votes should be one, so it won't be looking for other nodes.

Re having a second, smaller disk: ultimately the system disk is an HBS pair of 73 GB disks (on a DS25). We could boot from disk A, restore to disk B, WRITEBOOT, boot from disk B, and then mount disk A as a shadow member of disk B, but that seems less elegant.

Re Legato and image backups: yup, according to our Legato team and everything I have been able to find in the documentation.

Re Volker:
3. Yup, I anticipate some research will be needed on the IP settings.

Re Gezelter: I would be interested in your other alternatives. VMS Development, as you may know, has used an extra layer of rooted logicals (known as the folk disk, or CLU$common) in many of its internal configurations to ease the pain of OS changes; that is what triggered this line of thought.

Re Jan van ...

Production is a dual 533 MHz 4100 with a KZPAC, three mirrored 9 GB disks, and a 3x2x18.2 GB RAID 0+1 set as the "data" disk. DR is a dual-CPU 1 GHz DS25 with two direct-attached 73 GB disks and Fibre Channel to connect to an HDS SAN (still unproven...). Backup and restore is via an STK L700 with LTO-2 drives.

Generally you are preaching to the converted: I regard the constraints on this as making it an absolute dog's breakfast. We actually have the potential to acquire a pair of ES40s and HSG80-based storage. I regard dual-siting that, with a split cluster, as making much more sense. But the planners of this appear to come from PC/Unix worlds and have been swayed by the likes of the storage vendors...


q
Jan van den Ende
Honored Contributor

Re: A Cluster that isn't?

Peter,

So, your PRD and DR sites are connected (by a connection that supports, or can be made to support, SCS)?
Your disks are 'mirrored'? As in, controller based mirrored, or, Host based 'shadowed'?

Well, what better argument to give them than to lead them to our 'Uptime' story?

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=855602

Some of the technical details will require following some links, but that should pose no problems.
If they follow those links, they will also find some real-life reports of actual critical situations.

Of course I do not know your management, but ours was actually most sensitive to the slogan:

"We do NOT want Disaster Recovery, because we can offer Disaster Resilience."

The easiest way to recover from disaster after a crisis is to prevent the crisis from turning into a disaster!

My advice in short:
ONE multisite cluster, with multisite HostBasedShadowing. (And definitely NO replication!)

Ask those storage vendors for some details about the failover. It probably goes rather smoothly.

Then, ask them about failBACK. Let them guarantee in writing that THAT goes equally smoothly, and equally fast!

I have SEEN (luckily not for VMS!) a failover that took 2 minutes, after which everything was running again. Then the failback took 18 (eighteen!) HOURS, during which the applications could NOT be used. And after that about three quarters of the applications functioned correctly, and it still took several hours to get the remainder available.

At the very least, EXERCISE it before going live!

My offer stands: if you are doing a 'decent' VMS-style solution, I will be available for a lot of advice from experience.

Then again: there are more ways to skin a cat, and if you (have to) choose another way, I still wish you success, and I will help as much as I can (which will of course be less).

I think my first wish will have to be for success in the fight about the choice!
Permission to use anything I have put into ITRC explicitly granted.

Proost.

Have one on me.

jpe
Robert_Boyd
Respected Contributor

Re: A Cluster that isn't?

Peter,
1st off: I agree with the preference for a multi-site cluster/storage architecture -- it's just so much cleaner -- and if lost time is lost money, then it makes perfect sense.

If however you are forced to live with the current scenario of the Legato backups, you might save yourself some major headaches by doing this:

Create a place in your disk environment to hold a saveset file from an image backup of your system disk, and let Legato back it up from there. Then, when you are doing your restore, you can unpack the saveset and do an image restore from it. This may seem convoluted, but the benefit of this approach is that you don't have to worry about the directory structure being handled correctly.

Aside/Rant: What good is a backup product that won't restore the source disk to the exact state it was in when the backup was made? Yes, you can restore individual files -- and this is goodness. In the *n*x environments they have to deal with links -- so why not in OpenVMS?
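The two halves of that approach could look like the following Python sketch, which only builds the DCL command lines. The device names (DKB100:, DKA100:), the [SAVE] directory, and the saveset name are hypothetical, and it assumes Legato restores the saveset file to the same path on the DR side. /IGNORE=INTERLOCK is there because the saveset is taken from a running system disk, so files open during the backup may not be copied consistently.

```python
# Sketch of the saveset approach: build the DCL for both ends.
# Device names, directory, and saveset name are hypothetical placeholders.


def saveset_commands(system_disk, staging_dir, target_disk, saveset="SYSDISK.BCK"):
    """Return (save, restore) lists of DCL lines for an image saveset round trip."""
    saveset_spec = f"{staging_dir}{saveset}/SAVE_SET"
    # Production: image backup of the running system disk into a saveset file,
    # which Legato then backs up as an ordinary file.
    save = [f"$ BACKUP/IMAGE/IGNORE=INTERLOCK {system_disk} {saveset_spec}"]
    # DR: after Legato restores the saveset file to a data disk, unpack it
    # onto a foreign-mounted target disk with an image restore.
    restore = [
        f"$ MOUNT/FOREIGN {target_disk}",
        f"$ BACKUP/IMAGE {saveset_spec} {target_disk}",
    ]
    return save, restore


save, restore = saveset_commands("SYS$SYSDEVICE:", "DKB100:[SAVE]", "DKA100:")
for line in save + restore:
    print(line)
```

The target disk here would be the second local disk, not the one the minimal DR system is booted from.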

Robert
Master you were right about 1 thing -- the negotiations were SHORT!
Volker Halle
Honored Contributor

Re: A Cluster that isn't?

Peter,

It would really be nice to have a bootable CD with a valid IP config and a Networker client. This should then allow you to restore your system disk, boot from it, and do the rest of the restores.

Volker.