
Re: Turning a standalone node into a two-node VMScluster with MSA1000

 
Hoff
Honored Contributor

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Jan, I'll keep your suggestion that the system manager learn about DCL and logical names under advisement.

Your servers likely have a different mix of layered products than the boxes under test here, and you have a quarter-century of experience behind you. The layered products are where I'm having the "fun" here, not the site-specific DCL procedures I've written.

I'm testing the system disk and host rename sequences, among other things; a version of the procedure described in the http://labs.hoffmanlabs.com/node/589 article mentioned earlier. These are the sorts of tasks required for a disaster restart when the backup server's hardware configuration doesn't exactly match the failed server, or when you're testing a copy of your system disk with an ECO kit or with a new OpenVMS release.

Here's an example of the "fun" I'm encountering: a DCL change (;2) I've made to one of the layered products (;1) installed on the box. I'd have coded this procedure differently myself, but the following "minor" update to the original DCL is rather more tolerant of host and device name changes.

************
File SYS$COMMON:[SYS$STARTUP]IDE$STARTUP.COM;2
2 $!
3 $ ddcu = f$trnlnm("SYS$SYSDEVICE")
4 $ define/system/exec/trans=(conc, term) IDE$ROOT 'ddcu'[SYS0.SYSCOMMON.]
5 $ define/system/exec/trans=(conc, term) IDE$JARS_ROOT 'ddcu'[SYS0.SYSCOMMON.IDE$SERVER.JARS.]
6 $ define/system/exec/trans=(conc, term) IDE$ANT_ROOT 'ddcu'[SYS0.SYSCOMMON.IDE$SERVER.ANT.]
7 $ @IDE$ROOT:[IDE$SERVER.COM]IDE$IDESTARTUP.COM
8 $ exit
******
File SYS$COMMON:[SYS$STARTUP]IDE$STARTUP.COM;1
2 $! Version 1.0
3 $! This command procedure was generated during installation
4 $! on 1-JUN-2009 17:08:35.48
5 $ define/system/exec/trans=(conc, term) IDE$ROOT VMS$DKB500:[SYS0.SYSCOMMON.]
6 $ define/system/exec/trans=(conc, term) IDE$JARS_ROOT VMS$DKB500:[SYS0.SYSCOMMON.IDE$SERVER.JARS.]
7 $ define/system/exec/trans=(conc, term) IDE$ANT_ROOT VMS$DKB500:[SYS0.SYSCOMMON.IDE$SERVER.ANT.]
8 $ @VMS$DKB500:[SYS0.SYSCOMMON.][IDE$SERVER.COM]IDE$IDESTARTUP.COM
9 $ exit
************
Jeremy Begg
Trusted Contributor

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Hi all, and thanks for your comments.

I agree a common system disk is simpler to manage but after talking with a friendly local OpenVMS Ambassador I'm very comfortable with my decision not to do it that way.

Hoff's comments about device naming and system startup procedures certainly struck a chord; it's an issue I've faced several times before. In fact I'm rather surprised by Jan's comments; in my experience many HP software installations, as well as a few third-party products, hard-code the system disk or the installation disk. Why they can't use 'F$TRNLNM("SYS$SYSDEVICE")' or parse the installation directory out of F$ENVIRONMENT("PROCEDURE") I'll never know! (OK, that's not a panacea, but it works 99% of the time.)
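For example, a startup procedure can work out where it lives, or simply anchor itself to the booted system disk, along these lines (a rough sketch only; the MYPROD$ROOT name and directory layout are invented):

$! rough sketch -- MYPROD$ROOT and the directory layout are invented
$ proc   = f$environment("PROCEDURE")      ! full file spec of this running .COM
$ kitdev = f$parse(proc,,,"DEVICE")        ! device the procedure lives on
$ kitdir = f$parse(proc,,,"DIRECTORY")     ! ...and its directory
$ sysdev = f$trnlnm("SYS$SYSDEVICE")       ! or simply anchor to the booted system disk
$ define/system/exec/trans=(conc,term) MYPROD$ROOT 'sysdev'[SYS0.SYSCOMMON.MYPROD.]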

So I think I'm all set with this for now. The "Big Day" is Saturday.

Regards,
Jeremy Begg
Robert Gezelter
Honored Contributor

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Jeremy,

My best wishes that things go smoothly! [The preceding should not be taken to imply that good luck is a substitute for planning, but rather that bad luck can be particularly malign.]

On the system disk question, consider that while the system disks can be separate, that should be a question of load and placement, not a question of contents. By this I mean that the contents of the disks should be identical, down to complete copies of the various SYSn directories, and that SYSn identities should not be reused on different nodes. The omission of SYS0 is also good, in that it prevents a virgin hardware node (or a node whose NVRAM has been somehow wiped) from booting naively into the cluster using an already active root number.

Such an approach makes shared versus non-shared system disks merely a choice of the moment, and of where the files are placed. It is also a choice that can easily be revisited later, without significant repercussions.
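For example (device names purely illustrative), each node can show which root it booted from, and a directory listing shows which roots exist on each system disk:

$ show logical sys$sysroot                            ! the root this node booted from
$ directory/nohead/notrail $1$DGA10:[000000]SYS*.DIR  ! roots present on one system disk
$ directory/nohead/notrail $1$DGA11:[000000]SYS*.DIR  ! ...and on the other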

- Bob Gezelter, http://www.rlgsc.com
Jan van den Ende
Honored Contributor

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re Hoff:

You are (of course) entirely right about most (even HP-supplied) layered product installation procedures (and I tend to think it certainly has NOT gotten better since Digital times).
However,
both VMSINSTAL and PCSI _DO_ have the option of NOT choosing SYS$SYSDEVICE, but a user-supplied location.
I always make sure to prepare a suitable Concealed Device (and DEFINE that early in the bootstrap sequence) and specify it to VMSINSTAL/PCSI.
And lo & behold: all that ever needs to be changed when changing hardware is the correct definition of the concealed device.
(and to further simplify any future changes, on sites I set up ALL (and only) concealed devices are defined in ONE .COM file, which is @-ed from SYLOGICALS.COM)
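Something along these lines (file name and logical names invented for illustration) is all that one file contains:

$! SYS$MANAGER:CONCEALED_DEVICES.COM -- @-ed from SYLOGICALS.COM
$ define/system/exec/trans=(conc,term) APPS_ROOT $1$DGA20:[APPS.]
$ define/system/exec/trans=(conc,term) DATA_ROOT $1$DGA21:[DATA.]
$! layered products are then installed to e.g. APPS_ROOT:[000000] via
$! VMSINSTAL or PRODUCT INSTALL /DESTINATION=APPS_ROOT:[000000]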

-- it is just one more way to exploit the versatility of VMS :-)

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Hoff
Honored Contributor

Re: Turning a standalone node into a two-node VMScluster with MSA1000

>- it is just one more way to exploit the versatility of VMS

That's one view.

In another view, that approach is how OpenVMS gets its reputation for being cryptic, arcane and obscure, and it is also how testing installation environments gets even hairier.

I've written and debugged code that deals with somebody passing in a concealed rooted logical name or a concealed searchlist; that gets ugly. And more than a few folks don't get the logical name correctly defined, which means you're dealing with that failure path when supporting the stuff. And you still end up fixing the device references; PCSI got this stuff basically right, yet you still need to re-register volumes when you relocate files. And how many tools implement that option?
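For the record, the PCSI re-registration after a relabel or relocation is a one-liner; the label and device here are examples:

$ product register volume OLDLABEL DKA0:
$! updates the PCSI database with the current label of the volume on DKA0:,
$! replacing the recorded references to OLDLABEL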

I'm increasingly taking the strategy of removing cases of unnecessary versatility from designs.

Jon Pinkley
Honored Contributor

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Jeremy,

Several questions:

Are the existing internal disks in the DS25 systems compatible with the MSA1000? If so, I would strongly consider moving as much of your disk storage as possible to the MSA1000. Shared storage is much better than MSCP-served storage. The only disadvantage is that the MSA becomes a single point of failure, but for all practical purposes it will be a single point of failure anyway.
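Once a disk is behind the MSA, both nodes can mount it directly; something like this (the allocation class and unit number are just examples):

$ mount/cluster $1$DGA100: DATA01 DISK$DATA01   ! mounts on every node, no MSCP serving needed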

Do you have shadowing licenses? If not, consider buying them. I haven't looked at the pricing lately, but it used to be much cheaper for an entry-level server like the DS25, and quite a bit cheaper than the cluster license (which you still need). Shadowing has many advantages.
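With the licenses in place, a two-member shadow set is then just a MOUNT (device names and label are examples; host-based shadowing also needs the SHADOWING system parameter set):

$ mount/system DSA101: /shadow=($1$DGA101:,$1$DGA102:) DATA01 DISK$DATA01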

I would try to make your two systems as similar as possible.

Therefore, even if you are going to have multiple system disks, I would encourage you to do as much as possible before making the clone, and then the only thing that would be different on the two system disks (until they diverge) will be the disk label.

Do consider Bob Gezelter's advice about starting with a single-node cluster: move all of your cluster common files to the MSA, create a quorum disk on the MSA, and after you think you have that working, give the quorum disk 1 vote, give the DS25 1 vote, and change expected votes to three. You should still be able to boot with a single node. If you can't, determine why before going forward. After you have a working cluster with expected votes of three, then add the new system root. (Consider using the node's allocation class as the root number; since CI isn't involved you are not limited to a single hex digit for the root. Using the allocation class for the root makes it easy to keep track, and avoids using the SYS0 root.)
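The voting changes amount to a few lines in SYS$SYSTEM:MODPARAMS.DAT on the DS25, roughly like this (the quorum disk name is an example), followed by a run of AUTOGEN:

VOTES = 1                  ! this node's vote
EXPECTED_VOTES = 3         ! node 1 + node 2 + quorum disk
DISK_QUORUM = "$1$DGA1:"   ! shared MSA disk that will hold QUORUM.DAT
QDSKVOTES = 1              ! the quorum disk's vote

then

$ @sys$update:autogen getdata setparams nofeedback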

I just tried renaming a system root this week on a test system. Changing the root of an existing node is as easy as booting from another root or the distribution disk, renaming [000000]SYS0.DIR to something else, like [000000]SYS20.DIR, and then changing BOOT_OSFLAGS. This assumes there aren't any poorly coded products like the one in Hoff's NetBeans example. I'm with Jan on this one; if things are set up correctly, moving to another physical device should be nearly transparent. I don't have NetBeans installed, and it appears its installation procedure was to blame for generating the startup command procedure incorrectly, but in general I have had no problems moving our system disk to another system and booting with very few changes. When we installed at the disaster recovery site on our practice run, a MOUNT_DISKS.COM for the non-system disks was about all I had to change; everything else uses logical names based on the volume labels.
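With illustrative device and root numbers, the whole thing was along the lines of:

$ rename dka0:[000000]sys0.dir dka0:[000000]sys20.dir   ! dka0: = the target system disk, done while booted elsewhere
$! then, at that node's console, point the boot flags at the new root:
$! >>> set boot_osflags 20,0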

See attachment for more about the netbeans installation, and some additional things that can be done to make your procedures less dependent on disk or root names.

After you have created a second root on the first system disk, reboot the first machine from the second system root. Get things working so you can boot the system as either node. When that works, you will need to think about your other installed software. Most of it was probably installed to SYS$COMMON:[SYSEXE], and that is no longer unique if you have more than a single system disk. If you keep your system disks "identical", then you can just clone the disk when products are upgraded. Otherwise, you will either need to reinstall, or copy things over and fix the startup procedures. If you have the distributions, removing and reinstalling will probably be the easiest route. This is where a lot of the complexity starts when you have multiple system disks.
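For PCSI kits the reinstall route is mostly mechanical; roughly (the product name and kit location are made up):

$ product show product                         ! inventory what is installed
$ product remove WIDGET                        ! remove it from the old location
$ product install WIDGET /source=dka100:[kits]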

After you have everything working, make a clone of the disk, change the label, and take it to the other box. Leave the first node down and try booting both system roots on the second DS25. When that works, boot the first node again, and you should have a working cluster.
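The clone-and-relabel step is just an image backup plus a new label, along these lines (devices and label are examples):

$ mount/foreign $1$DGA11:                         ! the disk that will become the clone
$ backup/image/ignore=interlock dka0: $1$DGA11:   ! image copy of the running system disk
$ dismount $1$DGA11:
$ mount/override=identification $1$DGA11:
$ set volume/label=VMSSYS2 $1$DGA11:
$ dismount $1$DGA11: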

Then if things don't work, you at least know it must be related to the cluster-related SYSGEN settings, or possibly to the lack of a working network connection between the nodes.

But if I were you, I would just copy the system disk to the MSA1000 and boot both nodes from it. You will need to do a bit more work at the console (using WWIDMGR at a minimum; I have no experience with the MSA1000, so I don't know how hard it is to set up as a boot device). But if both system roots are on the system disk you will clone, which device you boot from shouldn't make a lot of difference (assuming you have used logical names like SYS$SYSDEVICE instead of device names).
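From what I've read, the console side goes roughly like this: enter diagnostic mode, expose the FC unit by its UDID, re-initialize so the DGA devices appear, then point the default boot device at the path SHOW DEVICE reports (the UDID and device path here are examples only):

>>> set mode diag
>>> wwidmgr -quickset -udid 100
>>> init
>>> show device
>>> set bootdef_dev dga100.1001.0.1.0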

Good luck on Saturday,

Jon
it depends