Turning a standalone node into a two-node VMScluster with MSA1000

Jeremy Begg · ‎11-23-2009

Hi,

For the past couple of years I have been the system manager for a site running OpenVMS V8.2 on a single AlphaServer DS25. They recently became concerned about the business-critical nature of the OpenVMS application and have decided to implement a two-node VMScluster.

For reasons I won't go into here, they have two identical AlphaServer DS25s. Each machine has a SmartArray 5300A RAID controller and there is a total of 12 physical drives. Currently only one of these machines is in use (the other one has been shut down almost since the day they arrived).

The site has received the necessary hardware to form a two-node VMScluster with shared MSA1000 storage and I will be going there next weekend to set it up.

I'm intending to run the cluster with each DS25 having its own system disk, i.e. booting from its SA5300A controller. The application software and critical shared system files (SYSUAF, NETPROXY, etc) will go onto the MSA1000.

My only problem is that I'm a little unsure on a couple of the cluster configuration details. (I have configured and managed clusters before but always from scratch.)

One thing I'm not sure about is how to set the ALLOCLASS parameter. In the "Guidelines for OpenVMS Cluster Configurations" it says, "A Fibre Channel storage disk device name is formed by the operating system from the constant $1$DGA and a device identifier, nnnn. Note that Fibre Channel disk device names use an allocation class value of 1 whereas Fibre Channel tape device names use a value of 2".

In other words, FC devices ignore the system's ALLOCLASS value?

Currently the running DS25 has an ALLOCLASS of "1". Given that each system will have at least one "local" SCSI disk (the system disk) would I be better off changing the ALLOCLASS to a unique value on each DS25 (e.g. 2 and 3)?

When it comes to preparing the system disk for the second DS25, I see two possibilities (assuming I'm not going install VMS from scratch):

1. Use option 4 in CLUSTER_CONFIG.COM to clone the existing system disk to a scratch disk, then use option 5 to create a new system root (e.g. [SYS1]) on the cloned disk. Once this has been done the cloned disk will be moved to the second DS25.

2. Alternatively, could I just restore an image backup of the original DS25 to the second DS25 and then change the node name (in MODPARAMS.DAT, DECnet, etc)?

Thanks,
Jeremy Begg

Jon Pinkley · ‎11-23-2009

Jeremy,

Yes, FC devices ignore ALLOCLASS.

What is the compelling reason for multiple system disks? If that is what you are familiar with, that may be a sufficient reason. However, unless you cannot schedule downtime for system upgrades, etc., I see very little reason to go to the extra complexity, duplicated upgrades, etc. that come with multiple system disks.

In a two-node cluster, you are going to have to have a quorum disk on the MSA1000, and a place for the shared cluster files. It is much less complex to have a single system disk, a common SYS$COMMON, etc. That's my opinion.

You can still have your page/swap files on local devices (and even system dump files, although it can be nice to have those on a disk that can be seen by the other system, so unless you are really tight on MSA space I would also recommend that the dump files were on the MSA as well.

There are good reasons for multiple system disks, but do consider the reasons before heading down that road, especially if you haven't had experience with multiple system disks already.

Jon

it depends

Jeremy Begg · ‎11-23-2009

Hi Jon,

Thanks for confirming the FC ALLOCLASS issue.

I am familiar with both common-system-disk and multiple-system-disk clusters. I agree a common system disk would be simpler to manage but going with one system disk per node lets me perform rolling O/S updates and allows a node to be booted without the MSA1000 being on-line. (OK those might not be strong reasons, but they work for me for now.)

Thanks,
Jeremy Begg

Martin Vorlaender · ‎11-23-2009

Jeremy,

>>>
In other words, FC devices ignore the system's ALLOCLASS value?
<<<

Yes.

>>>
Currently the running DS25 has an ALLOCLASS of "1". Given that each system will have at least one "local" SCSI disk (the system disk) would I be better off changing the ALLOCLASS to a unique value on each DS25 (e.g. 2 and 3)?
<<<

Why not choose something that won't collide with an FC tape, e.g. 3 and 4?

>>>
When it comes to preparing the system disk for the second DS25, I see two possibilities (assuming I'm not going install VMS from scratch):
<<<

I'd go with option 1. Much cleaner, as the SCSNODE gets used in lots of places (see http://labs.hoffmanlabs.com/node/589 ).

HTH,
Martin

Jeremy Begg · ‎11-23-2009

Hi Martin,

Good point about host ALLOCLASS, I think I'll go with 10 and 20.

Thanks for the suggestion for disk prepartion.

Regards,
Jeremy Begg

Jan van den Ende · ‎11-24-2009

Jeremy,

>>>
They recently became concerned about the business-critical nature
<<<

to me, should imply (at the very least) the deployment of (host based!) Volume Shadowing.

combine with

>>>
lets me perform rolling O/S updates
<<<

... and you have practicly described a high desire FOR a single, common, system disk!
-split off a member, mount privately, change volume label, set VAXCLUSTER=0 & STARTUP_P1 = MIN.
Boot one system from this disk, and upgrade.
Reset VAXCLUSTER & STARTUP_P1 and reboot
(now a mixed-version, dusl-sysdisk cluster)
Shutdown other node, reboot from new disk, and the rolling upgrade is done.
At any point in time, if necessary roll-back is as simple as booting from (a member of) the original sysdisk. Keep this intact till satisfied with the upgrade.

(at current disk prices, I stronly suggest 3-member shadow sets. A.o., this lets your production stay shadowed while upgrading. Way easier in getting single-point-in time backups as well)

Btw, I have always felt comfortible with NOT using SYS0 as a system root in a cluster. It prevents accidental "wrong root" booting, especially by people who are not really familiar with the site, (such as maintenance, but I have also been surprised by "outside" software installers. Don't say it will not happen to your site; upper management tends to make decisions they will not discuss with you in advance. Better safe than sorry!)

hth

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.

The Brit · ‎11-24-2009

Jeremy,
I would seriously consider Jon's comments regarding a shared system disk, however it this is not for you, here is a method which is pretty safe, and works for me.

1. Copy the current system disk to a spare internal disk.

2. Remove the disk and install in your second DS25.

3. Disconnect the second DS25 completely from the network and storage.

4. boot 2nd DS25 from root sys0, standalone.

5. make all necessary changes; name, parameters, network, etc.

6. When you are comfortable, reconnect to the network and storage, and run cluster_config on the original node to add the new node to the cluster.

Since you are using separate system disks, you can retain SYS0 as the boot root.

Dave

Hoff · ‎11-24-2009

Consider a common system disk with shadowing. And a two-node cluster requires specific steps for up-time. Details:

http://labs.hoffmanlabs.com/node/349
http://labs.hoffmanlabs.com/node/153
http://labs.hoffmanlabs.com/node/569

The implementation of the node name storage and related handling within OpenVMS and layered products is bad, simply put. Having gone through a node name change recently, it's a toss-up whether a name change or the wholesale reinstallation of OpenVMS is easier. As part of the name change, I ended up reinstalling some layered product components; there was just no way to untangle it. (And if you combine a system disk device name change - which can be the case here - you can end up fixing layered product startup procedures all over the place, too.)

Jan van den Ende · ‎11-24-2009

Re Hoff:

>>>
(And if you combine a system disk device name change - which can be the case here - you can end up fixing layered product startup procedures all over the place, too.)
<<<

AAUUWWW!!!

When/whereever THAT is the case, the responsible System Manager should take a solid course of Logical Name training (and probably repeat that same course one month later!!)

Apart from Oracle V3.x on VMS 3.x (which WAS installed by an SM that NEVER got to understand LNMs, up to the point where ALL user home DIRs were subdirs of SYS$SYSDEVICE:[SYS0] ) I have NEVER EVER in over 25 years
encountered THAT issue!.

fwiw

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.

Robert Gezelter · ‎11-24-2009

Jeremy,

I would concur with many of the comments about shared system disks.

Some other thoughts on minimizing downtime:

- Consider whether creating a cluster of one is a good first step

- Consider a similar comment with regards to host based volume shadowing to migrate data to the MSA. In essence, switch data on the current disks to access via DS devices, then when the MSA is available, add MSA volumes to the shadow set.

- As a guard against accidents with a shared system disk, it may be sound to create a cloned copy of that system disk (e.g., port and starboard).

I have incorporated the SYS0 exclusion mentioned earlier by Jan. I do not like using the same root number for different nodes. There are more than enough root numbers to not need this, and it prevents accidents.

- Bob Gezelter, http://www.rlgsc.com

Hoff · ‎11-24-2009

Jan, I'll keep your suggestion that the system manager learn about DCL and logical names under advisement.

Your servers likely have a different mix of layered products than the boxes under test here, and with your quarter-century of experience. The layered products are particularly where I am having "fun" here, and not with the site-specific DCL procedures that I've coded up.

I'm testing the system disk and host rename sequences, among other things; a version of the documentation that is listed in the http://labs.hoffmanlabs.com/node/589 article that was mentioned earlier. The sort of tasks that would be required to perform a disaster-restart, for instance, when the server hardware configuration for the backup server does not specifically match the failed server. Or when you're testing a copy of your system disk with an ECO kit, or with a new OpenVMS release.

Here's an example of the "fun" I'm encountering; of a DCL change (;2) I've implemented in one of the layered products (;1) that was installed on the box. I'd have coded this procedure differently, but then the following "minor" update to the original DCL is rather more tolerant of host and device name changes.

************
File SYS$COMMON:[SYS$STARTUP]IDE$STARTUP.COM;2
2 $!
3 $ ddcu = f$trnlnm("SYS$SYSDEVICE")
4 $ define/system/exec/trans=(conc, term) IDE$ROOT 'ddcu'[SYS0.SYSCOMMON.]
5 $ define/system/exec/trans=(conc, term) IDE$JARS_ROOT 'ddcu'[SYS0.SYSCOMMON.IDE$SERVER.JARS.]
6 $ define/system/exec/trans=(conc, term) IDE$ANT_ROOT 'ddcu'[SYS0.SYSCOMMON.IDE$SERVER.ANT.]
7 $ @IDE$ROOT:[IDE$SERVER.COM]IDE$IDESTARTUP.COM
8 $ exit
******
File SYS$COMMON:[SYS$STARTUP]IDE$STARTUP.COM;1
2 $! Version 1.0
3 $! This command procedure was generated during installation
4 $! on 1-JUN-2009 17:08:35.48
5 $ define/system/exec/trans=(conc, term) IDE$ROOT VMS$DKB500:[SYS0.SYSCOMMON.]
6 $ define/system/exec/trans=(conc, term) IDE$JARS_ROOT VMS$DKB500:[SYS0.SYSCOMMON.IDE$SERVER.JARS.]
7 $ define/system/exec/trans=(conc, term) IDE$ANT_ROOT VMS$DKB500:[SYS0.SYSCOMMON.IDE$SERVER.ANT.]
8 $ @VMS$DKB500:[SYS0.SYSCOMMON.][IDE$SERVER.COM]IDE$IDESTARTUP.COM
9 $ exit
************

Jeremy Begg · ‎11-24-2009

Hi all, and thanks for your comments.

I agree a common system disk is simpler to manage but after talking with a friendly local OpenVMS Ambassador I'm very comfortable with my decision not to do it that way.

Hoff's comments about device naming and system startup procedures certainly struck a chord, it's an issue I've faced several times before. In fact I'm rather surprised by Jan's comments; it's my experience that many HP software installs, as well as a few 3rd-party products, are always hard-coding the system disk or installation disk. Why they can't use 'F$TRNLNM("SYS$SYSDEVICE")' or parse out the installation directory from F$GETENV("PROCEEDURE") I'll never know! (OK that's not a panacea, but it works 99% of the time.)

So I think I'm all set with this for now. The "Big Day" is Saturday.

Regards,
Jeremy Begg

Robert Gezelter · ‎11-25-2009

Jeremy,

My wishes for the best of luck that things go smoothly! [The preceding should not be taken to imply that good luck is a substitute for planning, but rather that bad luck can be particularly malign.]

On the system disk question, consider that while the system disks can be separate, that should be a question of load and placement, not a question of contents. By this I mean that the contents of the disks should be identical, down to complete copies of the various SYSn directories, and that SYSn identities should not be reused on different nodes. The omission of SYS0 is also good, in that it prevents a virgin hardware node (or a node whose NVRAM has been somehow wiped) from booting naively into the cluster using an already active root number.

Such a choice makes the choice of shared/non-shared system disk merely a choice of the moment, and where the file is placed. It also means that it is a choice that can easily be re-visited at a later time, without significant repercussions.

- Bob Gezelter, http://www.rlgsc.com

Jan van den Ende · ‎11-25-2009

Re Hoff:

You are (of course) entirely right about most (even HP-supplied) layered product installation procedures (and I tend to think that it certainly NOT got better since Digital times).
However,
both VMSINSTAL and PCSI _DO_ have the option of NOT choosing SYS$SYSDEVICE, but a user-supplied location.
I always make sure to prepare a suitable Concealed Device (and DEFINE that early in the bootstrap sequence) and specify it to VMSINSTAL/PCSI.
And lo & behold: all that ever needs to be changed when changing hardware is the correct definition of the concealed device.
(and to further simplify any future process of changes, on sites I set up, ALL (and only) concealed devices are defined in ONE .COM file; which is @-ed from SYLOGICALS.COM)

-- it is just one more way to exploit the versatility of VMS :-)

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.

Hoff · ‎11-25-2009

>- it is just one more way to exploit the versatility of VMS

That's one view.

In another view, that approach is how OpenVMS gets a reputation for cryptic and arcane and obscure, and that is also how testing installation environments gets even hairier.

I've written and debugged code that deals with somebody passing in a concealed rooted logical name or a concealed searchlist; that gets ugly. And more than a few folks don't get the logical name correctly defined, which means you're dealing with that path when supporting stuff. And you still end up fixing the device references; PCSI got this stuff basically right, and you still need to re-register volumes when you relocate files. And how many tools implement that option?

I'm increasingly taking the strategy of removing cases of unnecessary versatility from designs.

Jon Pinkley · ‎11-27-2009

Jeremy,

Several questions:

Are the existing internal disks in the DS25 systems compatible with the MSA1000? If so I would strongly consider moving as much of your disk storage as possible to the MSA1000. Shared storage is much better than MSCP served storage. The only disadvantage is that the MSA becomes a single point of failure, but for all practical purposes, it will be a single point of failure anyway.

Do you have shadowing licenses? If not, consider buying them. I haven't looked at the pricing lately, but it used to be much cheaper for an entry-level server like the DS25, and quite a bit cheaper than the cluster license (which you still need). Shadowing has many advantages.

I would try to make your two systems as similar as possible.

Therefore, even if you are going to have multiple system disks, I would encourage you to do as much as possible before making the clone, and then the only thing that would be different on the two system disks (until they diverge) will be the disk label.

Do consider Bob Gezelter's advice about starting with a single node cluster, and moving all of your cluster common files to the MSA, creating a quorum disk on the MSA, and after you think you have that working, give the quorum disk 1 vote, the DS25 1 vote and change expected votes to three. You should still be able to boot with a single node. If you can't, determine why before going forward. After you have a working cluster with expected votes of three, then add then new system root. (Consider using the node's alloclass as the root #, since CI isn't involved, you are not limited to a single hex digit for the root. Using the allocation class for the root makes it easy to keep track, and avoids using the sys0 root.)

I just tried renaming a system root this week on a test system. Changing the root of an existing node is as easy as booting from another root or the distribution disk and renaming the [000000]sys0.dir to something else, like [000000]sys20.dir, and then changing the boot_osflags. This assumes that there aren't any of the poorly coded products that use things like Hoff's netbeans example. I'm with Jan on this one; if things are set up correctly, moving to another physical device should be nearly transparent. I don't have netbeans installed, and it appears its installation procedure was to blame for creating the startup command procedure incorrectly, but in general I have had no problems with moving our system to another and booting with very few changes, primarily a mount_disks.com was what I has to change when we installed at the disaster recovery site on our practice run. That was for the non-system disks. Everything else uses logical names based on the volume labels.

See attachment for more about the netbeans installation, and some additional things that can be done to make your procedures less dependent on disk or root names.

After you have created a second root on the first system disk, then reboot the first machine with the second system root. Get things working so you can boot the system as either node. Then when that works, you will need to think about your other installed software. Most of it was probably installed to sys$common:[sysexe], and that is no longer unique if you have more than a single system disk. If you keep your system disks "identical", then you can just clone the disk when products are upgraded. Otherwise, you will ether need to reinstall or copy stuff and fix the startup procedures. If you have the distributions, removing and reinstalling will probably be the easiest route. This is where a lot of complexity starts when you have multiple system disks.

After you have everything working, then make a clone of the disk, change the label, and take it to the other box. Leave the first node down, and try booting both system roots on the second DS25. When that works, then boot the first node again, and you should have a working cluster.

Then if things don't work, you at least know it must be related to cluster related sysgen settings, or possibly the lack of a working network connection between the nodes.

But if I were you, I would just copy the system disk to the MSA1000, and boot both nodes from it. You will need to do a bit more work (using wwidmgr at a minimum, I have no experience with the MSA1000, so how hard that is to set up to boot from, I don't know.) But if you have both system roots on the system disk that you will clone, what device you boot from shouldn't make a lot of difference (assuming you have used logical names like sys$sysdevice instead of device names.

Good luck on Saturday,

Jon

it depends

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Turning a standalone node into a two-node VMScluster with MSA1000

Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000

Re: Turning a standalone node into a two-node VMScluster with MSA1000