
building a cluster

 
Con Stelios
Occasional Advisor

building a cluster

Hello,

We are looking at combining 4 machines into a cluster, mainly to spread the load.

My question: what is involved in combining these 4 servers to form a cluster? Suppose I use HBVS to provide redundancy between them. Any special problems I should watch out for? Is there much downtime?

The 4 servers have individual system disks; do I remove them or leave them as is? Is that all done by CLUSTER_CONFIG.COM?

Thanks for any help,
Con!
Karl Rohwedder
Honored Contributor

Re: building a cluster

First of all: you need the appropriate licenses :-)

Then I would encourage you to read the guide to cluster configurations as a starter. The possibilities are numerous and depend very much on the sort of hardware you have (type of systems, interconnects, storage), so the question is difficult to answer.

regards Kalle
Robert Gezelter
Honored Contributor

Re: building a cluster

Con,

First, I would strongly recommend a careful reading of "Guidelines for OpenVMS Cluster Configurations" from the OpenVMS documentation set (available online at http://h71000.www7.hp.com/doc/os83_index.html ).

Without understanding the details of your hardware configuration and business requirements, it is not easy to make recommendations. OpenVMS clusters are a very flexible technology, with many choices that depend upon the precise environment.

That said, I prefer to configure these environments around a shared system disk. I have often retained the potential for standalone operation, or cluster operation using separate system disks. The choice depends on the business requirements.

The posting does not include details as to the type of workload. While I am a strong advocate of the use of host based volume shadowing, and have spoken on how it can be employed (see http://www.rlgsc.com/hptechnologyforum/2005/1146.html ), it can also be mis-used in ways that produce poor performance.

Done properly, even the transition to a clustered environment can be accomplished with minimal downtime, but it does require careful planning and execution. There is no substitute for planning and preparation.

CLUSTER_CONFIG.COM assists with the housekeeping of setting up a cluster, but it is no substitute for the thought that goes into planning one.
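For reference, a minimal sketch of invoking the procedure (it is menu-driven and will prompt for the details):

$ @SYS$MANAGER:CLUSTER_CONFIG.COM   ! run from a privileged account; answer the menu prompts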

I hope that the above is helpful. If I have been unclear or can be of additional assistance, please let me know.

- Bob Gezelter, http://www.rlgsc.com
Hoff
Honored Contributor

Re: building a cluster

Here are some details around adding or removing nodes in a cluster:

http://64.223.189.234/node/169

The article covers merging the relevant files, among other related topics.

You can run with parallel system disks, though you will (still) want to synchronize the identifier and UIC values across all nodes. Details on these identifiers and UICs are in the above article.
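As a rough sketch of that comparison step, run something like the following on each node and compare the results (LIST/BRIEF writes a SYSUAF.LIS listing file):

$ SET DEFAULT SYS$SYSTEM
$ RUN AUTHORIZE
UAF> LIST/BRIEF *          ! usernames and UICs, written to SYSUAF.LIS
UAF> SHOW/IDENTIFIER *     ! identifier names and values
UAF> EXIT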

HBVS -- mirroring -- seeks to reduce vulnerabilities to disk-level errors and bad blocks, but does not prevent file-level corruptions, errant deletions or crashes. HBVS can also be part of a BACKUP strategy, but HBVS itself does not provide a BACKUP strategy. (If an application tips over and stomps on its database or if a user deletes a key file, HBVS will replicate this to all members of the shadowset.)

HBVS can allow you to improve uptime by having your shadowset scattered across the cluster members, though having shared storage via shared SCSI, FC MSA or FC SAN is usually preferred.

(There are write-ups on RAID options and disk reliability over at the HoffmanLabs site, as well.)

The central savings here are the single authorization database and the single system disk; the savings come in managing and coordinating the systems.

Clustering can also avoid cases where a key node is down; with distributed environments, any node can field the user and application requests.

Stephen Hoffman
HoffmanLabs LLC
Con Stelios
Occasional Advisor

Re: building a cluster

Thank you Hoff for your reply. May I ask some questions about your page:

If you are adding a node and the node has its own system disk, you will have to specify the cluster group and cluster password in SYSMAN, or you can copy over the CLUSTER_AUTHORIZE.DAT from the existing cluster nodes. (Make certain the file copy operation requests that the file be placed in SYS$COMMON:[SYSEXE], and not in SYS$SYSTEM:) This SYSMAN configuration step or this file copy replicates the information necessary for the new node to join and communicate with the other nodes within the cluster. This operation applies only if the new node has its own system disk.

Question: I use CLUSTER_CONFIG to create the cluster on one system, then follow the above paragraph for the other systems (3 of them)? Is it that easy?

I need to protect against a system disk failure, so I need more than one system disk in the cluster. I have read about the files that require synchronizing, and I understand that. Is failing the queues over to each node the only way?

If I have one queue manager and one system disk, then there is no issue, but I need one queue manager and many system disks, at least 2, in the cluster. Is using clusterwide search-list logicals the best solution?

Also, I understand HBVS and its limitations, but it is required by the business. Do I need a unique ALLOCLASS identifier for each machine? This is the confusing element. The manual says set it to 1, others say 2; what is the answer? Is a different number required for each computer?

Thanks. Con.
Robert Gezelter
Honored Contributor

Re: building a cluster

Con,

To have clusterwide queues and authentication (which in most cases is strongly recommended) you will need to put the queues and authentication related files on a shared disk (other than one of the system disks).
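As a sketch, the usual mechanism is to point the relevant system logical names at that shared disk; the device and directory here are examples only:

$ DEFINE/SYSTEM/EXEC SYSUAF      DSA1:[VMS$COMMON]SYSUAF.DAT
$ DEFINE/SYSTEM/EXEC RIGHTSLIST  DSA1:[VMS$COMMON]RIGHTSLIST.DAT
$ DEFINE/SYSTEM/EXEC QMAN$MASTER DSA1:[VMS$COMMON]

These definitions belong in each node's SYLOGICALS.COM, which comes up later in this thread.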

This stream of postings does not mention how the mass storage is connected to the systems. Could you be more specific?

- Bob Gezelter, http://www.rlgsc.com
Con Stelios
Occasional Advisor

Re: building a cluster

Hello Robert,

The disk storage is all inside the machines, none is external.

The machines are linked with Gigabit Ethernet cards approved by HP.
Dean McGorrill
Valued Contributor

Re: building a cluster

Hi Con,
Yes, you can copy the CLUSTER_AUTHORIZE.DAT file over to the other nodes, cluster them, and keep separate files. That is a pain to keep updated in sync, but there is your redundancy, since all your storage is being MSCP-served by each system to the other cluster members. Search lists, I suppose, would work. But then again, what is/are the applications? (Common storage, like SCSI controllers or even the old CI, is nice that way.)
Con Stelios
Occasional Advisor

Re: building a cluster

Hello Dean,
Robert's idea of putting the SYSUAF and queue manager files on shared data storage is a nice solution.

I have no need to keep user information separate. I will join all users into one SYSUAF and combine the queues under one queue manager.

The users run Oracle and custom applications on them.

Still I have questions on ALLOCLASS: is it different on each machine, or do I just pick a number and use it on all? The manual is not helpful; the sources differ.
Hoff
Honored Contributor

Re: building a cluster

What sorts of systems are these, and what are the OpenVMS versions?

>>>Question: I use CLUSTER_CONFIG to create the cluster on one system, then follow the above paragraph for the other systems (3 of them)? Is it that easy?<<<

Replicate the CLUSTER_AUTHORIZE file? Sure. You can also have local copies and enter the passwords and group yourself.
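A sketch of both approaches; the node name is hypothetical, and the group number and password are placeholders:

$ ! Option 1: copy the file from an existing cluster member
$ COPY NODEA::SYS$COMMON:[SYSEXE]CLUSTER_AUTHORIZE.DAT SYS$COMMON:[SYSEXE]
$ ! Option 2: enter the group number and password locally
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> CONFIGURATION SET CLUSTER_AUTHORIZATION/GROUP_NUMBER=nnn/PASSWORD=xxxxx
SYSMAN> EXIT

Either way, the new values are read at boot time.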

>>>I need to protect against a system disk failure, so I need more than one system disk in the cluster. I have read about the files that require synchronizing, and I understand that. Is failing the queues over to each node the only way?<<<

A cluster can have one queue manager and one queue database.

As for your requirement, I'd keep the volatile stuff off the system disk and -- if you need rapid recovery -- keep a spare system disk, quite possibly the disk you use for the BACKUP. This can be implemented with a split shadowset, for instance.

>>>If I have one queue manager and one system disk, then there is no issue, but I need one queue manager and many system disks, at least 2, in the cluster. Is using clusterwide search-list logicals the best solution?<<<

It's what I use. I keep everything volatile off the system disks. Why? Because the system disks are a giant collection of data that does not need to be archived regularly, if the disks are configured to keep the files that get modified off the system disk.

This saves on BACKUP -- you need to make a BACKUP/IMAGE of the system disk only after changes, meaning you can skip it in your normal processing.
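A sketch of the split-shadowset BACKUP mentioned above; the device names, tape drive, and volume label are all examples:

$ DISMOUNT $3$DKA200:                    ! drop one member from the DSA1: shadowset
$ MOUNT/OVERRIDE=SHADOW_MEMBERSHIP/NOWRITE/SYSTEM $3$DKA200: SYSDSK
$ BACKUP/IMAGE $3$DKA200: MKA500:SYSDSK.BCK/SAVE_SET
$ DISMOUNT $3$DKA200:
$ MOUNT/SYSTEM DSA1:/SHADOW=($3$DKA200:) SYSDSK   ! re-add the member; a shadow copy follows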

>>>Also, I understand HBVS and its limitations, but it is required by the business. Do I need a unique ALLOCLASS identifier for each machine? This is the confusing element. The manual says set it to 1, others say 2; what is the answer? Is a different number required for each computer?<<<

For local storage requiring an allocation class, pick a unique value from 1 to 255. For best use -- if Fibre Channel is to ever come on-line -- pick allocation class 3 and up for non-FC storage, as FC wants the lowest classes to itself.

The allocation classes are applied to pools of storage that share interconnections, and that have multiple paths; each pool has its unique allocation class assigned. (I should post up a topic on cluster disk allocation classes, as this seems to be a source of confusion here. That's good fodder.)
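A sketch of actually setting the value; here a hypothetical node gets allocation class 3, set the conservative way via MODPARAMS.DAT and AUTOGEN:

$ ! add to SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT on that node:
$ !     ALLOCLASS = 3
$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT NOFEEDBACK

Once ALLOCLASS is nonzero, the node's local disks are named with the class as a prefix: DKA0: becomes $3$DKA0:.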

>>>The disk storage is all inside the machines, none is external.<<<

I will regularly choose to use external racks and shared SCSI configurations: for ease of configuration and scaling, and because I can keep the rack powered while powering down a system. Toward the middle or upper end, Fibre Channel storage is the typical solution.

Used disks and racks are available for little money; storage capacity on each new generation tends to see its price drop over time, and prices on the used gear also drop...

Stephen Hoffman
HoffmanLabs LLC
Jan van den Ende
Honored Contributor

Re: building a cluster

Con,

One issue I did not note being mentioned:
_BEFORE_ you build your cluster, resolve any potential conflicts on UICs, identifiers, and file ownerships.
Remember, user NAMES and identifier NAMES are just symbolic values for NUMERIC tags. Any two usernames with the same UIC become each other, indistinguishable. The same goes for identifier values. And the same name for two different values? One of them simply gets dropped; only the last name that gets assigned (by any means) survives.

Sorting out these issues standalone is straightforward (but potentially involves quite some work); resolving them afterwards is MUCH more complex, and MORE work!
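A sketch of the sort of standalone cleanup being described; the username, UIC, and identifier names are hypothetical:

$ SET DEFAULT SYS$SYSTEM
$ RUN AUTHORIZE
UAF> MODIFY SMITH/UIC=[310,27]           ! renumber a conflicting UIC
UAF> RENAME/IDENTIFIER OLD_ID NEW_ID     ! resolve an identifier name clash
UAF> EXIT
$ ! changing a UIC does not change existing file ownerships:
$ SET FILE/OWNER_UIC=[310,27] DKA100:[SMITH...]*.*;*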

Like Bob already noted above: _DO_ plan ahead!

If you need any help with specific issues: specify the issue and you will get specific answers.

hth

Proost,

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Con Stelios
Occasional Advisor

Re: building a cluster

Hello Jan,

Please can you or someone answer these questions? Thank you.

1) Is ALLOCLASS supposed to be different on each machine? Is it better that way? I am unclear on Hoff's explanation.

2) Using HBVS, how do you know which machine creates the shadow set first? Do I enter MOUNT commands somewhere in each SYSTARTUP_VMS.COM and let them fail when they encounter an error? This is HBVS of data only, not of system disks.

3) If I put SYSUAF, QMAN$*, and the others on a shared disk, and those disks are shadowed, when do I set this up? In SYLOGICALS.COM?

Maybe other questions later?

Thank you.
Con Stelios
Occasional Advisor

Re: building a cluster

Hello Hoff,

The systems are lowly DS20s. No laughing, please. Together they will be better in a cluster.

I pick an ALLOCLASS for each machine then? Is that the easier way?

Machine 1: alloclass 3
Machine 2: alloclass 4
Machine 3: alloclass 5
Machine 4: alloclass 6

Best idea? No?

Con Stelios
Occasional Advisor

Re: building a cluster

Hello Jan,
I apologise; I misunderstood something.
The QMAN$* queue files etc. are to be kept on a set of shadowed disks, with no single point of failure, so when do those disks require mounting? In SYLOGICALS.COM or before?
Thank you.
Con.
Hoff
Honored Contributor

Re: building a cluster

>>>1) Is ALLOCLASS supposed to be different on each machine? Is it better that way? I am unclear on Hoff's explanation.<<<

In all seriousness, what is it that you do not understand about having a pool of disks?
This is from what was posted at http://64.223.189.234/node/349

With some knowledge of what you do not understand here, the wording can potentially be clarified or extended.

Consider the following... If you have allocation class 64 for a disk DKA0: on NODEA:: and the same allocation class (64) for a different disk DKA0: on NODEB::, then $64$DKA0: is assumed to be the same device on both nodes. In your case (without shared SCSI), it's not. So you want to use different device allocation classes.

Think of the pools and of the storage I/O connections.

If you're using shared SCSI, then you DO have two paths, so you DO want to use the same allocation class, and you DO want DKA0: on the nodes sharing the SCSI bus to be the same device, and with the same allocation class.

In your case (and without multihost SCSI), each node has its own pool of disks.

>>>2) Using HBVS, how do you know which machine creates the shadow set first? Do I enter MOUNT commands somewhere in each SYSTARTUP_VMS.COM and let them fail when they encounter an error? This is HBVS of data only, not of system disks.<<<

The first node to issue the MOUNT accesses the member disks and starts the formation of the virtual unit. Other nodes detect this and join the formation.
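So the same MOUNT command can simply appear in each node's startup; whichever node runs it first forms the virtual unit and the others join. A sketch, with hypothetical member disks on two hosts:

$ MOUNT/SYSTEM DSA1:/SHADOW=($3$DKA100:,$4$DKA100:) DATA1   ! identical command on every node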

>>>3) If I put SYSUAF, QMAN$*, and the others on a shared disk, and those disks are shadowed, when do I set this up? In SYLOGICALS.COM?<<<

SYLOGICALS.COM is the absolute key to a multiple-system-disk cluster.
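In answer to the "when": a sketch of a SYLOGICALS.COM fragment, assuming the shared files live on a shadowed data disk DSA1: (all names illustrative):

$ ! SYS$MANAGER:SYLOGICALS.COM -- executed early in startup on each node
$ MOUNT/SYSTEM/NOASSIST DSA1:/SHADOW=($3$DKA100:,$4$DKA100:) COMMON
$ DEFINE/SYSTEM/EXEC SYSUAF      DSA1:[VMS$COMMON]SYSUAF.DAT
$ DEFINE/SYSTEM/EXEC RIGHTSLIST  DSA1:[VMS$COMMON]RIGHTSLIST.DAT
$ DEFINE/SYSTEM/EXEC QMAN$MASTER DSA1:[VMS$COMMON]
$ ! first time only: START/QUEUE/MANAGER/NEW_VERSION DSA1:[VMS$COMMON]
$ ! thereafter the clusterwide queue manager restarts automatically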


>>>The systems are lowly DS20s. No laughing, please. Together they will be better in a cluster.<<<

That box is decent. I picked up an AlphaServer DS20E for small dollars a while back, and more powerful Alpha gear is now cheaper. A cluster isn't necessarily faster than a faster node.

>>>I pick an ALLOCLASS for each machine then? Is that the easier way?

Machine 1: alloclass 3
Machine 2: alloclass 4
Machine 3: alloclass 5
Machine 4: alloclass 6<<<

Sure.

If you have SCSI controllers that can support it, I'd probably string all the disks together, to avoid tossing disk I/O over the network when you're already running shadowing over the network.