OpenVMS Gigabit cluster, multisite - switch config question(s)

 
SOLVED
Hejib
Frequent Advisor


Hi,
We have a 4-node multisite OpenVMS cluster (V7.3-2), with Cisco switches joining the two halves. We're having trouble booting our quorum node: it downloads and joins the cluster, but bugchecks when trying to mount the system disk shadow set. (I have a call open with HP, following the official route.)

My question is: what do I need to ask our network people to ensure that connectivity is O.K., i.e. VLANs and the like? Are there lower-level, switch-type questions I should be asking?
(I'm finding a lot of 'I can ping it, it's o.k.' responses.)

Your assistance is appreciated...
Karl Rohwedder
Honored Contributor

Re: OpenVMS Gigabit cluster, multisite - switch config question(s)

What is the bugcheck code from your quorum node?

regards Kalle
Hejib
Frequent Advisor

Re: OpenVMS Gigabit cluster, multisite - switch config question(s)

%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%VMScluster-I-LOADSECDB, loading the cluster security database
%EWA0, Fast mode set by console
%CNXMAN, Sending VMScluster membership request to system MPDT04
%CNXMAN, Now a VMScluster member -- system MPDT05
%EWA0, Link state: UP
%SHADOW-F-NOACCMBREX, unable to access all mbrs of existing shadowset
**** OpenVMS (TM) Alpha Operating System V7.3-1 - BUGCHECK ****
%VMScluster-I-SYSDISK, Satellite system disk is _$1$DGA11011:

(N.B. OpenVMS V7.3-2 gave us MSCP-served disks with unit numbers greater than 9999.)
Peter Zeiszler
Trusted Contributor

Re: OpenVMS Gigabit cluster, multisite - switch config question(s)

Does the quorum node have direct access to all of the shadow set members?

You could try MSCP-serving the disks too. We enabled that on ours so that the machines would stay up if we accidentally lost the direct connections. And yes, I did get that tested, when someone accidentally disconnected BOTH fiber lines because they counted the ports wrong on the switch.
Volker Halle
Honored Contributor

Re: OpenVMS Gigabit cluster, multisite - switch config question(s)

Graham,

if the quorum node can join the cluster, it has direct network connectivity to ALL the other members of the cluster, so there is no need to talk to the network people right now...

Here are a couple of questions to answer:

- has this ever worked before ?
- what changed ?
- what is the config of the system disk of the quorum node ?

- MSCP serving of disk device units > 9999 was added in VMS83A_DRIVER-V0100 and VMS82A_DRIVER-V0200 around JUN-2007, but I strongly doubt that this feature has been back-ported to V7.3-2 - and your quorum node even seems to be running V7.3-1 !

- having a quorum node boot as a satellite is NOT a suggested configuration. You won't be able to boot it if the other cluster nodes are down, waiting for the votes of the quorum node. Consider using a system with a local system disk.
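
To make the last point concrete, here is a minimal sketch of the quorum arithmetic. OpenVMS computes quorum as (EXPECTED_VOTES + 2) / 2, rounded down. The node names and the one-vote-per-node assignment below are assumptions for illustration only; the thread does not state the actual VOTES settings.

```python
def quorum(expected_votes: int) -> int:
    """OpenVMS quorum value: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


# Hypothetical vote assignment: two nodes per site plus the quorum node,
# one vote each. These names and values are NOT taken from the thread.
VOTES = {
    "SITE_A_1": 1,
    "SITE_A_2": 1,
    "SITE_B_1": 1,
    "SITE_B_2": 1,
    "QUORUM_NODE": 1,
}
EXPECTED_VOTES = sum(VOTES.values())
NEEDED = quorum(EXPECTED_VOTES)


def has_quorum(up_nodes) -> bool:
    """True if the running members hold enough votes to keep going."""
    return sum(VOTES[name] for name in up_nodes) >= NEEDED


# One site lost and the quorum node not yet booted: 2 votes, 3 needed.
site_a_only = ["SITE_A_1", "SITE_A_2"]
print(NEEDED, has_quorum(site_a_only))
print(has_quorum(site_a_only + ["QUORUM_NODE"]))
```

Under these assumptions the surviving site hangs with 2 of the 3 votes it needs. A satellite quorum node depends on those same hung members to serve its system disk, so it cannot boot to contribute the missing vote; a quorum node with a local system disk can.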

Volker.
Jan van den Ende
Honored Contributor

Re: OpenVMS Gigabit cluster, multisite - switch config question(s)

Graham,


Am I correct in guessing that $1$DGA11011: is one member of your system disk shadow set?

In that case, try modifying your satellite load parameter to use NOT a shadow set member, but the shadow set itself (DSA...).
Our 4-node, 2-site cluster in fact consists of _ONLY_ satellites booted in this way!
Yes, that is correct: EVERY node is load host to all other nodes (the current 4, plus a former one not yet totally "forgotten", plus our test node, which can (and normally does) boot from its own local system disk, but when booted from the network it is a fully configured cluster node, and it has functioned as such on various occasions).

We had some quite interesting discussions on this setup, mainly with Cristian Moser (Mr CMOS), and later with Mr Richard Bishop. Both agreed that it was a totally valid config, with definite advantages. The drawback is that _IF_ you get serious connectivity issues or multiple simultaneous node failures, you had better have some people around who KNOW this! And realise that if you ever have a CLUSTER down, then you have a rather complex cluster boot sequence, involving a rolling reboot of the first members to boot "non-standard".

But the bottom line of this whole story: modify your LOAD PARAMS to boot from the shadow set, and NOT from one member.
You WILL need to have MSCP serving enabled for this to work.

Success!

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Colin Butcher
Esteemed Contributor

Re: OpenVMS Gigabit cluster, multisite - switch config question(s)

From a network perspective - what's changed (if anything)? What kind of control do you have over the network interconnections and switch configurations? Are you in the "interesting" situation where someone else can change network things without you knowing?

Has someone changed your WAN links to be IP routed? For the cluster, your WAN links need to support SCS (the cluster protocol), MOP (for booting) and probably AMDS as well. You will presumably need IP and DECnet too.

All those different protocols (especially SCS) require a low latency layer 2 network, preferably with physically separate multiple paths. Those might be physically separate networks, or they could be VLANs with appropriate QoS configured in the switches.

"Ping" is a layer 3 TCP/IP utility and will tell you if things are reachable by TCP/IP - but may tell you absolutely nothing about the layer 2 connectivity unless you know how your TCP/IP subnets and routing are configured to fit over your inter-switch links and layer 2 paths (either real LANs or VLANs).
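
One way to turn this into a concrete question for the network people is to ask which Ethernet protocol types are bridged unchanged between the sites. The sketch below is only an illustration: the EtherType values are the DEC assignments as I recall them and should be checked against the OpenVMS documentation, AMDS is left out because it uses a different frame format, and the "bridged" set is hypothetical input that the network team would have to supply.

```python
# Layer 2 protocol types a LAN-based VMScluster typically needs bridged.
# Values are recalled from the DEC assignments; verify before relying on them.
REQUIRED_L2 = {
    "SCS (cluster traffic)": 0x6007,
    "MOP dump/load (satellite boot)": 0x6001,
    "MOP remote console": 0x6002,
    "DECnet Phase IV": 0x6003,
}


def blocked_protocols(bridged_ethertypes):
    """Return the required protocols whose EtherType is not bridged."""
    return sorted(
        name
        for name, ethertype in REQUIRED_L2.items()
        if ethertype not in bridged_ethertypes
    )


# Hypothetical inter-site link that only passes IPv4 and ARP:
# ping works, yet none of the cluster protocols get through.
ip_only_link = {0x0800, 0x0806}
for name in blocked_protocols(ip_only_link):
    print("not bridged:", name)
```

A link like the hypothetical one above is exactly the case where "I can ping it" is true and the cluster still cannot work across it.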

If you're having real trouble then it may be worth getting someone in to help you map out the network and see what's actually interconnected. However, your problem may just be the way you have things configured at VMS level and a change elsewhere (presumably at network level) has exposed a flaw in the cluster design and device naming. It's difficult to tell without spending time looking at it.

Cheers, Colin (http://www.xdelta.co.uk).
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
Art Wiens
Respected Contributor

Re: OpenVMS Gigabit cluster, multisite - switch config question(s)

Some details of your cluster config might help provide some answers.

"We've a 4 node multisite ... two halves"

We can safely guess two nodes at each site. What are the LAN interconnects? What are the disk interconnects? SAN fabric spanning both sites?

"trouble booting our quorum node"

This is a fifth node I assume? Is it in one of the two sites or in a third? What are the LAN and SAN interconnects with this system?

Cheers,
Art
Kirsten Knüttel
Frequent Advisor

Re: OpenVMS Gigabit cluster, multisite - switch config question(s)

Hi Graham,

do you still have this problem or did you find a solution for it? We have the same problem with a system shadow set with multipath devices.

regards,

Kirsten
Volker Halle
Honored Contributor
Solution

Re: OpenVMS Gigabit cluster, multisite - switch config question(s)