Operating System - OpenVMS

Re: Replacing a Quorum Disk

 
SOLVED
Jack Trachtman
Super Advisor

Replacing a Quorum Disk

I have a two node cluster with the
quorum disk defined as a virtual disk
on an EVA5000. (VMS V7.3-2)

I want to delete and recreate that disk,
but I'm not clear on the steps. (The HP
docs I've seen so far are not clear.) My
question is similar to "how would I replace
a failed Quorum disk", which has got to
be a somewhat common situation so I'm surprised that I haven't found explicit
docs on this.

Is one approach:
1) Shut down cluster
2) Delete and recreate disk on EVA
with same Unit ID
3) Boot cluster
4) Init disk from VMS side
5) Reboot cluster so cluster file
will get created on new disk

Will the above work?

Is there a simpler way?

TIA
30 REPLIES
Garry Fruth
Trusted Contributor

Re: Replacing a Quorum Disk

The above will work. I believe the following will also work:
1) Dismount/cluster qdisk, assuming it is not a page/swap, DOSD or system disk and has no other open files.
2) Delete and recreate on EVA
3) Reinitialize disk
4) Reboot one node
Volker Halle
Honored Contributor

Re: Replacing a Quorum Disk

Jack,

dismounting the quorum disk in a running cluster works (tested on V7.3-1), so you could start with DISM/CLUSTER qdsk. Access to the quorum disk will be temporarily lost but will be re-established immediately.

Assuming that your votes are set up so that the cluster can maintain quorum while the QDSKVOTES are not present, you could then delete and re-create the quorum disk on the EVA. This will cause access to the quorum disk to be lost, but the cluster should continue if 2 nodes are up (assuming 2x VOTES=1 and QDSKVOTES=1, i.e. QUORUM = 2).
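
To make the vote arithmetic explicit, here is a small Python sketch. It is illustrative only: the function names are made up for this example, and it assumes the usual OpenVMS rule that quorum is (EXPECTED_VOTES + 2) / 2, rounded down, with EXPECTED_VOTES=3 for two one-vote nodes plus a one-vote quorum disk.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def has_quorum(node_votes, qdisk_votes, expected_votes):
    """True if the votes currently present reach the quorum value."""
    return sum(node_votes) + qdisk_votes >= quorum(expected_votes)


# Two nodes with VOTES=1 each plus QDSKVOTES=1, so EXPECTED_VOTES=3 (assumed).
EXPECTED_VOTES = 3

print(quorum(EXPECTED_VOTES))                  # 2
print(has_quorum([1, 1], 1, EXPECTED_VOTES))   # True: normal operation
print(has_quorum([1, 1], 0, EXPECTED_VOTES))   # True: quorum disk deleted on the EVA
print(has_quorum([1], 0, EXPECTED_VOTES))      # False: one node lost as well
```

The last line is the case to avoid: while the quorum disk is gone there is no spare vote, so losing a node during the swap would leave the remaining node without quorum.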

You may need to do SYSMAN> IO AUTO or IO SCSI_PATH_VERIFY after re-creating the disk unit.

Then INIT and MOUNT/SYSTEM the new quorum disk. This will allow the QUORUM.DAT file to be created by CLUSTER_SERVER, and the connection to the 'quorum disk' will be re-established.

Volker.
Jan van den Ende
Honored Contributor

Re: Replacing a Quorum Disk

Jack,

just the fact that the QDSK _IS_ a disk with a virtual hardware name makes this much easier than the case with a physical disk.

You just have to make sure of two things:
- you have to re-create the exact same-named unit
- during the period from removal through re-creation of the unit you have no "headroom" in quorum voters, so you have to make as sure as you can that you do not lose any voters

If you think you need to change the device NAME of the quorum disk, then a cluster shutdown is the only simple way.
(I think it should be possible to do it in a rolling way as well, but that requires thorough planning, several reboots, and vote manipulations. Not for the faint of heart, nor for the inexperienced. I even doubt whether any such route would be supported.)

Simply removing the old unit first and then creating a new one with the same name will be your best route.

Success.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
comarow
Trusted Contributor

Re: Replacing a Quorum Disk

Simply replace the disk.
Boot conversationally with more than enough votes. It will build QUORUM.DAT. Then it will use the qdisk vote.
Wim Van den Wyngaert
Honored Contributor

Re: Replacing a Quorum Disk

It is a pity that, analogous to
set cluster/expected_votes=xx
they didn't implement
set cluster/disk_quorum=yyy

Wim
Eberhard Wacker
Valued Contributor

Re: Replacing a Quorum Disk

Hi Jack,
your approach would be okay, but point 3 could be simpler: do a minimum boot of 1 node, or boot the VMS CD, for the VMS INIT.
Any further minimum boot makes no sense because QUORUM.DAT will not be created that way.

If the quorum disk is made unavailable to the quorum disk watcher nodes, a cluster state transition will occur.
So far this is no problem, but I'm really interested in whether the approach of Volker and Jan, "let all nodes keep running the whole time", is a totally smooth one with respect to the activities of the connection manager. Has anyone already done it this way?

Che
Volker Halle
Honored Contributor
Solution

Re: Replacing a Quorum Disk

Eberhard,

would you accept a test on a V7.3-1 single cluster node with a local SCSI quorum disk as a proof-of-concept, that you can swap the quorum disk in a running cluster - if you can provide enough votes to keep the cluster running or are willing to use the IPC interrupt (or AMDS) to recalculate quorum ?

The attached file shows a simple test on how this can be done - and it does work !

The different steps are labeled [1] to [8]:

[1] boot a single cluster node with VOTES=1, EXPECTED_VOTES=1, QDSKVOTES=1 and DISK_QUORUM=DKC500 - no quorum file exists yet.

[2] mount the designated quorum disk (DKC500), this will cause QUORUM.DAT to be created automatically by CLUSTER_SERVER - even if you only mount that disk privately.

[3] dismount the quorum disk.

[4] unplug the quorum disk

[5] As dynamic QUORUM is 2, step [4] will cause quorum to be lost (in this simple config), but it can be easily regained using the IPC>Q interrupt.

[6] plug the physical quorum disk into the DKC400 slot and delete QUORUM.DAT (if I had had an empty new disk, I could have used it and just done an INIT)

[7] plug the disk back into DKC500 (note: there is NO quorum.dat file anymore on that disk !)

[8] mount the 'new' quorum disk again. CLUSTER_SERVER will create QUORUM.DAT and the quorum disk will become active again.

NO REBOOTS needed at all. And even if your cluster were to lose QUORUM because the quorum disk dies, you could use IPC/DECamds and recover without any reboot.
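
The loss of quorum in step [5] follows from the vote arithmetic. Here is a rough Python sketch of it, assuming that the connection manager raises the dynamic quorum as votes join and does not lower it by itself. The helper names are invented for illustration, and the model is a simplification.

```python
def quorum_for(votes: int) -> int:
    """Quorum implied by a vote total: (votes + 2) / 2, truncated."""
    return (votes + 2) // 2


def dynamic_quorum(current: int, expected_votes: int, votes_present: int) -> int:
    """Assumed model: quorum only ever grows until it is explicitly recalculated."""
    return max(current, quorum_for(expected_votes), quorum_for(votes_present))


# Step [1]: VOTES=1, EXPECTED_VOTES=1, quorum disk not yet active.
q = dynamic_quorum(0, expected_votes=1, votes_present=1)
print(q)            # 1

# Step [2]: the quorum disk contributes QDSKVOTES=1, so two votes are present.
q = dynamic_quorum(q, expected_votes=1, votes_present=2)
print(q)            # 2

# Step [4]: disk unplugged, one vote left, quorum still 2, so quorum is lost.
print(1 >= q)       # False

# Step [5]: IPC> Q (or AMDS) recalculates quorum from the votes still present.
q = quorum_for(1)
print(1 >= q)       # True
```

In the two-node configuration from the original question the node votes alone still reach quorum, so this forced recalculation should not be needed there.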

Volker.
Jack Trachtman
Super Advisor

Re: Replacing a Quorum Disk

Volker,

Thanks for doing that test! (I'm still surprised that this process isn't documented!)

BTW: I use AMDS but am not familiar with IPC. What is it?

Thanks
Jan van den Ende
Honored Contributor

Re: Replacing a Quorum Disk

Jack,

IPC is the Interrupt Control Program.

You enter it at the console (used to be ^P ; nowadays whatever the specific hardware requires)
then

>>> D SIRR C

deposits hex C in the SIRR register, meaning set IPL 12

IPC> Q

at IPC force Quorum recalculation

IPC> C

Continue normal operation.

--- in a cluster, this HAS to be COMPLETED within RECNXINTERVAL

It has always been around, AFAIK, although the VAX syntax was slightly different.

And this is what AMDS can do for you, and quickly, when you ask it to force quorum.


hth,

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Volker Halle
Honored Contributor

Re: Replacing a Quorum Disk

Jan,

to exit the IPC> interrupt, you need to enter CTRL/Z (not C, which stands for: Cancel mount-verification)

The IPC (short for IPL C interrupt, where 0xC is IPL 12.) is described in the System Manager's Manual, Volume 1: Essentials

Chapter: Using Interrupt Priority Level C (IPC)

Volker.
Eberhard Wacker
Valued Contributor

Re: Replacing a Quorum Disk

Hi Volker,

I would not have had any doubt about such a configuration, but many thanks for your PoC. The reason I asked is that at the moment I cannot test the whole scenario with the re-creation of a virtual quorum disk. So I can only "believe" that the same will work.

One other point raised by looking at your logfile: it can be handled in a smooth way only as long as MVTIMEOUT has not been reached, right ?

Cheers,
Eberhard
Eberhard Wacker
Valued Contributor

Re: Replacing a Quorum Disk

Hi Wim,
what I miss is the possibility to use a quorum file on a HBVShadowed disk :-)
Cheers
Eberhard
Uwe Zessin
Honored Contributor

Re: Replacing a Quorum Disk

| >>> D SIRR C
|
| deposits hex C in the SIRR register, meaning set IPL 12

Hm, SIRR is the Software Interrupt Request Register - my understanding is that it requests an IPL 12 interrupt, but does not set the IPL to 12. Imagine what happens if the processor is currently running at a higher IPL - not a good idea!

Extra points if you know what fork processes are for ;-)

The next step would be:
>>> continue

| IPC> Q
|
| at IPC force Quorum recalculation
|
| IPC> C
|
| Continue normal operation.

Volker has already commented that "C" is used to cancel a mount verification.

| --- in a cluster, this HAS to be COMPLETED within RECNXINTERVAL

Right, and there must not be a bug in the VAX-8600 ;-)

| It has always been around, AFAIK, although
| the Vax syntax was slightly different.

>>> D/I 14 C
.
Jan van den Ende
Honored Contributor

Re: Replacing a Quorum Disk

Re Eberhard:


what I miss is the possibility to use a quorum file on a HBVShadowed disk :-)


Why is that not supported (and never will be !!):

Suppose a cluster with two equal halves (for ease of concept, take a two-site cluster, but the principle is general).

So each half has n votes, and there is one quorum disk vote. EXPECTED_VOTES is 2n + 1, quorum is n + 1.

If the halves lose sight of each other, one half has the qdsk, has quorum, and can continue; the other half loses quorum.
The cluster integrity is guarded.
Now, suppose the qdsk is shadowed.
Again, the halves lose contact.
That could well mean that the shadow members lose contact.
Now EACH half sees its member of the qdsk shadow set as THE qdsk.
Both halves maintain quorum, and within a few handfuls of IOs (say, milliseconds?) your data is seriously corrupted.
HPUX has a good descriptive name for that situation:
A "SPLIT BRAIN CLUSTER".
REALLY unwanted.
So: _NO_ shadowed quorum disk. Period.
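
The argument can be checked with a few lines of Python. This is a sketch only: n = 2 and the function names are arbitrary choices for illustration, and quorum is taken as (EXPECTED_VOTES + 2) / 2, rounded down.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def halves_with_quorum(n: int, qdisk_votes_seen: tuple) -> list:
    """For two halves with n votes each, report which halves still have quorum.

    qdisk_votes_seen[i] is the quorum disk vote that half i can count
    after the halves have lost contact with each other.
    """
    needed = quorum(2 * n + 1)          # EXPECTED_VOTES = 2n + 1, so quorum = n + 1
    return [n + seen >= needed for seen in qdisk_votes_seen]


n = 2
# Single quorum disk: only one half can still reach it.
print(halves_with_quorum(n, (1, 0)))    # [True, False]
# Hypothetical shadowed quorum disk: each half sees its own member.
print(halves_with_quorum(n, (1, 1)))    # [True, True], i.e. a partitioned cluster
```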

hth

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.
Eberhard Wacker
Valued Contributor

Re: Replacing a Quorum Disk

Re Jan:

sure, you're absolutely right, it will never be supported but nevertheless I would wish I COULD use a (one site located) shadowset as a quorum disk volume.

Also with the current (and long existing) VMS implementation I can set up the whole cluster in a non supported way so that cluster partitioning can occur (at least at boot time)!
So I still say: I would like to HAVE the possibility to use a shadowset as a quorum volume. If I set up the configuration in a way that can lead to trouble then this is my configuration failure in each case.

Cheers (and a proost)
Eberhard
Eberhard Wacker
Valued Contributor

Re: Replacing a Quorum Disk

Re Jan II:
- the SYSGEN parameter DISK_QUORUM remains as it is, referring to one specific disk
- this disk is a member of a shadowset
This would mean: no change to the current behaviour (but probably a lot of re-writing VMS code).
Simple(?) question: am I right or totally wrong ?
Proost
Eberhard
Volker Halle
Honored Contributor

Re: Replacing a Quorum Disk

re: last few

Setting SIRR to C requests an IPL 12. interrupt, which then will issue the IPC> prompt (running at IPL 12.)

Using the IPC mechanism in an SMP system may lead to CPUSPINWAIT or CPUSANITY crashes, so AMDS is definitely the better choice ;-)

If the quorum disk has gone dead and MVTIMEOUT has expired, you can still DISMOUNT/ABORT it, IF no (other) open files are on that disk.

Shadow sets can split as well, and what if either side of the cluster then continues with its local member ?! Just use a small quorum node, put it in a safe place and forget about it.

Volker.
comarow
Trusted Contributor

Re: Replacing a Quorum Disk

Since you have availability manager, you do not need the IPC utility. Just reset quorum.


We have a white paper on replacing a quorum disk. I'll email it to you.

Ian Miller.
Honored Contributor

Re: Replacing a Quorum Disk

Bob, if there is documentation on this, can it be put in a public place (on ITRC perhaps), and can you post a reply here with the link so people can find it in future?
Thanks,
Ian.
____________________
Purely Personal Opinion
Wim Van den Wyngaert
Honored Contributor

Re: Replacing a Quorum Disk

Eberhard & co

Shadowing the quorum disk is not allowed because the 2 members could start to live separately.

But mirroring is allowed because the controller hides the fact from VMS.

So, shadowing 2 disks behind 1 (dual) controller should also be possible.

FWIW

Wim
comarow
Trusted Contributor

Re: Replacing a Quorum Disk

They need to send me an email as it's a proprietary document. My email is on my profile. It was an old STARS article.
robert.comarow@hp.com
Ian Miller.
Honored Contributor

Re: Replacing a Quorum Disk

old STARS articles are available e.g.
http://h18000.www1.hp.com/support/asktima/operating_systems/CHAMP_SRC930821000052.html
____________________
Purely Personal Opinion
Eberhard Wacker
Valued Contributor

Re: Replacing a Quorum Disk

Wim, for what's it worth ?!

I simply wish to use a quorum disk as part of a shadowset just to be able to use this shadowset for more than quorum.dat, pagefiles etc. (and in general I prefer a quorum node).

If the quorum disk is defined throughout the cluster as one dedicated member of a shadow set, then it cannot happen that this important part of a clustered system starts to live separately. I know this would mean a change to basic VMS code, as I've mentioned before, but that is another story.

And a last remark: sure, we use mirroring for our quorum disks ...

Cheers,
EW
Richard W Hunt
Valued Contributor

Re: Replacing a Quorum Disk

You should not mirror a quorum disk if that is ALL that it is.

BUT if you use it for other purposes, the other purposes will govern whether you mirror it. If it is a SWAP/PAGE disk (ONLY) then you STILL should not mirror it. You are wasting WRITE operations on the mirror.

But if you are using a shared system disk, that disk can be the quorum disk. Ask yourself the purpose of a quorum disk. It is to prevent the system from coming up if you don't have enough to make it work. Well, if you are on a cluster with an even number of physical members AND the system disk is shared, it is the best candidate, bar none. 'cause if you don't have the shared system disk, you are SO hosed...

Now if you have distinct system disks, this isn't true. But if you want to do this "right", then consider some application disk without which you shouldn't be running your system. For example, if you have a separate disk for user home directories, make THAT your quorum disk. QUORUM.DAT doesn't really contain much data anyway.
Sr. Systems Janitor