Operating System - OpenVMS

Wim Van den Wyngaert
Honored Contributor

Why mount the quorum disk ?

I have a 2-node cluster with a quorum disk.

When I boot the first node, it forms the cluster as if the quorum disk were a voting member, even though I didn't mount the quorum disk. But once the cluster is up and running, the current votes value is 2 instead of 3 (ana/sys, SHOW CLUSTER, the VOTES field).

It's logical that the cluster could form as soon as the first node saw the quorum disk, since the quorum disk is only mounted much later in the boot. But why isn't its vote counted in the voting scheme?
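For reference, the quorum arithmetic behind the "2 instead of 3" observation can be sketched as below. This is a minimal Python illustration, not VMS code. It uses the OpenVMS rule quorum = (EXPECTED_VOTES + 2) / 2, rounded down, and assumes (since the thread only gives the total of 3) that each voting node has VOTES=1 and the quorum disk has QDSKVOTES=1.

```python
def cluster_quorum(expected_votes: int) -> int:
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


# Assumed configuration: two voting nodes with VOTES=1 each and a quorum
# disk with QDSKVOTES=1, so EXPECTED_VOTES = 3.
NODE_VOTES = [1, 1]
QDSK_VOTES = 1
EXPECTED_VOTES = sum(NODE_VOTES) + QDSK_VOTES
QUORUM = cluster_quorum(EXPECTED_VOTES)


def quorate_after_losing_one_node(votes_present: int) -> bool:
    """Would the cluster keep quorum if one 1-vote node left?"""
    return votes_present - 1 >= QUORUM


for label, votes in [("quorum disk not counted", sum(NODE_VOTES)),
                     ("quorum disk counted", sum(NODE_VOTES) + QDSK_VOTES)]:
    print(f"{label}: votes={votes}, quorum={QUORUM}, "
          f"quorate={votes >= QUORUM}, "
          f"survives one node loss={quorate_after_losing_one_node(votes)}")
```

With only 2 of the 3 votes present the cluster still has quorum, but it has no margin: losing either voting node would leave 1 vote, below the quorum of 2.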

Wim
faris_3
Valued Contributor

Re: Why mount the quorum disk ?

Hi,

Even when it is not mounted, the quorum disk should contribute its vote; see
http://h71000.www7.hp.com/wizard/wiz_8592.html

So there must be another problem. Any OPCOM messages? What is the value of QF_VOTE in sh cluster/cont?


Wim Van den Wyngaert
Honored Contributor

Re: Why mount the quorum disk ?

When I mounted the disk, the votes increased. So that link is not correct (at least for nodes running 7.2).

The only message in OPERATOR.LOG is "Please mount the quorum disk".

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Why mount the quorum disk ?

QF_VOTE is now YES. Before the change ???

In the status I saw qf_active but not qf_watcher.

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Why mount the quorum disk ?

However, QF_VOTE was missing from the status summary in SDA, so it was NO.

Wim
faris_3
Valued Contributor

Re: Why mount the quorum disk ?

I think access to the quorum file has always used physical I/O, so the device need not be mounted initially to contribute its vote.

But in older versions of VMS, dismounting the quorum disk clears the valid bit for the unit, which makes a READPBLK fail (VOLINV).

Did you dismount the quorum disk?
Or some event or error may have cleared the valid bit of the device.

(An SDA SHOW DEVICE taken before the disk was mounted could give a hint.)
Wim Van den Wyngaert
Honored Contributor

Re: Why mount the quorum disk ?

The disk had not been mounted for a long time. So no, the disk was never mounted during the life of the cluster. But I can't replay the scenario because the cluster is now running production.

Small note: it's a cluster with 2 AS1000 systems running 7.2 that have votes and direct access to the disks. In the same cluster we have 2 AS4100 systems running 6.2-1H3; these have no votes and no quorum disk setup.

Wim
John Abbott_2
Esteemed Contributor

Re: Why mount the quorum disk ?

A system only becomes a qf_watcher when the DISK_QUORUM device is mounted on that system. See http://h71000.www7.hp.com/doc/731FINAL/4477/4477pro_002.html#integ_avail, point 2.3.9.
Wim Van den Wyngaert
Honored Contributor

Re: Why mount the quorum disk ?

I have a similar cluster for testing.

I removed the mount of the quorum disk.
The cluster started without any problem, QF_VOTE is YES and the votes are 3, which is correct.

So the mount is not needed to boot, or to have the disk's vote counted in the voting scheme.

In my original case I rebooted all 4 nodes at once. I guess something went wrong.
But it IS solved by doing the mount (I can prove it with the console output).

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Why mount the quorum disk ?

On the same cluster, I shut down 1 of the 2 voting systems, with the REMOVE_NODE option.

MVTIMEOUT 3600
SHADOW_MBR_TMO 120

The disks are connected to a dual controller that is in turn connected to the 2 voting systems. For the non-voting members, the disks are MSCP served. None of the systems is fully patched.

At 19:12:45 the cluster transition started
At 19:12:45 the cluster transition completed
At 19:12:48 the first mount verification message was displayed
At 19:12:49 the message "xxx has been removed from VMScluster" was displayed
At 19:12:50 the first mount verification complete message was displayed
At 19:13:04 the cpu of xxx halted
At 19:15:35 a disk was thrown out of the shadow set
At 19:16:29 the disk that had just been removed is reported offline, and 1 second later it is AGAIN removed from the shadow set

My questions:
1) How long can a mount verification take? In this case the task was twofold: the remaining voting member must take control of all the disks and must serve them to the non-voting members. 120 seconds seems a long time to me.

2) How is it possible that the disk is removed twice?

3) The 2nd disk removal is almost 220 seconds after mount verification started. Why is it allowed to run over (SHA...TMO is 120)?

Wim
Robert_Boyd
Respected Contributor

Re: Why mount the quorum disk ?

What was RECNXINTERVAL set to on each of the nodes in the cluster at the beginning of this event?

Robert
Keith Parris
Trusted Contributor

Re: Why mount the quorum disk ?

Mounting the quorum disk gives you the advantage of mount verification, and better path failover processing.

It is also necessary for the quorum disk to be mounted in order for the Cluster_Server process to be able to initially create the QUORUM.DAT file when the quorum disk is first used. But it sounds like you're long past that point in time.
Robert Brooks_1
Honored Contributor

Re: Why mount the quorum disk ?

Keith wrote . . .

Mounting the quorum disk gives you the advantage of mount verification, and better path failover processing.

---

As the multipath subsystem REQUIRES mount verification in order to work at all, a device mounted /NOMOUNT_VERIFICATION (or a quorum disk that is not mounted) will not fail over at all. So it's not really a case of "better path failover"; it's a case of whether there is any path failover at all.
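Rob's point reduces to a simple condition, restated in the Python sketch below. The function and argument names are hypothetical, and the sketch models nothing beyond the rule stated in the post: multipath failover needs mount verification, and mount verification needs the device to be mounted without /NOMOUNT_VERIFICATION.

```python
def multipath_failover_possible(mounted: bool, mount_verification: bool) -> bool:
    """Hypothetical restatement of the rule above.

    The multipath code depends on mount verification, and mount
    verification only exists for a mounted device that was not
    mounted /NOMOUNT_VERIFICATION.
    """
    return mounted and mount_verification


cases = {
    "quorum disk not mounted": (False, False),
    "mounted /NOMOUNT_VERIFICATION": (True, False),
    "mounted normally": (True, True),
}
for label, (mounted, mv) in cases.items():
    print(f"{label}: failover={multipath_failover_possible(mounted, mv)}")
```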


-- Rob
Cass Witkowski
Trusted Contributor

Re: Why mount the quorum disk ?

I believe that the Quorum disk will only be asked to cast a vote if it is needed. So when you boot up one node the quorum disk is needed but when the second node joins the cluster the quorum disk is no longer needed and so it no longer votes.

Cass
Wim Van den Wyngaert
Honored Contributor

Re: Why mount the quorum disk ?

RECNXINTERVAL is set to 100 seconds.

So there is no failover (also for MSCP???) when the quorum disk is not mounted. I consider this a shortcoming of the system (not to say a bug).
Instead of implementing it properly, they display a console message.

So the doc should say:
Mounting is not required unless you need failover.

Now only my 3 questions remain, plus the question of why the quorum disk was not given its vote.

Wim

Wim Van den Wyngaert
Honored Contributor

Re: Why mount the quorum disk ?

Did some more testing. The cluster must initially have been formed by the 2 voting members, not by the quorum disk. Sorry for the misinformation.

My further findings:

1. The file QUORUM.DAT must exist for the disk to be counted in the voting scheme (that's why it didn't count in my original case: the disk had never been mounted).
2. QUORUM.DAT is created as soon as the disk is mounted. So it is not created during the boot of the first cluster member to give the quorum disk its vote (the disk is not yet mounted, and it seems the file cannot be created without the disk being mounted).
3. The disk must be mounted to avoid console messages and to ensure failover (no proof, because I have no test environment for it).
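Findings 1 and 2, together with Keith's note that the Cluster_Server process can only create QUORUM.DAT on a mounted disk, can be modelled as a tiny state machine. This is a hypothetical Python sketch of the behaviour reported in this thread, not of the actual VMS internals; the class and method names are invented, and 1 vote per node and for the disk is assumed.

```python
from dataclasses import dataclass


@dataclass
class QuorumDisk:
    """Hypothetical model of the behaviour reported in this thread."""
    votes: int = 1
    mounted: bool = False
    quorum_dat_exists: bool = False

    def mount(self) -> None:
        # Finding 2: QUORUM.DAT is created as soon as the disk is mounted.
        self.mounted = True
        self.quorum_dat_exists = True

    def contributed_votes(self) -> int:
        # Finding 1: without QUORUM.DAT the disk cannot be validated and
        # contributes no votes. Once the file exists, the disk votes
        # whether or not it is currently mounted.
        return self.votes if self.quorum_dat_exists else 0


NODE_VOTES = 2  # two voting nodes, 1 vote each (assumed)

fresh = QuorumDisk()
print("never mounted:", NODE_VOTES + fresh.contributed_votes())  # 2
fresh.mount()
print("after MOUNT:", NODE_VOTES + fresh.contributed_votes())  # 3

# A disk that already has QUORUM.DAT votes even when not mounted,
# which matches the result on Wim's test cluster.
existing = QuorumDisk(quorum_dat_exists=True)
print("QUORUM.DAT present, not mounted:",
      NODE_VOTES + existing.contributed_votes())  # 3
```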

My 3 questions remain.

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Why mount the quorum disk ?

Correction on 3.

The message "please mount quorum disk" is given ONLY when QUORUM.DAT doesn't exist, i.e. when the system is trying to create the file.

Wim
Keith Parris
Trusted Contributor

Re: Why mount the quorum disk ?

Cass wrote:
> I believe that the Quorum disk will only be asked to cast a vote if it is needed. So when you boot up one node the quorum disk is needed but when the second node joins the cluster the quorum disk is no longer needed and so it no longer votes. <

A quorum disk will contribute votes whenever it can be validated, whether the votes are needed or not. Wim explained that the reason he didn't see the quorum disk votes initially was that prior to his mounting the quorum disk, a QUORUM.DAT file did not yet exist on the disk, so it couldn't vote.

Wim wrote:
> 1) how long can a mount verification take ? And in this case, the task was dual : the remaining voting member must take control of all disks and it must serve them to the non-voting members. 120 seconds seems a long time for me. <

A mount verification can take up to MVTIMEOUT seconds. :-)

After the node goes away, with RECNXINTERVAL set to 100, you're going to wait 100 seconds before you give up and have a state transition to throw the node out of the cluster.

That plus the 120 seconds for SHADOW_MBR_TMO might explain the 220 seconds you saw.
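Keith's explanation can be checked against the timestamps Wim posted. The Python sketch below only does the arithmetic: it takes the first mount verification message (19:12:48) and the second shadow set removal (19:16:29 plus 1 second) from the log above, and compares the gap with RECNXINTERVAL + SHADOW_MBR_TMO. Treating the two intervals as simply additive is Keith's hypothesis ("might explain"), not a confirmed mechanism.

```python
from datetime import datetime

RECNXINTERVAL = 100   # seconds, from Wim's reply
SHADOW_MBR_TMO = 120  # seconds, from Wim's parameter list

FMT = "%H:%M:%S"
mount_verification_start = datetime.strptime("19:12:48", FMT)
second_removal = datetime.strptime("19:16:30", FMT)  # 19:16:29 + 1 second

observed = (second_removal - mount_verification_start).total_seconds()
hypothesis = RECNXINTERVAL + SHADOW_MBR_TMO

print(f"observed gap: {observed:.0f} s")                      # 222 s
print(f"RECNXINTERVAL + SHADOW_MBR_TMO: {hypothesis} s")     # 220 s
print(f"difference: {observed - hypothesis:.0f} s")           # 2 s
```

The gap comes out at 222 seconds, within a couple of seconds of the 220-second sum.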

> 2) how is it possible that the disk is removed 2 times ? <

Could that be an OPCOM message from another node?

> 3) the 2nd disk removal is almost 220 seconds after the mount verification was started. Why is it permitted to do overtime (SHA...TMO is 120)? <

SHADOW_MBR_TMO specifies the minimum acceptable amount of time we wait before a member is removed; it is allowed to take longer, but the disk won't be thrown out any sooner than this.
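The difference between a minimum wait and a deadline can be made concrete with the two removal times from Wim's log. In this hypothetical Python sketch, a removal is consistent with SHADOW_MBR_TMO as long as it happens no earlier than the timeout after mount verification started; it is never required to happen exactly at the timeout. Measuring from the first mount verification message (19:12:48) is an assumption for illustration.

```python
from datetime import datetime

SHADOW_MBR_TMO = 120  # seconds: a lower bound, not an upper bound

FMT = "%H:%M:%S"
start = datetime.strptime("19:12:48", FMT)  # first mount verification message
removals = {
    "first removal": datetime.strptime("19:15:35", FMT),
    "second removal": datetime.strptime("19:16:30", FMT),
}


def consistent_with_minimum(elapsed_s: float, minimum_s: int) -> bool:
    """A member may be removed at or after the minimum, never before it."""
    return elapsed_s >= minimum_s


for label, when in removals.items():
    elapsed = (when - start).total_seconds()
    print(f"{label}: {elapsed:.0f} s after start, "
          f"consistent={consistent_with_minimum(elapsed, SHADOW_MBR_TMO)}")
```

Both removals (167 s and 222 s after the start) come later than the 120-second floor, which is what the parameter permits.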

John wrote:
> A system only becomes a qf_watcher when the disk_quorum device is mounted on that system. See http://h71000.www7.hp.com/doc/731FINAL/4477/4477pro_002.html#integ_avail 2.3.9 points. <

A system can become a quorum disk watcher without the disk being mounted. The document you cite correctly advises "To permit recovery from failure conditions, the quorum disk must be mounted by all disk watchers" but technically, to become a quorum disk watcher only requires a proper setting for the SYSGEN parameter DISK_QUORUM and successful direct (not MSCP-served) access to the disk (see Roy Davis' VAXcluster Principles, p. 7-15).
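Keith's two requirements for becoming a quorum disk watcher, as distinct from the stronger advice to mount the disk on every watcher, can be written down as a predicate. The Python below is a hypothetical model: the function name, its arguments and the device name `$1$DUA0` are invented for illustration; only the two conditions come from the post.

```python
def can_be_quorum_disk_watcher(disk_quorum_param: str,
                               quorum_device: str,
                               direct_access: bool) -> bool:
    """Hypothetical model of the two conditions in Keith's post.

    1. The SYSGEN parameter DISK_QUORUM names the quorum disk.
    2. The node reaches that disk directly, not through MSCP serving.
    Being mounted is NOT one of the conditions, although mounting is
    still advised so that the watcher can recover from failures.
    """
    names_disk = disk_quorum_param.strip().upper() == quorum_device.upper()
    return names_disk and direct_access


QDISK = "$1$DUA0"  # hypothetical device name

# Voting AS1000: DISK_QUORUM set, direct access through the controller.
print(can_be_quorum_disk_watcher("$1$DUA0", QDISK, direct_access=True))  # True
# Non-voting AS4100: no DISK_QUORUM setting, disks are MSCP served.
print(can_be_quorum_disk_watcher("", QDISK, direct_access=False))  # False
```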
Cass Witkowski
Trusted Contributor

Re: Why mount the quorum disk ?

Thanks for the clarification Keith.
Jan van den Ende
Honored Contributor

Re: Why mount the quorum disk ?

Thanks, Keith.

To me, these are all the more reasons to avoid quorum disks whenever a valid quorum scheme can be constructed without one (i.e., ANY config with > 2 nodes is better off without)
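The arithmetic behind Jan's rule of thumb can be checked with the OpenVMS quorum formula, quorum = (EXPECTED_VOTES + 2) / 2, rounded down. The Python sketch below assumes every node has 1 vote and the quorum disk, when present, also has 1 vote, and asks whether the cluster keeps quorum after losing any single node.

```python
def cluster_quorum(expected_votes: int) -> int:
    """OpenVMS quorum rule: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def survives_one_node_loss(nodes: int, qdisk_votes: int) -> bool:
    """Assumes 1 vote per node; qdisk_votes is 0 when there is no quorum disk."""
    expected = nodes + qdisk_votes
    remaining = (nodes - 1) + qdisk_votes
    return remaining >= cluster_quorum(expected)


for nodes, qdisk in [(2, 0), (2, 1), (3, 0), (4, 0), (4, 1)]:
    label = "with QD" if qdisk else "no QD"
    print(f"{nodes} nodes, {label}: survives one node loss = "
          f"{survives_one_node_loss(nodes, qdisk)}")
```

Only the 2-node case needs the quorum disk to survive a single node failure; from 3 nodes up, the nodes' own votes suffice, which is the arithmetic side of Jan's point.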

Just my EUR 0.02

Proost.

Have one on me.

jpe