
Jack Trachtman
Super Advisor

Quorum Disk Failure

We have a two-node cluster with a quorum disk and are familiar with what happens when a node is shut down or fails - the other node goes into a cluster reconfigure along with issuing Operator msgs.

What happens if the Quorum disk fails?

Is the scenario the same on the hosts - cluster reconfigure with msgs referencing the loss of the quorum disk or does something different happen?

What happens when the failed Quorum disk is made available again?

Do the hosts automatically recognize the reappearance of the quorum disk and use it (with appropriate msgs)?

In other words, how does the loss/restore of a quorum disk compare to loss/restore of a host node?

One more question (for extra points!). What happens when the quorum disk "temporarily" disappears? We have our quorum disk on an HP SW SAN. If we make a Zoning or Presentation change that affects the quorum disk, besides VMS going through a Mount Verify, what does the host recovery look like?

Thanks much
Ian Miller.
Honored Contributor

Re: Quorum Disk Failure

The quorum disk watcher sees all! :-)
(within QDSKINTERVAL seconds)
You will see messages about the quorum disk being unavailable, and as long as enough votes remain, the cluster will continue. When the qdsk returns, it will be recognised as the quorum disk by the presence of QUORUM.DAT, and its votes (QDSKVOTES) will be counted again.

I assume both systems are directly connected to the quorum disk.
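To put numbers on "enough votes", here is a minimal sketch of the vote arithmetic, in Python for illustration only (the helper names are made up; the quorum value is derived from EXPECTED_VOTES as (EXPECTED_VOTES + 2) / 2, rounded down):

    # Minimal sketch of the cluster vote arithmetic (illustrative helpers only).
    def quorum(expected_votes):
        # Quorum is (EXPECTED_VOTES + 2) // 2, i.e. integer division rounding down.
        return (expected_votes + 2) // 2

    def cluster_alive(current_votes, expected_votes):
        # The cluster keeps processing while the surviving voters hold quorum.
        return current_votes >= quorum(expected_votes)

    # Two nodes with 1 vote each plus QDSKVOTES=1 gives EXPECTED_VOTES=3, quorum 2:
    assert quorum(3) == 2
    assert cluster_alive(2, 3)        # one voter (node or qdisk) lost: still quorate
    assert not cluster_alive(1, 3)    # two voters lost: the survivor hangs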
____________________
Purely Personal Opinion
Jan van den Ende
Honored Contributor
Solution

Re: Quorum Disk Failure

Well, except for its obvious passiveness, the quorum disk should be considered 'just' another node, although, due to that passiveness, much slower to respond.
The recognition is indirect. The first node to reconnect checks the quorum disk for recent 'stamps' by the other node(s). They are not there, so it simply leaves its own imprint. The second one finds the stamp (and it had better be from a known node). It also leaves its trace, and the next time #1 comes along it can conclude that the quorum disk is a valid member again.

Even IF (unwanted situation) the departure of the quorum disk leads to a loss of quorum (the cluster 'hangs'), this mechanism keeps running during the hang, and if the return of the qdsk suffices to regain quorum, that WILL be recognised, and the hang will be over.
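A rough, purely conceptual sketch of that two-pass handshake (illustrative Python; the dictionary "quorum file", the node names, and the freshness window are invented and bear no relation to the real QUORUM.DAT layout):

    import time

    quorum_file = {}    # node name -> time of its last stamp (invented stand-in)
    FRESH = 4.0         # how recent a stamp must be to count (made-up window)

    def poll(node):
        # One polling pass by `node`: look for a recent stamp from another known
        # node, then leave our own imprint for the others to find.
        now = time.time()
        seen_other = any(name != node and now - ts < FRESH
                         for name, ts in quorum_file.items())
        quorum_file[node] = now
        return seen_other

    # Pass 1: node A finds nothing and only leaves its stamp.
    # Pass 2: node B finds A's stamp and leaves its own.
    # Pass 3: A now finds B's stamp, so the disk counts as a valid member again.
    print(poll("NODEA"), poll("NODEB"), poll("NODEA"))   # False True True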

jan
Don't rust yours pelled jacker to fine doll missed aches.
John Eerenberg
Valued Contributor

Re: Quorum Disk Failure

In a 2-node cluster, one can avoid a hang when the quorum disk fails by setting VOTES=2 on each node and QDSKVOTES=1, so EXPECTED_VOTES=5. This way, if either node goes down, the requisite minimum of 3 votes is still held, and if the quorum disk goes bye-bye, 4 votes are held.
Only in the event of a double failure does the surviving node hang.
Maybe that will help someone.
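Checking that arithmetic (illustrative Python, using the same quorum formula as in the sketch above):

    expected_votes = 2 + 2 + 1              # VOTES=2 + VOTES=2 + QDSKVOTES=1
    required = (expected_votes + 2) // 2    # quorum = 3

    assert 2 + 1 >= required      # one node down: the other node + qdisk hold 3 votes
    assert 2 + 2 >= required      # qdisk down: the two nodes hold 4 votes
    assert not (2 >= required)    # double failure: a lone node's 2 votes miss quorum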
john
It is better to STQ then LDQ
Jan van den Ende
Honored Contributor

Re: Quorum Disk Failure

John,

Works fine, but not even needed.
@ nodes each with 1 vote + qdsk 1 vote = 3 votes expected. Any single voter gone leaves 2 votes +> quorum maintained.

The above-mentioned temporary hang (and resume) would occur with one node out (e.g., for maintenance) and THEN having your SAN disconnect and reconnect the qdsk.
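The same check for 1-1-1, including the temporary hang case (illustrative Python):

    expected_votes = 1 + 1 + 1              # two nodes + quorum disk, 1 vote each
    required = (expected_votes + 2) // 2    # quorum = 2

    assert 1 + 1 >= required      # any single voter gone: 2 votes left, quorum maintained
    assert not (1 >= required)    # node out AND qdisk disconnected: hang until one returns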

hth

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Jan van den Ende
Honored Contributor

Re: Quorum Disk Failure

Sorry, two irritating typos:

"@ nodes" should read "2 nodes"
" +> " " " " => "

... I sometimes (have to) work on systems with different keyboard layouts. It should have been forbidden, but then, who should be allowed to declare "THE" correct layout?
Don't rust yours pelled jacker to fine doll missed aches.
Lokesh_2
Esteemed Contributor

Re: Quorum Disk Failure

Hi,

I agree with Jan. Why set VOTES=2 for cluster members? It works fine for me with VOTES=1.

Best regards,
Lokesh
What would you do with your life if you knew you could not fail?
Wim Van den Wyngaert
Honored Contributor

Re: Quorum Disk Failure

If you give 2 votes to each cluster member and 1 to the quorum disk, you have the advantage that
IF
1) node 1 is stopped with REMOVE_NODE
2) the quorum disk gets lost after 1) has completed
THEN
your cluster is still alive (REMOVE_NODE drops the total to 3 votes and the quorum to 2, so losing the quorum disk's single vote still leaves the remaining node's 2 votes).

If you have 1-1-1, the cluster would hang until the disk is back or the stopped node rejoins.
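A rough walk-through of both cases, assuming REMOVE_NODE recomputes quorum as if the departing node's votes had never been counted (illustrative Python, invented helper names):

    def quorum(expected_votes):
        return (expected_votes + 2) // 2    # standard quorum formula, rounding down

    def survives_qdisk_loss_after_remove_node(node_votes, qdsk_votes):
        # Stop one node with REMOVE_NODE (quorum is recomputed without its votes),
        # then lose the quorum disk; does the remaining node still hold quorum?
        remaining = node_votes[1]
        adjusted_quorum = quorum(remaining + qdsk_votes)
        return remaining >= adjusted_quorum

    print(survives_qdisk_loss_after_remove_node((2, 2), 1))   # 2-2-1: True, stays up
    print(survives_qdisk_loss_after_remove_node((1, 1), 1))   # 1-1-1: False, hangs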
Wim
Jan van den Ende
Honored Contributor

Re: Quorum Disk Failure

Touché, Wim!

Indeed, that IS the reason to use 2-2-1.

Then the issue of the SAN disconnecting/reconnecting disappears nearly completely: only if one node has left WITHOUT adjustment (either by crash or by an operator forgetting REMOVE_NODE), and THEN the SAN connection disappears BEFORE a SET CLUSTER/EXPECTED_VOTES, only then will the hang still occur. Should be very rare.
Jan
Don't rust yours pelled jacker to fine doll missed aches.
Lokesh_2
Esteemed Contributor

Re: Quorum Disk Failure

Hi,

Thanks for explaining the advantage of 2-2-1. I will note it down.

Best regards,
Lokesh
What would you do with your life if you knew you could not fail?