
Question on Quorum Disk & Fence device in Redhat Cluster

Super Advisor


Dear ALL,

As far as I know, in Redhat Cluster a fence device is needed to cut off a node's access to a resource (hard disk, etc.) if that node loses contact with the rest of the nodes in the cluster.

And a quorum disk is needed in the split-brain condition.

My query is:

a) Is the only purpose of a fence device to cut off the resource and power-cycle a node when it becomes unhealthy?

b) Is the quorum disk only needed to keep the cluster running without disruption when the majority of nodes lose their fitness (say, 2 nodes fail in a 3-node cluster)?

Can anyone explain in detail what a quorum disk is and why we need it?

What is the difference between a fence device and a quorum disk?

In a two- or three-node cluster, do we need to configure a quorum disk? If yes, why?

Honored Contributor

Re: Question on Quorum Disk & Fence device in Redhat Cluster

Fencing is RedHat Cluster's primary protection against the split-brain condition.

The problem is, if the cluster has only 2 nodes and no quorum disk, what would be a split-brain condition turns into a "fencing war" instead.

The quorum disk is an optional extra tool for deciding which nodes are healthy and which are not. It can affect the main cluster daemons' decisions on when to fence and which nodes to fence.

In 2-node clusters the quorum disk eliminates the possibility of fencing wars if the network connections between the nodes are lost. This is usually the primary reason to use a quorum disk in a 2-node cluster.

A 3-node cluster will become a 2-node cluster while any one of the nodes is down for any reason, e.g. on a planned maintenance downtime. To make the cluster safe from fencing wars even while one node is down, it would be a good thing to set up a quorum disk for 3-node clusters too. However, the quorum disk is much more important for 2-node clusters than for 3-node clusters.

An example of a fencing war in a 2-node cluster with no quorum disk would be:
1.) All network connections lost between nodes A and B.

2.) Node A decides B has failed and fences it. (At the same time, node B was trying to fence node A using the same rules, but by random chance, was not quite fast enough.)

3.) Since node B was successfully fenced, node A now knows B is down for sure. Node A takes over all the cluster services; node B reboots.

4.) Since the cluster is running in the special 2-node mode, there is no quorum check, and node B can restart the cluster daemons with no connection to node A. But because the state of node A is unknown to node B, there is a problem... so node B fences node A.

5.) Since node A was successfully fenced, node B now knows A is down for sure. Node B takes over the cluster services; node A reboots.

6.) Since the cluster is running in the special 2-node mode, there is no quorum check, and node A can restart the cluster daemons with no connection to node B. But because the state of node B is unknown to node A, there is a problem... so node A fences node B.

7.) (Go back to step 3.)

This cycle will go on forever until the network connections are restored or the sysadmin stops it manually. Because of the reboot cycle, the nodes can do very little useful work, and the users won't be happy.
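The cycle in steps 2) to 7) can be modelled in a few lines. This is a deliberately simplified sketch, not how the real cluster daemons are implemented: it assumes the network between A and B stays down for the whole run, that A happens to win the first fencing race, and that a rebooted node in the special 2-node mode always restarts its daemons and fences the other node. The function name `fencing_war` is hypothetical.

```python
def fencing_war(rounds: int) -> list[str]:
    """Simplified model of steps 2)-7): 2-node cluster, special 2-node mode,
    no quorum disk, network between A and B down the whole time.

    Returns one log line per round saying who fences whom.
    """
    survivor, victim = "A", "B"  # step 2: A happens to win the first race
    log = []
    for _ in range(rounds):
        log.append(f"{survivor} fences {victim}; {survivor} runs services, {victim} reboots")
        # Steps 4/6: in 2-node mode the rebooted node is quorate on its own vote,
        # cannot see the survivor, and therefore fences it. The roles swap.
        survivor, victim = victim, survivor
    return log


if __name__ == "__main__":
    for line in fencing_war(4):
        print(line)
```

The log alternates between "A fences B" and "B fences A" for as many rounds as you ask for; nothing inside the loop ever breaks the cycle, which is exactly the problem described above.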

The quorum disk can prevent this cycle from happening. It provides (at least) one extra vote to the cluster quorum voting process (which is done by the main RedHat Cluster daemons), and makes the special 2-node mode unnecessary. The quorum voting process will prevent the fenced node from starting the cluster operations until the network connections are fixed... so step 4) won't happen.
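The vote arithmetic behind this can be sketched as follows. It uses a simplified majority rule (quorum = expected_votes // 2 + 1, and a quorum of 1 in the special 2-node mode); it also assumes each node has 1 vote, the quorum disk has 1 vote, and the quorum disk vote is credited only to node A's side of the split. The function names are hypothetical.

```python
def quorum(expected_votes: int, two_node: bool = False) -> int:
    """Votes a partition needs to be quorate (simplified majority rule)."""
    if two_node:
        # Special 2-node mode: a single node's own vote is enough.
        return 1
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int, two_node: bool = False) -> bool:
    """True if a partition holding `partition_votes` may run cluster operations."""
    return partition_votes >= quorum(expected_votes, two_node)


if __name__ == "__main__":
    # No quorum disk, special 2-node mode: rebooted node B has 1 vote out of 2.
    print(is_quorate(1, expected_votes=2, two_node=True))  # True: B restarts daemons (step 4)

    # Quorum disk worth 1 vote: expected_votes = 1 (A) + 1 (B) + 1 (qdisk) = 3, quorum = 2.
    # Assumption: the qdisk vote counts only for A's side of the split.
    print(is_quorate(1 + 1, expected_votes=3))  # True: A keeps running the services
    print(is_quorate(1, expected_votes=3))      # False: B waits, so step 4 never happens
```

With the quorum disk, the rebooted node holds 1 vote out of a required 2, so it stays out of the cluster instead of fencing the survivor.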

The quorum disk daemon can also be used to set up extra conditions for node fitness. For example, if all your cluster services need a connection to an external database, you can make qdiskd run a script to check if the database is reachable; if it isn't, the node will be considered unhealthy.
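A qdiskd heuristic is configured as a `<heuristic>` entry (with attributes such as `program`, `score` and `interval`) inside the `<quorumd>` section of cluster.conf; the program's exit status 0 counts as a pass and anything else as a failure. Below is a minimal sketch of such a check program, assuming the database only needs to accept a TCP connection to count as reachable. `DB_HOST`, `DB_PORT` and the timeout are hypothetical placeholders, not values from this thread.

```python
import socket
import sys

# Hypothetical database endpoint: replace with your real host and port.
DB_HOST = "db.example.com"
DB_PORT = 5432
TIMEOUT_SECONDS = 2.0


def database_reachable(host: str, port: int, timeout: float = TIMEOUT_SECONDS) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def main() -> int:
    # Exit status 0 = heuristic passed; any other value = heuristic failed.
    return 0 if database_reachable(DB_HOST, DB_PORT) else 1


if __name__ == "__main__":
    sys.exit(main())
```

A plain TCP connect is only a rough proxy for "the database works"; a real check might log in and run a trivial query, at the cost of a longer and more failure-prone heuristic.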

In short:

- The fence device is an absolute requirement in all production RedHat clusters. Without a fence device, your RedHat Cluster configuration will not be supported by RedHat, will not be protected from split-brain situations, and will not recover automatically from some other types of hardware failures.

- The cluster will use the fence device to cut off nodes whose state is unknown. As a result, the cluster will know that a fenced node is down for sure. This will allow the automatic failover procedures to continue.

- The sysadmin can use the fencing mechanism to manually halt and/or reboot the nodes for any reason, e.g. to remotely shut down a node for hardware maintenance.

- The quorum disk is optional:
* it's a very very good thing to have on a 2-node cluster, to prevent fencing wars
* it's good to have on 3-node clusters too, but slightly less important
* it allows customizable extra health checks: if you need them, you may want to use it on bigger clusters too
* it can be used to allow a single node to keep running the cluster, even if the majority of nodes have failed.
* it supports a maximum of 16 nodes.
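The "single node keeps running" point can be checked with simple vote arithmetic. This sketch assumes every node has 1 vote, the quorum disk is given (nodes - 1) votes (a common "last man standing" setup; check the qdisk documentation for your release), and quorum is a simple majority of expected votes. The function `last_man_standing` is hypothetical.

```python
MAX_QDISK_NODES = 16  # node limit for the quorum disk, as stated above


def last_man_standing(nodes: int) -> dict[str, int]:
    """Vote arithmetic for `nodes` one-vote nodes plus a quorum disk
    worth nodes - 1 votes (assumed configuration)."""
    if not 2 <= nodes <= MAX_QDISK_NODES:
        raise ValueError(f"quorum disk supports 2..{MAX_QDISK_NODES} nodes, got {nodes}")
    qdisk_votes = nodes - 1
    expected_votes = nodes + qdisk_votes
    quorum = expected_votes // 2 + 1  # simple majority
    return {
        "qdisk_votes": qdisk_votes,
        "expected_votes": expected_votes,
        "quorum": quorum,
        # One surviving node that still passes the quorum disk checks:
        "one_node_plus_qdisk": 1 + qdisk_votes,
    }


if __name__ == "__main__":
    r = last_man_standing(3)
    print(r)  # {'qdisk_votes': 2, 'expected_votes': 5, 'quorum': 3, 'one_node_plus_qdisk': 3}
    print(r["one_node_plus_qdisk"] >= r["quorum"])  # True
```

Because expected_votes = 2n - 1, the quorum always works out to n, which is exactly what one node plus the n - 1 quorum disk votes provides.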

Please read:

Honored Contributor

Re: Question on Quorum Disk & Fence device in Redhat Cluster

WOW, very informative answer, MK. Thanks!

Not too unfamiliar to anyone who has worked with OpenVMS clusters!

PS: no points for this, just a comment.

Jean-Pierre Huc
Smile I will feel the difference
Super Advisor

Re: Question on Quorum Disk & Fence device in Redhat Cluster

Thanks MK. Your input is so helpful.