09-19-2010 05:58 AM
As far as I know, a Red Hat Cluster needs a fence device to cut off a node's access to a resource (hard disk, etc.) if that node loses contact with the rest of the nodes in the cluster.
And a quorum disk is needed in a split-brain condition.
My questions are:
a) Is the only purpose of a fence device to cut off resources and power-cycle a node when it becomes unhealthy?
b) Is a quorum disk only needed to keep the cluster running without disruption when the majority of nodes fail (say, 2 nodes fail in a 3-node cluster)?
Can anyone explain in detail what a quorum disk is and why we need it?
What is the difference between a fence device and a quorum disk?
In a two- or three-node cluster, do we need to configure a quorum disk? If yes, then why?
09-19-2010 12:30 PM (Solution)
The problem is, if the cluster has only 2 nodes and no quorum disk, what would be a split-brain condition turns into a "fencing war" instead.
The quorum disk is an optional extra tool for deciding which nodes are healthy and which are not. It can affect the main cluster daemons' decisions on when to fence and which nodes to fence.
In 2-node clusters the quorum disk eliminates the possibility of fencing wars if the network connections between the nodes are lost. This is usually the primary reason to use a quorum disk in a 2-node cluster.
A 3-node cluster effectively becomes a 2-node cluster while any one of the nodes is down for any reason, e.g. during planned maintenance downtime. To keep the cluster safe from fencing wars even while one node is down, it is a good idea to set up a quorum disk for 3-node clusters too. However, the quorum disk is much more important for 2-node clusters than for 3-node clusters.
An example of a fencing war in a 2-node cluster with no quorum disk would be:
1.) All network connections lost between nodes A and B.
2.) Node A decides B has failed and fences it. (At the same time, node B was trying to fence node A using the same rules, but by random chance, was not quite fast enough.)
3.) Since node B was successfully fenced, node A now knows B is down for sure. Node A takes over all the cluster services; node B reboots.
4.) Since the cluster is running in the special 2-node mode, there is no quorum check and node B can restart the cluster daemons with no connection to node A. But because the state of node A is unknown to node B, there is a problem... so node B fences node A.
5.) Since node A was successfully fenced, node B now knows A is down for sure. Node B takes over the cluster services; node A reboots.
6.) Since the cluster is running in the special 2-node mode, there is no quorum check and node A can restart the cluster daemons with no connection to node B. But because the state of node B is unknown to node A, there is a problem... so node A fences node B.
7.) (Go back to step 3.)
This cycle will go on forever until the network connections are restored or the sysadmin stops it manually. Because of the reboot cycle, the nodes can do very little useful work, and the users won't be happy.
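To make the cycle above concrete, here is a small toy model in Python. It is only a sketch of steps 2 to 7, not how the cluster daemons actually work: the function name `simulate_fencing_war` and the `quorum_check` flag are made up for illustration, and the "race" in step 2 is simply assumed to be won by node A.

```python
def simulate_fencing_war(rounds, quorum_check):
    """Toy model of the fencing cycle; returns a log of events.

    quorum_check=False mimics the special 2-node mode (no quorum check).
    quorum_check=True mimics a setup where a rebooted, isolated node
    cannot become quorate on its own.
    """
    log = []
    survivor, fenced = "A", "B"  # assumption: A wins the first race (step 2)
    for _ in range(rounds):
        log.append(f"{survivor} fences {fenced}")
        if quorum_check:
            # The fenced node reboots but is not allowed to start cluster
            # operations, so it never fences the survivor back.
            log.append(f"{fenced} reboots, is not quorate, stays passive")
            break
        # 2-node mode: the rebooted node starts the cluster daemons, cannot
        # see its peer, and fences it. The roles swap and the cycle repeats.
        survivor, fenced = fenced, survivor
    return log


for line in simulate_fencing_war(rounds=4, quorum_check=False):
    print(line)
```

With `quorum_check=False` the log alternates between "A fences B" and "B fences A" for as many rounds as you let it run; with `quorum_check=True` it stops after the first fence.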
The quorum disk can prevent this cycle from happening. It provides (at least) one extra vote to the cluster quorum voting process (which is done by the main Red Hat Cluster daemons), and makes the special 2-node mode unnecessary. The quorum voting process will prevent the fenced node from starting cluster operations until the network connections are fixed... so step 4 won't happen.
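The vote arithmetic behind this can be sketched in a few lines of Python. This is a simplification under stated assumptions: each node has one vote, the quorum disk has one vote, quorum is a simple majority of the expected votes, and only the surviving node is credited with the quorum disk vote. The names `quorum_threshold` and `is_quorate` are made up for the example.

```python
def quorum_threshold(expected_votes):
    """Simple majority: more than half of all expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_seen, expected_votes):
    """True if the votes visible to a node reach the majority threshold."""
    return votes_seen >= quorum_threshold(expected_votes)


NODE_VOTES = 1   # assumption: one vote per node
QDISK_VOTES = 1  # assumption: the quorum disk contributes one vote

# Two nodes plus the quorum disk: 3 expected votes, so 2 are needed.
expected = 2 * NODE_VOTES + QDISK_VOTES

survivor_votes = NODE_VOTES + QDISK_VOTES  # node A: own vote + quorum disk
rebooted_votes = NODE_VOTES                # node B: isolated, own vote only

print(is_quorate(survivor_votes, expected))  # True
print(is_quorate(rebooted_votes, expected))  # False
```

The rebooted node sees only its own vote, stays below the threshold, and therefore has no business fencing anybody.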
The quorum disk daemon can also be used to set up extra conditions for node fitness. For example, if all your cluster services need a connection to an external database, you can make qdiskd run a script to check if the database is reachable; if it isn't, the node will be considered unhealthy.
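A check script for that database example could look roughly like the sketch below. The script name `check_db.py`, the function names, and the host/port arguments are hypothetical, and it assumes the usual convention that exit status 0 means "check passed" and non-zero means "check failed". It only tests whether a TCP connection can be opened, not whether the database answers queries.

```python
import socket
import sys


def database_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def main(argv):
    # Hypothetical usage: check_db.py HOST PORT
    host, port = argv[1], int(argv[2])
    return 0 if database_reachable(host, port) else 1


# Only act when both arguments are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Keep the timeout short: a health check that hangs is about as useful as no health check.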
- The fence device is an absolute requirement in all production Red Hat clusters. Without a fence device, your Red Hat Cluster configuration will not be supported by Red Hat, will not be protected from split-brain situations, and will not recover automatically from some other types of hardware failures.
- The cluster will use the fence device to cut off nodes whose state is unknown. As a result, the cluster will know that a fenced node is down for sure. This will allow the automatic failover procedures to continue.
- The sysadmin can use the fencing mechanism to manually halt and/or reboot the nodes for any reason, e.g. to remotely shut down a node for hardware maintenance.
- The quorum disk is optional:
* it's a very very good thing to have on a 2-node cluster, to prevent fencing wars
* it's good to have on 3-node clusters too, but slightly less important
* it allows customizable extra health checks: if you need them, you may want to use it on bigger clusters too
* it can be used to allow a single node to keep running the cluster, even if the majority of nodes have failed.
* it supports a maximum of 16 nodes.
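The "single node keeps running" point in the list above comes down to how many votes the quorum disk is given. A rough sketch, assuming one vote per node, simple-majority quorum, and a quorum disk configured with one vote fewer than the number of nodes (the function name `min_nodes_for_quorum` is made up for the example):

```python
def min_nodes_for_quorum(node_count, qdisk_votes=0):
    """Smallest number of live one-vote nodes that is still quorate,
    assuming the live partition also holds the quorum disk votes."""
    expected = node_count + qdisk_votes
    threshold = expected // 2 + 1  # simple majority
    return max(threshold - qdisk_votes, 1)


# Columns: nodes, nodes needed without qdisk, nodes needed with qdisk
for n in (2, 3, 4):
    print(n, min_nodes_for_quorum(n), min_nodes_for_quorum(n, qdisk_votes=n - 1))
# prints:
# 2 2 1
# 3 2 1
# 4 3 1
```

Without a quorum disk, a plain majority of nodes must be alive (which is why 2-node clusters need the special 2-node mode at all); with the quorum disk weighted this way, one node plus the quorum disk is enough.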
09-20-2010 02:47 AM
Re: Question on Quorum Disk & Fence device in Redhat Cluster
Not too unfamiliar to anyone who has worked with an OpenVMS cluster!
PS: no points for this, just a comment.