
Why we need Quorum Disk Heuristic

 
Md. Minhaz Khan
Super Advisor

Why we need Quorum Disk Heuristic

Dear All,

When I run "system-config-cluster", an option for "Quorum Disk Heuristic" appears. What is a "Quorum Disk Heuristic", and why do we need it when I have already defined the quorum disk under the "Use Quorum Disk" tag?

Can you give me a real-world example of a Quorum Disk Heuristic?

Thanks
Minhaz
Hakki Aydin Ucar
Honored Contributor

Re: Why we need Quorum Disk Heuristic

Hi,

Quorum-disk parameters and heuristics depend on the site environment and on any special requirements you have. To understand how quorum-disk parameters and heuristics are used, refer to the qdisk(5) man page.

http://hack2defence.blogspot.com/2008/11/cluster-on-centos-or-redhat.html
Matti_Kurkela
Honored Contributor

Re: Why we need Quorum Disk Heuristic

In general, if a network failure splits a cluster in two equal-sized parts, there is a risk of a split-brain situation.

RedHat cluster solves this primarily with fencing. But how does the cluster choose which half is allowed to continue, and which half gets fenced?

Without quorum disk heuristics, the selection algorithm of the RedHat Cluster would essentially be "the fastest half wins". The quorum disk removes the possibility of a fencing war: the fenced half cannot reboot, regain quorum and then try to fence the other half. Until the network problem is fixed, the fenced-out half of the cluster will stay out.

Unfortunately, "the fastest half wins" is not ideal in real life: if one half is idle because no clients can reach it while the other half is still serving clients, the idle half is likely to fence the active half, because the active half is slowed down by the clients' requests...

This problem can be solved by careful use of quorum disk heuristics: for example, if the clients are in one particular network segment, a useful heuristic would be something that monitors network connectivity to that segment.

The quorum disk heuristics are run periodically on every node. If the heuristics indicate a failure (by default: if fewer than 50% of the configured heuristic commands are successful), the node will leave the cluster voluntarily and won't participate in cluster quorum voting, so the other nodes will fence the failing node.
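To make the scoring rule concrete, here is a small shell sketch of the arithmetic. It assumes the default threshold described in qdisk(5), floor((n+1)/2), where n is the sum of all heuristic scores. The function names are made up for illustration and are not part of qdiskd.

```shell
#!/bin/sh
# Illustrative only: mimics the scoring arithmetic, not qdiskd itself.

# Default minimum score for a given total of all heuristic scores,
# assuming the floor((n+1)/2) rule from qdisk(5).
# $1 = sum of the scores of all configured heuristics
default_min_score() {
    echo $(( ($1 + 1) / 2 ))
}

# Succeeds (exit status 0) if the node's current score reaches the minimum.
# $1 = sum of the scores of the heuristics that passed
# $2 = minimum score
node_is_fit() {
    [ "$1" -ge "$2" ]
}
```

With three heuristics of score 1 each, "default_min_score 3" prints 2, so a node that passes only one of them is considered unfit. Note also that a minimum score set higher than the sum of all heuristic scores can never be reached.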

If the heuristics are chosen wisely and the cluster is split in two halves by a network failure, the part that is isolated from clients will detect the loss of network connectivity and won't even try to get quorum: this allows the other part to win the quorum and keep serving the clients.

The quorum disk heuristics don't have to be network connectivity tests: you can use any command or any script as a heuristic. The only requirements are:
- the heuristic command should not take much time to run, and it should not hang in case of errors
- it should return an exit code of 0 if the test is successful, and any other value if there is a problem.
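As a rough sketch of those two requirements, a heuristic script could wrap its check in a time limit and reduce the result to a plain 0 or non-zero status. The function name run_heuristic is hypothetical, and the sketch assumes the coreutils "timeout" command is installed, which may not be the case on older distributions.

```shell
#!/bin/sh
# Hypothetical wrapper: run a check command under a time limit so that
# the heuristic can never hang. Assumes coreutils `timeout` is available.
# $1 = time limit in seconds, remaining arguments = command to run
run_heuristic() {
    limit=$1
    shift
    if timeout "$limit" "$@" >/dev/null 2>&1; then
        return 0    # check passed within the time limit
    fi
    return 1        # check failed or was killed by the timeout
}
```

A heuristic script built on this could end with a line such as "run_heuristic 3 ping -c 1 10.10.10.1", so that the script's exit status is the result of the check.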

MK
Md. Minhaz Khan
Super Advisor

Re: Why we need Quorum Disk Heuristic

Dear Matti Kurkela ,

I am sorry, but I was not able to understand your post. My questions are:

1) Since I have defined the quorum disk (created by mkqdisk) in the cluster configuration, why do I need to define a Quorum Disk heuristic?

Is it mandatory for a two-node or three-node cluster?

2) I have found one example which is as below:

1. root#mkqdisk -c /dev/sda1 -l qdisk_rac

2. root#chkconfig --level 345 qdiskd on

3. root#service qdiskd start

4. root#system-config-cluster

5. Cluster Name : apache-cluster and selected quorum disk with following options
Interval = 1
TKO = 10
votes =1
Minimum score = 3
Device = /dev/sda1
Label = qdisk_rac

6. Quorum Disk Heuristic
Program = ping -c 2 10.10.10.1
Score =1
Interval = 2


Why does the above example use "ping -c 2 10.10.10.1"? What is the purpose of this ping in the cluster?

Please don't mind. I am new to Linux, and we are going to implement one two-node cluster and one three-node cluster. That is why I am trying to clear up my understanding.


7. Add new node to cluster
Node Name = node1.example.com
Quorum votes = 1
Node Name = node2.example.com
Quorum votes = 1
8. New Fence Device
HP ILO Device
Name = ILOGB89xxxxxx
user = manage
password = manage
Hostname = 10.10.10.100
HP ILO Device
Name = ILOGB88xxxxxx
user = manage
password = manage
Hostname = 10.10.10.101

9. selected Node1 and "Manage fencing for this node"
Add New Fencing level -> Add Fencing to this Level. selected ILOGB89xxxxxx

10. selected Node2 and "Manage fencing for this node"
Add New Fencing level -> Add Fencing to this Level. selected ILOGB88xxxxxx
11. Created failover domain "failover-cluster" and selected
"node1.example.com and node2.example.com" from menu, and selected
"Restrict to this Failover Domain"

12. Create Resource
New Resource = Apache Server
Name = Apache HTTP Server service
Server Root = /etc/httpd
Config File = /etc/httpd/conf/httpd.conf
httpd options = /etc/rc.d/init.d/httpd

13. Create a new Resource "File system"
Name = httpd-content
File System type = ext3
Mount point = /var/www/html
device = /dev/sdb1

14. Create a new Resource "IP "
10.10.10.200

15. Create a New Service "Web-Service"
Failover Domain = failover-cluster
And selected "Add shared resource to this service"
A. Apache HTTP Server Service
B. Httpd-Content
C. IP Address (10.10.10.200)

16
#[node1@node1]scp /etc/cluster/cluster.conf node2:/etc/cluster/cluster.conf

17

#md5sum /etc/cluster/cluster.conf

541b1dc67392b18aad7e1df3612a6afe cluster.conf (same on both nodes)

on both nodes

18
#service cman start
#service rgmanager start

19

#chkconfig cman on
#chkconfig rgmanager on
Matti_Kurkela
Honored Contributor
Solution

Re: Why we need Quorum Disk Heuristic

> 1) As I defined quorum disk (created by mkqdisk) in the cluster configuration then why I need to define Quorum Disk heuristic ??

The quorum disk daemon configuration requires at least one heuristic. You can configure multiple heuristics if you feel it's necessary.

When there are problems in the network, each node must decide alone whether it can continue running the cluster services or not. In this situation, heuristics are used.

Heuristics successful: "I seem to be in good enough condition to do some useful work. I will ask for extra quorum votes from the quorum disk; if I get them, I will use them to win the quorum vote, and then take over the cluster services. (If I don't get the extra votes, there is another node that is in good condition; too bad I cannot communicate with it.)"

Heuristics failed: "Oh no, I seem to have too many problems to do anything useful now. I won't ask for extra quorum votes; that means I will most likely lose the vote. If another node wins the quorum vote, I'll be fenced out of the cluster. If nobody else wins, I'll just wait doing nothing and hope someone fixes the problems."

> Is it mandatory for two or three node clusters ??

At least one heuristic is always mandatory when a quorum disk is used. The number of nodes is not the issue.

Of course, if you simply cannot decide on a useful heuristic, you might specify "/bin/true" as the heuristic, which means the heuristic will always be successful.

> 2) I have found one example [...]

> Why does the above example use "ping -c 2 10.10.10.1"? What is the purpose of this ping in the cluster ??

Possible reasons might be:
- 10.10.10.1 may be an external database server that your clustered applications need to use: if the database cannot be reached from this node, the clustered applications cannot run on it, so this node is useless and should be treated as "failed".

- 10.10.10.1 may be the gateway through which your clients are accessing your cluster: if there is no connectivity between the node and the gateway, your clients cannot reach that node, so this node should be treated as "failed".
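Either kind of reachability test can be written as a tiny script. The sketch below is only an illustration: the function name any_reachable and the PROBE variable are my own, checking more than one address is an addition to the single-address example quoted above, and the ping options (-c for count, -w for a deadline in seconds) are those of the Linux iputils ping.

```shell
#!/bin/sh
# Hypothetical heuristic: succeed if at least one of the given addresses
# answers. PROBE is the command run for each address; it can be
# overridden, e.g. for testing without a network.
PROBE=${PROBE:-"ping -c 2 -w 3"}

# Arguments: one or more addresses to probe
any_reachable() {
    for target in "$@"; do
        # $PROBE is intentionally unquoted so its options split into words
        if $PROBE "$target" >/dev/null 2>&1; then
            return 0    # this address answered
        fi
    done
    return 1            # nothing answered
}
```

For the example in question, the heuristic program would call "any_reachable 10.10.10.1" and exit with its status, which behaves like the plain ping command but with a fixed upper bound on how long each probe can take.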

Without the heuristics, the split-brain situation would be resolved more or less randomly: the node that can fence the other(s) quickest would win. This is not optimal, because it can lead to absurd outcomes, such as the only node that could actually run the service usefully getting fenced out.

(Sorry about the delay in answering; I needed some time to try and make a clearer explanation. I hope this is better than the previous one!)

MK
Md. Minhaz Khan
Super Advisor

Re: Why we need Quorum Disk Heuristic

Dear MK,

Thanks a lot for giving me a clear understanding.
This example was really a good one. Thanks again.

Minhaz