

 
senthil_kumar_1
Super Advisor

which one is best Quorum disk or Fencing in Redhat Linux

Hi All,

 

I have some questions about using a quorum disk and fencing with Red Hat clustering.

 

1) Are both methods used to determine cluster quorum?

 

2) Is fencing a new method available from RHEL 4 onwards?

 

3) Is the quorum disk an old method used on RHEL 3 and earlier?

 

4) Does Windows clustering have a concept like fencing? If not, which method does Windows clustering use to determine cluster quorum?

 

5) Do the clustering products of other operating systems (SUSE Linux, Solaris, HP-UX and IBM AIX) use fencing like RHEL does? If not, what mechanism do they use to determine cluster quorum?

 

6) In RHEL 5, is it enough to configure only fencing and not a quorum disk?

 

7) In RHEL 5, can we configure a cluster without fencing but with a quorum disk?

 

8) If we do not configure fencing in RHEL 5, will we not get support from RHN?

 

7) So we don't need to configure a quorum disk on RHEL 4 or later, as we have fencing to determine the cluster quorum?

 

8) But would it be good to configure a quorum disk in addition to fencing in a two-node cluster, to avoid the "split brain" state of the cluster?

 

9) When fencing happens, will the fenced node be shut down or rebooted? Where is this configured?

 

10) Could anyone please explain the split-brain state clearly?

Matti_Kurkela
Honored Contributor

Re: which one is best Quorum disk or Fencing in Redhat Linux

1.) No. Quorum disk is an optional, additional tool for cluster quorum determination; Fencing is a method to make sure the hosts that are dropped outside the cluster definitely won't interfere with the cluster. Fencing is not optional in RedHat Cluster. RedHat won't support clusters with no fencing configured.

 

In RedHat clusters, the current quorum determination method is the cluster-wide multicast communication and a voting algorithm. Each cluster node announces it wants to receive multicast traffic for the cluster's multicast address, and each node will send messages to that address.

 

When there is a failure, some nodes will not reach the others. The nodes that can hear each other will form a group: if that group has more than 50% of the expected cluster votes,  that group wins: it will continue cluster activities, and this group will attempt to fence the missing nodes to make certain they're really dead. Only after the fencing is confirmed successful, the cluster will determine which nodes will take over the services that were running on the fenced nodes, as necessary.

 

In two-node clusters, if both nodes have an equal number of votes (as is the default), neither node can get more than 50% of the cluster votes alone. In this case, there are two possible solutions:

  • two-node mode: the voting algorithm is ignored, and whenever communication between nodes is lost, each node will just try to fence the other one. The fastest node wins. This is not ideal, because the fenced node may reboot, come back up, then start up a cluster on its own and fence the other node. This may result in a "reboot-and-fence loop".
  • quorum disk: this adds an independent communication channel (the quorum disk) and extra vote(s), so even a two-node cluster can use the voting algorithm. 2 nodes + quorum disk = 3 votes, so when communication is lost, one node gets 1 vote from itself and 1 from the quorum disk. That's more than 50% of 3 votes, so the node is allowed to continue. The other node gets 1 vote from itself, but the qdiskd on that node sees the quorum disk vote has already been granted and won't grant the extra vote again; as a result, that node has only 1 vote out of 3 and knows that it must stop cluster activity until it can re-establish contact with the other node. (See the configuration sketch below.)
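
In cluster.conf terms, the two alternatives look roughly like this. This is only a minimal sketch: it assumes 1 vote per node, and the quorum disk label is a placeholder you would choose yourself.

Two-node mode:

<cman two_node="1" expected_votes="1"/>

Quorum disk instead of two-node mode (2 node votes + 1 quorum disk vote = 3 expected votes):

<cman expected_votes="3"/>
<quorumd interval="1" tko="10" votes="1" label="myqdisk"/>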

 

2.) No; see above.

 

3.) No, quorum disk is available in RHEL 4 and later too.

 

4+5.) I'm not familiar with Windows OS clustering, but in HP Serviceguard clustering, the equivalent feature is the cluster safety timer that forces the node to TOC (= crash) if the node is left outside the cluster quorum because of a communication fault. In HP-UX Serviceguard, this is a feature built into the HP-UX kernel; in Serviceguard for Linux, it is implemented with the deadman kernel module.

 

6.) Yes. If you have 3 or more nodes (and especially if you have an odd number of nodes), you won't need a quorum disk. If you have only 2 nodes, quorum disk is very helpful; if you have an even number of nodes (4, 6, 8...) the quorum disk will still be helpful if you have a fault that causes your cluster to split exactly down the middle.

 

7.) No, fencing is not optional. Failover operations won't start until the cluster receives confirmation that fencing is successfully completed.

 

8.) You can get patches from RHN; but if you have an issue with your cluster and attempt to get help from RedHat support, you'll only get "We already told you it won't work without fencing. Read the documentation."

 

7 for the second time:) No, you don't have to configure quorum disk at all if you don't want to. But if you have a small even number of nodes, there will be a risk of a reboot-and-fence loop. This risk is the greatest in 2-node clusters; it is possible but usually less likely in larger clusters.

 

8 for the second time :) If you don't have fencing configured, your cluster will not failover anything automatically. It will not provide improved HA over manual failover.

 

9.) By default, the node will be rebooted, but you can configure this through the configuration parameters of the fencing agent.

 

10.) Split-brain: a network fault or a similar communication failure causes the cluster to split into two identical halves, A and B. The halves aren't damaged, they just cannot communicate with each other.

  • A "thinks": No response from B. I'm OK, B has failed. I must take over B's services in addition to my own.
  • B "thinks": No response from A. I'm OK, A has failed. I must take over A's services in addition to my own.

If there were no fencing, this would cause bad things, as both halves would try to use the same shared disks, service IP addresses, etc.

 

With fencing, this becomes:

  • A "thinks": No response from B. I'm OK, so B may have failed. Let's fence it to make sure.
  • B "thinks": No response from A. I'm OK, so A may...<power off>
  • A "thinks": B is now definitely dead. Now I can safely take over B's services and add them to my own.

 

You may already know that having two or more hosts with the same IP address will cause a conflict in the network, which causes unpredictable connection losses: perhaps a client's TCP connection is originally established with A. In mid-connection, a router will see B is claiming the same IP address, and will update its ARP table and start sending the packets to B. B suddenly sees incoming packets that are clearly part of an already-established session, but B has no knowledge of such a session at all. It sends a TCP reset packet to abort the strange connection. From the client's viewpoint, the server accepted a session and then suddenly broke it for no reason: since both A and B have the same IP address, the client has no way of knowing the connection was switched from A to B in mid-connection.
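
If you ever suspect a duplicate service IP, one way to check from another host on the same subnet is arping's duplicate address detection mode. This is just a sketch; the interface name and address are examples, not from this thread:

# any reply shows the MAC address of a host currently claiming the IP;
# replies from more than one MAC address mean the IP is claimed by several hosts
arping -D -I eth0 -c 2 10.0.0.10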

 

A filesystem access in a split-brain situation is even worse: as soon as A modifies some filesystem structure that B has in its cache (or vice versa), there will be a high risk of filesystem corruption, because B will not know its cached data is no longer up-to-date. This will lead to filesystem corruption and data loss. Even special cluster filesystems like GFS or OCFS cannot completely avoid this: the hosts using the cluster filesystem must communicate with each other to coordinate their actions: "Hey everyone, I'm reading this block, don't write on it until I'm done". "OK, I'm writing to this other block, don't try reading it until I say I'm done or you know I've died". And so on.

MK
senthil_kumar_1
Super Advisor

Re: which one is best Quorum disk or Fencing in Redhat Linux

Dear Matti,

I have two nodes, and each node has one NIC configured with the following IP address. The NIC of each server is connected to the network switch like any other normal server.

node1-10.0.0.1
node2-10.0.0.1


Now I am configuring one failover domain called "httpd", and those two nodes are part of that failover domain. I am configuring the IP address "10.0.0.10" under the resource manager for this failover domain "httpd", and I have configured a script resource for the httpd service under the resource manager and configured the service... Everything is fine...

Now I have some questions based on above setup:

1) The IP "10.0.0.10" is used by clients to access the httpd service, am I correct?


2) When we start the cluster, on which node will the cluster activity start by default (which node will host the httpd service)? Where do we have to configure this? Do we need to configure this explicitly in the failover domain configuration properties?


3) How will the IP address "10.0.0.10" be assigned to the selected node? For example, will an alias interface for the IP address "10.0.0.10" be created on the selected node automatically by the cluster services?


4) Say all the cluster activity is happening on node1 and the httpd service is running on node1. If the network connection of node1 is suddenly lost, then node2 will fence (reboot / shut down) node1 and the cluster activity will start on node2, am I correct?

5) Which is better when fencing happens: rebooting or shutting down the node?

6) While fencing, will it reboot / shut down the server abruptly or properly? Is there any possibility of an OS crash while fencing? If yes, how can that be avoided?


7) Once the network connection to node1 is restored, will the cluster activity resume on node1 automatically, or will it stay on node2 only?
Matti_Kurkela
Honored Contributor

Re: which one is best Quorum disk or Fencing in Redhat Linux

> node1-10.0.0.1
> node2-10.0.0.1

 

Each node should have a unique IP address. (Typing mistake?)

 

1.) Yes.

 

2.) If you configure the "httpd" cluster service to autostart and assign it to a failover domain that prefers one server over the other, it will automatically start on the most-preferred server.

 

3.) Yes, it will be an IP alias. However, on RHEL 5, it is configured in a new way: "ifconfig" won't display IP aliases assigned by the cluster; only the newer command "ip addr show" will display them.
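
For example, assuming the service IP lives on eth0 (just an illustration):

ifconfig eth0           # will not show the cluster-managed 10.0.0.10
ip addr show dev eth0   # lists 10.0.0.10 as a secondary address while the service runs on this node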

 

4.) Exactly.

5.) It depends on many factors.

  • You have a two-node cluster. If you don't have a quorum disk, a reboot-and-fence loop is possible: network fails, node1 fences node2, node2 reboots, establishes a 1-node cluster on its own and fences node1 since it doesn't answer; node 1 reboots, reforms a 1-node cluster on its own and fences node2 ... A quorum disk or a third node would protect the cluster from this issue.
  • If you configure fencing to leave the fenced node in the "off" state, someone must restart it manually after the network failure has been fixed. In other words, the cluster cannot return to normal state automatically if the fault is temporary.
  • In a two-node cluster, if the network fails without causing the NIC links to go down on the nodes (i.e. the fault is not in the switch nearest the node(s), but in a switch or router further away), the surviving node is determined essentially at random: it might be that the node with a good Internet connection gets fenced and the node with a failed Internet connection takes over the service... and then just sits there, wondering why there does not seem to be any clients at all. (In fact, it might even have a slight advantage in the fencing race, since it has no clients to serve.) A quorum disk daemon can be used to add extra tests like "Can the default gateway be successfully pinged from this host?" to determine which node has healthy network connections, so it can help to ensure the failed node gets fenced and the good node is allowed to continue. (See the quorumd sketch after this list.)
  • (any other things depending on the details of your situation...)
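
A quorum disk heuristic of that kind would be configured in cluster.conf roughly like this. This is a sketch only: the gateway address and label are placeholders, and the exact attributes should be checked against the qdisk(5) man page for your RHEL 5 update level.

<quorumd interval="1" tko="10" votes="1" label="myqdisk">
        <heuristic program="ping -c1 -w1 10.0.0.254" score="1" interval="2"/>
</quorumd>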

6.) Fencing is the equivalent of a hard, yank-off-the-power-cords shutdown, so it is the equivalent of an OS crash by definition. The problem is, when fencing is required, the cluster only knows that the node won't respond normally but has no way to determine the reason: it might be because there is a network failure, or it might be because the OS on that node has crashed or overloaded... fencing must be something that produces an effective result in all situations.

 

7.) If the network fault has been fixed, as soon as node1 reboots it will re-join the cluster. What happens to the clustered service(s) will be determined by the failover domain settings of the respective service(s): if the failover domain has been configured with the "failback" attribute, the service will automatically return to the most-preferred node as soon as it has successfully rejoined the cluster.

 

If your service takes a long time to start up, it might be advisable to not enable automatic failback for that service: if the automatic failback is enabled and there is an intermittent network fault ("works for a few seconds, then fails again"), it might be preferable that the service stays on the secondary node until the sysadmin confirms the fault is truly fixed. Otherwise the service might spend all its time moving back and forth between the nodes, not getting any useful work done.

MK
senthil_kumar_1
Super Advisor

Re: which one is best Quorum disk or Fencing in Redhat Linux

Yes, the IP address of node2 is "10.0.0.2"

 

I have some more questions:


1) Where do we have to specify the "most-preferred node" in a failover domain? Or will the first node in the failover domain be considered the most-preferred node? How is the failover order of the nodes determined in the cluster?

 

2) How many services can we configure for one failover domain?

 

3) Say we configure two services for one failover domain; then those two services will run on one node only, either node1 or node2, based on the active node. Am I correct?

 

4) I want to configure a two-node cluster in an active-active state. That is, I need to configure clustering for SAP and Oracle, so SAP needs to run on node1 all the time and Oracle needs to run on node2 all the time by default. When node1 fails, the SAP and Oracle services need to run on node2, and if node2 fails, Oracle and SAP need to run on node1. For this scenario, I need to configure two failover domains, "sap" and "oracle", and I have to add both nodes to both failover domains. And I need to assign two different IPs, "10.0.0.100" and "10.0.0.200", to the "sap" and "oracle" failover domains respectively. So by default, when the cluster activity starts, the IP "10.0.0.100" will be assigned to node1 and "sap" will run on node1, and the IP "10.0.0.200" will be assigned to node2 and "oracle" will run on node2. Am I correct? Please correct me if I am wrong.


5) You said that fencing must be something that produces an effective result in all situations. Then why does Red Hat insist on fencing? :) Do you know the features of fencing?


6) How was the failed node managed in Red Hat clustering on RHEL 2 and RHEL 3 (the Red Hat releases where we didn't have the fencing concept, but had a quorum disk configured)?


7) In your experience, have you faced any OS crash after fencing a node? If yes, please explain those incidents and how you solved them.


8) Will there be any crash of the application / Oracle database on the fenced node, as the node is abruptly shut down / rebooted?

 

9) So as per the default fencing configuration, the fenced node will be rebooted, am I correct?

 

10) How do we configure fencing to shut down the fenced node instead of rebooting it? At least that way we can avoid the "fence war (split brain)" condition, is that correct? Please give your suggestion.


11) When fencing is configured to reboot the server, will it reboot the server properly (i.e. will all the services be stopped properly), and will it stop the cluster services properly during the reboot?

 

12) Is there any limit on how many times we can hard (yank-off) reboot / shut down the server?


13) As mentioned in my previous reply, there is a two-node cluster, "node1" and "node2", without a quorum disk. All the clustering activity is happening on node1 when the network connection of node1 suddenly fails, so node2 fences node1 and starts all cluster activity on node2. Now node1 is rebooting (but its network connection is still not restored). I have the following questions:


13.1) All cluster activity is happening on node2 and the cluster services are running on node2, am I correct?


13.2) node1 cannot fence node2, as node1 does not have a network connection, am I correct?


13.3) But node1 will start the cluster services automatically once it is up after fencing (rebooting), even though node1 cannot fence node2 as there is no network connection to node1, am I correct?


13.4) As there is still no network connection to node1, it cannot know that all cluster activity is happening on node2, and it cannot fence node2 either, but node1 will start the cluster activity and cluster service and assign the service IP address "10.0.0.10" to itself. However, the other network devices know that only node2 is using the IP address "10.0.0.10", as node1 has no network connection even if it has assigned itself the IP address "10.0.0.10". Am I correct?


13.5) Now the cluster service is running on both nodes and both nodes have the IP address "10.0.0.10", but the clients and all network devices know that the IP address belongs to node2 and not node1, as there is still no network connection to node1, so the clients access the cluster service from node2. Am I correct?


13.6) Now the network connection on node1 is restored. What will happen, as both nodes are hosting the cluster service and have the IP address "10.0.0.10"?

 

Matti_Kurkela
Honored Contributor

Re: which one is best Quorum disk or Fencing in Redhat Linux

1.) A failover domain can be:

  • unrestricted or restricted: a service that is configured with a restricted failover domain will only run on those nodes listed in the failover domain configuration. An unrestricted failover domain with a node list will prefer the listed nodes, but can use unlisted nodes if listed nodes are not capable of running the service at the time. (Obviously, this is not very important in two-node clusters.)
  • unordered or ordered: when a service is configured with an ordered failover domain, the ordering of the hosts in the failover domain configuration defines the order of preference.
  • auto-failback or not.

2.) As far as I know, there is no limit.

 

I think "failover domain" is a rather undescriptive name: you cannot know what it actually means without a deeper explanation. Something like "failover ruleset" might be more descriptive: a failover domain is essentially a list of settings that are used to determine which node should be running the service or services associated with that failover domain.

 

3.) There is no such thing as an "active node" in a RedHat cluster, although if you have only one service, you can call the node that is currently running the service the "active node". If you have two services, and you have assigned them to the same failover domain, the settings of the failover domain will determine whether the services prefer to be on the same node or not. If the failover domain is ordered and node1 is listed as the first = most-preferred, then both services will run on node 1 unless failed over to node 2; if the failover domain has auto-failback enabled too, then both services will always move to node 1 when it is available.

 

In a two-node cluster, I would typically set up two failover domains and name them "prefer-node1" and "prefer-node2". The first one would have its node list as "node1, node2" and the second as "node2, node1". Now I could select the "normal" running location for each service by associating it with the respective failover domain (see the sketch below). (If I had many services and some would benefit from auto-failback and others not, I might have to configure two more failover domains with different auto-failback settings.)
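
In cluster.conf, that kind of set-up would look roughly like this (a sketch; the node names are placeholders, and a lower priority number means more preferred):

<failoverdomains>
        <failoverdomain name="prefer-node1" ordered="1" restricted="0">
                <failoverdomainnode name="node1" priority="1"/>
                <failoverdomainnode name="node2" priority="2"/>
        </failoverdomain>
        <failoverdomain name="prefer-node2" ordered="1" restricted="0">
                <failoverdomainnode name="node2" priority="1"/>
                <failoverdomainnode name="node1" priority="2"/>
        </failoverdomain>
</failoverdomains>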

 

4.) Yes... two ordered failover domains (sketched below): the failover domain for "sap" should list node1 as the most-preferred node, and the failover domain for "oracle" should list node 2 as the most-preferred node. Both Oracle and SAP usually take a significant time to start up, so I would not enable auto-failback on them without a good reason.
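
The services themselves would then reference those domains and carry their own IP resources, roughly like this. This is only a sketch: the "sap" and "oracle" failover domains are assumed to be defined as ordered domains (node1 first for "sap", node2 first for "oracle"), and the script paths are placeholders for whatever start/stop scripts you use.

<rm>
        <service name="sap" domain="sap" autostart="1">
                <ip address="10.0.0.100" monitor_link="1"/>
                <script file="/etc/init.d/your_sap_init_script" name="sap-app"/>
        </service>
        <service name="oracle" domain="oracle" autostart="1">
                <ip address="10.0.0.200" monitor_link="1"/>
                <script file="/etc/init.d/your_oracle_init_script" name="oracle-app"/>
        </service>
</rm>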

 

5.) I don't think I understand the question.

 

The point is, after node A gets confirmation that node B has been successfully fenced, node A can be 100% sure that node B is no longer claiming any service IP addresses, and is not holding any uncommitted disk writes to shared disks. So node A can safely take over any services that were running on node B. It might have to run a filesystem check before mounting any shared disks, but with journaling filesystems like ext3, this usually means just a journal replay operation which is relatively quick.

 

6.) It might not have been called "fencing", but the requirement was still there. Here's a two-node cluster configuration example from the RHEL 2.1 cluster suite documentation.

Note that it includes serial port connections for remote power switches, so that each node can poweroff the other.

 

7.) Do we understand the word "OS crash" the same way? To me, "OS crash" means any uncontrolled shutdown. It might be caused by a kernel bug, system overload, hardware failure, loss of power or any other thing. So I would say fencing causes an intentional OS crash, by definition.

 

And yes, I've seen this happen. I've even routinely caused it to happen - each time when testing a new cluster set-up before actually putting it to production use. Usually, a system reboots just fine: the filesystem check may require a journal replay, but the system will do it automatically.

 

8.) Of course it is a risk - but the only alternative would be to do nothing. In all failover clusters (RedHat Cluster Suite, HP Serviceguard, etc.) the application must be prepared to deal with the fact that the previous shutdown may have been a crash. For example, when Oracle is properly configured for cluster use, it usually recovers from crash shutdowns just fine. If an application needs some special recovery actions after a crash shutdown, you should write a script that checks if the recovery action is required and does it automatically when necessary, and add this script in the service configuration so that it runs before the actual service application is started.

 

9.) Yes: the default fencing operation is to first power off the node, and then power it back on again so that it will reboot.

 

On clusters with three or more nodes (or on two-node clusters with a quorum disk), the rebooting node will attempt to rejoin the cluster: if the failure is no longer present, the node will successfully communicate with the other nodes and will rejoin the cluster in a controlled way. If the failure still prevents communication, the rebooting node will see it does not have enough votes to achieve cluster quorum, so it won't do anything: it won't try to fence any other nodes, and it certainly won't attempt to start up any services. It just keeps trying to communicate with the other cluster nodes.

 

In a two-node cluster without a quorum disk, this vote logic is disabled: if the rebooting node cannot communicate with the other node, it will start up the cluster on its own, and will try to fence the other node to make sure it won't interfere with cluster operations on this node. So you'll get a reboot-and-fence loop.

 

(In a three-node cluster, each node has 1 vote by default. Since 2 or more is needed to achieve quorum in a three-node cluster, a single isolated node cannot start up a cluster alone. In a two-node cluster with a quorum disk, the result is the same but the method to achieve it is slightly different: each node has 1 vote, and the quorum disk daemon will grant 1 extra vote only if the quorum disk is accessible and does not indicate another node is already running a cluster.)

 

10) When configuring the fencing agent, you add the parameter action="off" to the cluster configuration. In the /etc/cluster/cluster.conf file, it should look like this:

...
<fencedevices>
        <fencedevice agent="fence_ilo" name="your_node1_ilofence" hostname="your_node1_ilo_ip" login="your_ilo_login" password="your_ilo_password" action="off" />
        <fencedevice agent="fence_ilo" name="your_node2_ilofence" hostname="your_node2_ilo_ip" login="your_ilo_login" password="your_ilo_password" action="off" />
</fencedevices> 

 This parameter is described in the fence_ilo man page (run "man fence_ilo" on a system that has RHEL 5 Clustering installed to see it).
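
It is also a good idea to test the fencing device manually from the command line before relying on it; something along these lines should work with fence_ilo (the iLO address and credentials below are placeholders):

fence_ilo -a your_node1_ilo_ip -l your_ilo_login -p your_ilo_password -o status
fence_ilo -a your_node1_ilo_ip -l your_ilo_login -p your_ilo_password -o off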

 

11.) Fencing does not stop anything properly. It just kills the power. This is because the node that is to be fenced might be hanging, so it might be unable to perform a proper shutdown. (It might also be on fire, and about to burn to ash along with the building it's in, if the fire is the cause for the failure.) The cluster already knows it cannot communicate with the node, so sending any commands for a clean shutdown is just waste of time.

 

Unlike standalone servers, cluster nodes are considered expendable in a certain sense: the node that cannot communicate with others can be sacrificed, so that its services can be failed over to the other node.

 

12) There is no fixed limit: it can happen as many times as the hardware can physically take it without failing.

 

13.1) Yes, correct.

13.2) That depends on your network setup, but if your network is wired correctly for cluster use, then yes.

13.3) If a fencing attempt fails, then a failover cannot go on. In this situation, node 1 sees it can neither communicate with node 2 nor fence it. It just keeps trying to fence node 2 and does no other cluster actions.

 

13.4)

As the state of node2 is unknown to node1 and the fencing attempt has failed, it will not start the services.

Analysis:

  • If node2 is actually down and node 1 will not start services, the situation does not change: not good, not bad.
  • If node2 is actually up and running the services and node1 attempts to start running the services without a successful fencing operation first, there will be duplicate IP addresses in the network and data corruption on shared disks. Very bad!

So, in this situation (no communication with node 2 and no successful fencing of node 2), the right choice for node 1 is to wait and keep retrying the fence operation.

 

13.5.) Depends on the network set-up, but you're essentially correct.

 

13.6)

Node 1 fences node 2, then starts cluster services on node 1. Node 2 reboots, finds it can now communicate with node 1 and sees node 1 is already running the cluster. Node 2 rejoins the cluster, and node 1 tells node 2 that the cluster services are already running on node 1.

 

Any services configured with auto-failback and node2 as the most-preferred node will now shutdown in controlled fashion on node 1 and restart on node 2.

 

Yes, the extra reboot is ugly. But without a quorum disk, it is not possible to avoid it and still have the cluster recover 100% automatically. If the fencing agent is configured to halt the system instead of rebooting it, the extra reboot is avoided, but node 1 won't start up automatically. If the system administrator is working to fix the network problem, he can also manually shut down the cluster subsystems on node 1 before restoring the network connectivity, effectively cancelling any pending fencing requests on node 1.

 

 

 

With a quorum disk:

13.3.qd)

Node 1 reboots and starts the quorum disk daemon. It sees node 2 is already running the cluster, so it won't grant an extra vote to node 1.

Node 1 sees it has only 1 cluster quorum vote out of 3, so it can not start a cluster on its own, can not fence any other node, and can not start any cluster services. It keeps listening to the cluster multicast address, waiting to hear from the other nodes.

 

13.4qd) Node 1 does not have enough votes: it knows it must stay out of cluster activities.

 

13.5qd) Node 1 has not enabled cluster IP addresses, so there is only one copy of the cluster IP address running. No risk of confusing the clients!

 

13.6qd) Node 1 begins to communicate with node 2, and is told that node 2 is currently running the cluster services. Since there are no incomplete fencing requests pending, node 1 won't fence node 2.

Node 1 rejoins the cluster. Then any cluster services with auto-failback and node 1 as the most-preferred node will be shut down in a safe manner on node 2 and restarted on node 1.

MK
senthil_kumar_1
Super Advisor

Re: which one is best Quorum disk or Fencing in Redhat Linux

Hi Matti,

 

I have some more questions.

 

1) If auto-failback is not enabled, how do we restore the service to the preferred node manually (please explain step by step)?

2) Why are you suggesting not to use auto-failback?


3) How much time will it take to start the cluster service on the other node if one node is fenced (including the fencing time)?


4) What is the purpose of the hardware watchdog timer in RHEL 2.1? And to which port should the hardware watchdog timer be connected?


5) In RHEL 5, don't we need a watchdog timer?


6) In RHEL 5, don't we need to connect the two nodes using a crossover cable, apart from the one cable from each node connected to the LAN (network)?


7) I am going to configure a cluster on two blade servers that sit in two different enclosures. As I understand it, each enclosure has two 10Gig network cables (for redundancy) coming from the network switch, and one network cable connecting the iLO of each enclosure.


7.1) Is this the correct setup for blade servers?


7.2) How do we make the iLO network connection redundant?


7.3) Could you please suggest more redundancy for my setup in every aspect (LAN, iLO, etc.) required for the cluster? Please explain step by step.

Matti_Kurkela
Honored Contributor

Re: which one is best Quorum disk or Fencing in Redhat Linux

1.) Tell the cluster to relocate your service to nodeA:

clusvcadm -m nodeA -r service:yourservice

 This command will automatically stop the service on the node it's currently running on, and then restart it on nodeA.
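
You can check the current owner of each service before and after the move with clustat:

clustat     # lists the cluster members and which node currently owns each service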

 

2.) Each failover and failback means stopping and restarting the service. (In the case of failover, the stop may or may not be gentle.)

If your application starts up very fast (in a few seconds or so), auto-failback is useful. But a large Oracle database or SAP environment may take several minutes to completely start up.

 

For example, let's assume you have an Oracle database that requires 2 minutes to start up and 30 seconds to shut down in a gentle fashion, and a two-node cluster, and the database has failed over from node A to node B because of a network issue. Someone is working to fix the network issue, and causes the network to come up for node A for a short time. But the fix is not good, and the network fails again on node A.

 

If you have auto-failback enabled, as soon as the network comes back up for node A, the automatic failback will start. The system will shut down the database on node B and restart it on node A. Just about as the start-up on node A completes, the network fails again, and node A gets fenced and rebooted. Again it will take at least 2 more minutes before the database is running on node B. In total, you have about 4.5 minutes during which your database is not available - for no good reason.

 

If you have auto-failback disabled, the service stays on node B until commanded to move back to node A, or until node B fails. You will have the option of testing the network on node A to make sure it's really good before you take the 2.5 minute database outage to failback the database to node A. If the database is working just fine on node B, you can even let it run on node B for a few hours or days, and start the failback in a time where it causes the least amount of disruption to your users.
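
In cluster.conf, the "no auto-failback" behaviour corresponds to the nofailback attribute of the failover domain (available in the later RHEL 5 updates; check the rgmanager documentation for your version). The names below are placeholders:

<failoverdomain name="prefer-nodeA" ordered="1" restricted="0" nofailback="1">
        <failoverdomainnode name="nodeA" priority="1"/>
        <failoverdomainnode name="nodeB" priority="2"/>
</failoverdomain>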

 

3.) It depends on the configured cluster timeout values and the start-up time of the service. Too short cluster timeouts may cause "false alarms": the cluster services might failover when it's not actually necessary. Longer timeouts will slow down the "reaction time" of your cluster.

 

With Oracle or SAP clusters, I'd expect the Oracle/SAP start-up time to be the dominating factor in the time required for failover.

 

4.) Typically, a hardware watchdog timer is built into the server mainboard, although there are add-on PCI cards or USB devices that provide hardware watchdog functionality. In general, a hardware watchdog timer will cause the system to reboot if the OS becomes hung for more than a specified time.

 

In a cluster context, the purpose of the watchdog is to ensure that, when a node is hung and other node(s) assume it has failed, it will not attempt to write to shared disks without first discarding all its disk caches. This is accomplished by forcing a reboot.
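
As a rough illustration of the mechanism (assuming the software watchdog module, softdog, rather than a hardware timer):

modprobe softdog soft_margin=60   # arm a 60-second software watchdog
# once some process opens /dev/watchdog, it must write to it at least every 60 seconds;
# if the writes stop (for example because the OS hangs), the kernel reboots the machine.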

 

5.) It's not listed as necessary; however, if your hardware includes a watchdog timer, it may be useful. (If you use Proliant hardware and install the Proliant Support Pack, it already includes a hardware watchdog system.)

 

6.) You should use bonding and at least two LAN cables from each node. In RHEL 5, the cluster subsystem uses IP multicasts for cluster heartbeats and other communication; I don't know an easy way to route IP multicasts through two separate network segments (the production network is one, and the crossover cable works as another, very small, network segment).
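
A minimal RHEL 5 active-backup bonding sketch (interface names and addresses are placeholders, not from this thread):

/etc/modprobe.conf:
        alias bond0 bonding
        options bond0 mode=active-backup miimon=100

/etc/sysconfig/network-scripts/ifcfg-bond0:
        DEVICE=bond0
        IPADDR=10.0.0.1
        NETMASK=255.255.255.0
        ONBOOT=yes
        BOOTPROTO=none

/etc/sysconfig/network-scripts/ifcfg-eth0 (and similarly ifcfg-eth1):
        DEVICE=eth0
        MASTER=bond0
        SLAVE=yes
        ONBOOT=yes
        BOOTPROTO=none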

 

7.1) If you have only one switch, that switch will be a Single Point of Failure: a point where a single hardware failure can make the entire cluster unable to serve your clients. If you cannot have two switches, make sure the switch is of good quality, with dual power supplies and hot-swappable fans (if applicable). Otherwise, see below.

 

7.2) If your blade chassis model is c7000, you can install two OAs in it. Each blade has an independent iLO which manages the blade itself, while the OA manages the chassis, the I/O modules, fans and power supplies and ensures the chassis hardware configuration is valid. The iLO network connection goes through the OAs, so having two OAs should help. If you have two switches (linked together of course), wire one OA from each chassis to one switch, and the other OA to the other switch.

 

7.3) You said your cluster is going to be for running Oracle and SAP, so some kind of shared storage is probably required. If you use SAN storage, make sure you have dual SAN connections too.

 

Here's a link to RedHat Knowledge Base document that describes "best practices" for deploying RedHat Clusters: https://access.redhat.com/kb/docs/DOC-40821

MK
brucezee
New Member

Re: which one is best Quorum disk or Fencing in Redhat Linux

So, based on what you are saying here, a GFS filesystem can't solve the split-brain problem?

Can we use the distributed lock manager (DLM) to control the reading and writing process you mentioned?

Is it possible to store the cache in some other place?

How then could I solve the split-brain situation?

root4sp
New Member

Re: which one is best Quorum disk or Fencing in Redhat Linux

Thanks for your valuable information. I have a question: what is the difference between delay and post_join_delay? Are they the same?