Redhat 6.1 Cluster Active/Active power off

arunbasu

Hi team,

I have a two-node Active/Active cluster running Red Hat 6.1 on HP ProLiant DL785 G6 servers, with HP iLO as the fence device.

I have noticed that both of my servers are powered off. Could you please advise what is causing this?

Is it a fence device issue? This is very urgent.

Arun Basu

Matti_Kurkela

Re: Redhat 6.1 Cluster Active/Active power off

Since you have provided very few details about your configuration, I can only guess. My guess is that this is a textbook case of "fence death" in a two-node cluster.

Explanation:

The root cause might be a failure in your heartbeat network. When the cluster heartbeat stops working, each node in a two-node Red Hat cluster will attempt to fence the other. The node that successfully fences the other one first may continue cluster operations. This is called a "fence race". Clusters with more than two nodes (or two nodes and a quorum disk) use quorum voting logic instead of a fence race.

Red Hat Cluster is built on the assumption that only one fencing operation can succeed at a time. The Red Hat example configurations use a shared, remotely controllable PDU or a similar device for power fencing. If such a device accepts only one power switch command at a time, it satisfies this design assumption of the Red Hat Cluster software.

Using each node's iLO as its fence device does not satisfy this assumption: each iLO can be accessed separately, so if the fence race ends in a tie, both nodes can power each other off at the same time. The resulting situation is known as "fence death".
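
The difference between a shared PDU and per-node iLOs can be illustrated with a toy timing model. This is a hypothetical sketch, not how the Red Hat fence agents work: times are in arbitrary units, the node names A and B are invented, retries are not modeled, and the shared device is assumed to reject any command that arrives while it is still executing the previous one.

```python
from typing import Dict, Set


def fence_race(t_a: float, t_b: float, latency: float, shared_device: bool) -> Set[str]:
    """Toy model of a two-node fence race; returns the names of the surviving nodes.

    t_a, t_b: hypothetical times at which nodes A and B issue their fence commands.
    latency:  hypothetical time from an accepted command until the target loses power.
    shared_device: True models one shared PDU handling one command at a time;
                   False models a separate iLO per node (both commands accepted).
    """
    # Process commands in issue order; simultaneous commands are ordered by name,
    # standing in for a shared device always handling one of them first.
    requests = sorted([(t_a, "A", "B"), (t_b, "B", "A")])
    off_at: Dict[str, float] = {}  # node name -> time at which it loses power
    busy_until = float("-inf")     # only used for the shared device
    for t_issue, issuer, target in requests:
        # A node that has already lost power cannot send a fence command.
        if issuer in off_at and off_at[issuer] <= t_issue:
            continue
        if shared_device:
            if t_issue < busy_until:
                # The shared device is still executing the first command, so this
                # one is rejected (retries are not modeled).
                continue
            busy_until = t_issue + latency
        off_at[target] = min(off_at.get(target, float("inf")), t_issue + latency)
    return {"A", "B"} - set(off_at)


if __name__ == "__main__":
    print(fence_race(0.0, 0.5, 2.0, shared_device=False))  # set(): fence death
    print(fence_race(0.0, 0.5, 2.0, shared_device=True))   # {'A'}: one clear winner
```

With separate iLOs, any two commands issued closer together than the power-off latency both succeed, which is the tie described above; the shared device turns the same timing into a single winner.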

Solutions:

You must either eliminate the fence race or ensure that the fence race will always have one clear winner (no ties).

If it is acceptable that one specific node always wins the fence race, you could add a delay parameter to the fencedevice definition used to fence that node. Please see:

https://access.redhat.com/knowledge/solutions/54829
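
The effect of such a delay can be sketched with a similar toy timing model. This is an illustration under hypothetical assumptions, not the fence agent's actual code: each node waits for the delay configured for its target before sending the fence command, the target loses power a fixed time after the command is sent, and the node names, the 15-second delay and the 2-second power-off latency are invented values.

```python
from typing import Dict, Set


def fence_race_with_delay(detect_a: float, detect_b: float, latency: float,
                          delay: Dict[str, float]) -> Set[str]:
    """Toy two-node fence race with per-target fencing delays (separate iLOs).

    detect_a, detect_b: hypothetical times at which A and B notice the heartbeat loss.
    latency: hypothetical time from sending a fence command until the target loses power.
    delay:   hypothetical mapping target node -> seconds to wait before fencing it.
    """
    # Each node waits delay[target] after detecting the failure before sending its command.
    sends = sorted([
        (detect_a + delay.get("B", 0.0), "A", "B"),
        (detect_b + delay.get("A", 0.0), "B", "A"),
    ])
    off_at: Dict[str, float] = {}  # node name -> time at which it loses power
    for t_send, issuer, target in sends:
        # A node that loses power while still waiting never sends its command.
        if issuer in off_at and off_at[issuer] <= t_send:
            continue
        off_at[target] = min(off_at.get(target, float("inf")), t_send + latency)
    return {"A", "B"} - set(off_at)


if __name__ == "__main__":
    print(fence_race_with_delay(0.0, 0.0, 2.0, {}))           # set(): tie, fence death
    print(fence_race_with_delay(0.0, 0.0, 2.0, {"A": 15.0}))  # {'A'}: A always wins
    print(fence_race_with_delay(3.0, 0.0, 2.0, {"A": 1.0}))   # {'B'}: delay too short
```

The last call shows the design constraint in this model: the delay must be longer than the gap between the two nodes' detection times plus the power-off latency; otherwise the "losing" node can still fence the preferred one.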

Adding a third node or a quorum disk would allow the cluster to use quorum voting logic instead: if a 3-node cluster splits into two parts of 2 + 1 nodes because of a network failure, the 2-node fragment still has quorum and will attempt to fence the single node. The single node recognizes that it has lost quorum and passively waits to be fenced. A quorum disk serves the same purpose by providing extra quorum vote(s) only to the one cluster fragment that is healthy according to the configured qdiskd heuristics tests. In such a configuration, iLOs are an appropriate fencing solution.
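
The voting logic can be sketched as a strict-majority check. This is a simplified model, assuming one vote per node and a quorum disk worth one vote that only the healthy fragment holds; the node names and function names are hypothetical and not part of the Red Hat Cluster software.

```python
from typing import Dict, List, Set


def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote total that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def quorate_partitions(partitions: List[Set[str]], votes: Dict[str, int]) -> List[Set[str]]:
    """Return the partitions that still hold quorum after a split."""
    needed = quorum_threshold(sum(votes.values()))
    return [p for p in partitions if sum(votes[m] for m in p) >= needed]


if __name__ == "__main__":
    # 3-node cluster split 2 + 1: only the 2-node fragment is quorate.
    three_nodes = {"node1": 1, "node2": 1, "node3": 1}
    print(quorate_partitions([{"node1", "node2"}, {"node3"}], three_nodes))

    # Two nodes plus a 1-vote quorum disk: the fragment holding the disk vote wins.
    with_qdisk = {"node1": 1, "node2": 1, "qdisk": 1}
    print(quorate_partitions([{"node1", "qdisk"}, {"node2"}], with_qdisk))

    # Plain two-node cluster split 1 + 1: neither side is quorate.
    print(quorate_partitions([{"node1"}, {"node2"}], {"node1": 1, "node2": 1}))
```

The last example shows why a plain two-node cluster cannot rely on voting: a 1 + 1 split leaves neither half with a strict majority of the 2 expected votes, so the cluster falls back on the fence race described above.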

If you cannot add a quorum disk or a third node, and a fence delay is not an appropriate solution for you, you may need to rethink your fencing solution.

MK