Re: Fencing Issue...RedHat Cluster Suite

S.M.Athar · ‎08-25-2010

For fencing we are using HP iLO, and the servers are BL460c G6 blades. The problem is that the resource only starts moving to the passive node when the failed node is powered on again. This is really strange to me. For example, I shut down node1, physically removed it from the blade chassis, and monitored the clustat output. clustat still showed the resource on node1, even though node1 was powered down and removed from the c7000 blade chassis. But when I plugged the failed node1 back into the c7000 chassis and it powered on, clustat showed the resource starting to move from the failed node to the passive node.
Please help me.
Regards
Athar

Matti_Kurkela · ‎08-25-2010

Sounds like your cluster service definition includes a failover domain with these properties:
- ordered (the failover domain has been configured to prefer node1 over others)
- failback enabled

In this case, the cluster will try to return the service (and all its resources) to node1 as soon as it joins the cluster again. The cluster is simply doing what it's configured to do. If this automatic failback is not desirable, disable it.

If you use Conga to configure your cluster, check the checkbox labelled "Do not fail back services in this domain" in the failover domain configuration.

If you'd rather edit the XML configuration manually, the attribute is 'nofailback="1"'. It should be added to the failoverdomain tag:

...

...

...

...

...
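As a rough illustration of where that attribute goes, here is a minimal Python sketch that sets nofailback="1" on every failoverdomain element of a cluster.conf-like fragment. The domain name, node names and priorities are made up for the example and are not taken from the poster's configuration; on a real cluster you would also have to increment config_version and propagate the file to the other nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: names and priorities are placeholders,
# not the configuration discussed in this thread.
SAMPLE = """\
<rm>
  <failoverdomains>
    <failoverdomain name="example_domain" ordered="1" restricted="0">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
</rm>
"""


def disable_failback(xml_text):
    """Return the XML with nofailback="1" set on each failoverdomain tag."""
    root = ET.fromstring(xml_text)
    # iter() matches the exact tag name, so failoverdomains and
    # failoverdomainnode elements are left untouched.
    for domain in root.iter("failoverdomain"):
        domain.set("nofailback", "1")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(disable_failback(SAMPLE))
```

The ordered and priority settings stay as they are; only the automatic return to the preferred node is switched off.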

MK

S.M.Athar · ‎08-25-2010

Thanks for your prompt reply.
Automatic failback is already disabled. Sir, the cluster resource fails over to the passive node as soon as the active node is merely powered on, I mean during POST. I think there is some issue with my iLO fencing configuration.
When I power off the active node and remove it from the chassis, its iLO is also offline. In this situation the resource still shows on the active node, but when I power the active node on again, the resource starts failing over to the passive node.
I am attaching my cluster.conf.

Michael Leu · ‎08-25-2010

Please see macosta's reply in your other thread:
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1444969

Matti_Kurkela · ‎08-26-2010

Sorry, I totally misread your problem description. (I had a bit of flu so my brain probably wasn't working at 100%...)

If you physically remove the node1 blade, the cluster will notice that the blade is no longer sending heartbeats, and will attempt to fence it.

The important point is: once fencing is started, cluster operations will continue normally *only after the cluster has received confirmation of successful fencing*.

But when the blade is physically disconnected, the other node(s) won't be able to reach the iLO of the disconnected blade, so the fencing attempt will fail.

Now, node1 is unresponsive, so its status is unknown to the other nodes. Is node1 dead, or is it just on the other end of a bunch of cables destroyed by a server rack tipping over? The cluster has no way of knowing.

Because the attempt to fence node1 failed, the other node can only wait and see: "well, if node1 is alive, it will fence _us_, and then the situation will be resolved. Or perhaps node1 is rebooting and will soon be rejoining the cluster, and all will be well again."
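To make that rule concrete, here is a toy Python model of it, not the actual fencing daemon logic: services are relocated only after a fence attempt reports success. The list of attempt outcomes and the service name are assumptions for illustration; a removed blade is modelled as an iLO that never answers, and a reinserted blade as an iLO that eventually does.

```python
def try_failover(fence_results, services):
    """Walk through successive fence attempts in order.

    fence_results: iterable of booleans, one per attempt (True = fencing
    confirmed). Services are relocated only after the first success.
    """
    for attempt, confirmed in enumerate(fence_results, start=1):
        if confirmed:
            return {"fenced_on_attempt": attempt, "relocated": list(services)}
    # No confirmation at all: the surviving node can only keep waiting.
    return {"fenced_on_attempt": None, "relocated": []}


if __name__ == "__main__":
    services = ["example_service"]  # hypothetical service name

    # Blade pulled from the chassis: its iLO is unreachable on every attempt.
    print(try_failover([False, False, False], services))

    # Blade plugged back in: the iLO answers again and fencing finally succeeds.
    print(try_failover([False, False, True], services))
```

Under this model, the second case is one plausible reading of the behaviour reported in this thread: the failover appears to be triggered by powering the node on, because that is the first moment the fence device can confirm anything.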

By physically disconnecting the blade, you'll simultaneously cause multiple failures:
- multiple network connection failures
- fencing connection failure
- storage connection failure
This is more failures than RedHat Cluster can deal with.

Do you have a quorum disk? If you don't, and you have only 2 nodes in your cluster, your cluster may become inquorate after you unplug the node1 blade. An inquorate segment of a cluster may not run any services, and it may not make any fencing decisions either.

In a RedHat Cluster, a two-node cluster is a very tricky special case. If the cluster configuration sets the special "two_node" parameter to 1, quorum check is essentially overridden. But the fundamental rule is still the same: if a node vanishes, the remaining node must *successfully* fence the vanished node before failovers or other cluster processing may continue.
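The quorum arithmetic behind this can be sketched in a few lines of Python. This is a simplified model of the usual majority rule (more than half of the expected votes), with the two_node case treated as a threshold of one vote; the vote counts below, one per node and one for the quorum disk, are assumptions for the example and not values from any configuration in this thread.

```python
def quorum_threshold(expected_votes, two_node=False):
    """Votes a partition needs in order to be quorate."""
    if two_node:
        # Special case: a single node's vote is enough.
        return 1
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes, two_node=False):
    return votes_present >= quorum_threshold(expected_votes, two_node)


if __name__ == "__main__":
    # Two nodes with one vote each, no quorum disk: the survivor holds 1 of 2.
    print(is_quorate(1, expected_votes=2))
    # Same cluster with the two_node override enabled.
    print(is_quorate(1, expected_votes=2, two_node=True))
    # Two nodes plus a one-vote quorum disk: survivor + disk hold 2 of 3.
    print(is_quorate(2, expected_votes=3))
```

Being quorate only allows the surviving node to act; in every case it still has to fence the vanished node successfully before failover continues.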

MK

Michael Leu · ‎08-26-2010

Matti, if I may ask, what is your take on the Linux fencing methods vs ServiceGuard with a quorum server?

I always thought the way ServiceGuard did it was the right one: reset yourself if you are alone. Is this perhaps misguided? Or can such behaviour be 'emulated' with the Linux clusters on RHEL/SLES?

S.M.Athar · ‎08-26-2010

Matti, thanks for your comprehensive reply.

Yes, I am using a quorum disk, and there is no issue with failover: the resource relocates to the other node when I reboot or shut down a node. But when I power off the node and remove it from the chassis, I face the failover issue.
Please guide me. For example, if a node goes down due to a hardware problem, probably the motherboard, what is the status of the iLO? Is the iLO still alive? Will fencing work?
Please help me or share some HP document.
Regards
Athar

S.M.Athar · ‎08-26-2010

Sorry, in addition to my last post: I am using a 2-node cluster and also a quorum disk. So is it necessary for me to use the special parameter "two_node 1"?
Regards
Athar
