
Re: RH Cluster Suite - Fence manual does not work

 
Ivan Ferreira
Honored Contributor

RH Cluster Suite - Fence manual does not work

I have created a two-node cluster with Red Hat Cluster Suite on RHEL ES 4.

I configured the fencing method as manual. The problem is that when I power off one node to test it, fencing does not work. In the log I get:

fenced: fencing node "nodename"
fenced: fence "nodename" failed

This repeats forever and the cluster hangs until the other node joins the cluster again.

If I run fence_manual I get:
sucess: fence_manual "nodename"

In the log I get:

Waiting for "nodename" to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n "nodename")

If I run fence_ack_manual -n "nodename" I get:

can't open /tmp/fence_manual.fif: No such file or directory.

If I do strace of fence_manual, I see:

mknod("/tmp/fence_manual.fifo", S_IFIFO|0600) = 0
write (1, "sucess: ....) = 41
unlink("/tmp/fence_manual.fifo") = 0

Why do I get the unlink? Is it removing the file before I run fence_ack_manual?

I'm currently just testing. I know that I should use another fencing method, but it would be nice to have this working so I can do all the other tests.

If I stop the fenced daemon (even though I shouldn't), the services relocate, because nothing tries to fence the node (and then hang the cluster), but GFS won't work.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Ivan Ferreira
Honored Contributor

Re: RH Cluster Suite - Fence manual does not work

OK, now I understand. fence_manual only works when the other node is down. If you run fence_manual while the other node is up, it finishes immediately because the cluster is quorate. But fencing still does not work when fenced invokes it directly after a node is powered off.
Ivan Ferreira
Honored Contributor

Re: RH Cluster Suite - Fence manual does not work

Now it works. It was a configuration problem: I had created the fence device but had not associated it with a node. Also, system-config-cluster does not allow you to specify all the parameters for the fence method, so the nodename must be added manually to the cluster.conf file.
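For anyone who wants to check their own file for the same mistake, here is a minimal Python sketch. It assumes the usual RHEL 4 cluster.conf layout (a `<device>` element under each `<clusternode>`'s `<fence>`/`<method>`, and fence devices declared under `<fencedevices>`). The sample configuration, the node names and the function name `find_fencing_problems` are made up for illustration and are not taken from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: node1 is configured correctly, node2 lacks the
# nodename attribute, node3 has no fence device associated at all.
SAMPLE_CLUSTER_CONF = """\
<cluster name="testcluster" config_version="2">
  <clusternodes>
    <clusternode name="node1" votes="1">
      <fence>
        <method name="single">
          <device name="manual" nodename="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" votes="1">
      <fence>
        <method name="single">
          <device name="manual"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
</cluster>
"""


def find_fencing_problems(conf_xml):
    """Return (node name, problem description) pairs for suspicious nodes."""
    root = ET.fromstring(conf_xml)
    # Map each declared fence device name to its agent.
    agents = {fd.get("name"): fd.get("agent") for fd in root.iter("fencedevice")}
    problems = []
    for node in root.iter("clusternode"):
        node_name = node.get("name")
        devices = node.findall("./fence/method/device")
        if not devices:
            problems.append((node_name, "no fence device associated with this node"))
            continue
        for dev in devices:
            dev_name = dev.get("name")
            if dev_name not in agents:
                problems.append((node_name, "device %r is not declared" % dev_name))
            elif agents[dev_name] == "fence_manual" and not dev.get("nodename"):
                problems.append((node_name, "manual fence device has no nodename"))
    return problems


if __name__ == "__main__":
    for node_name, problem in find_fencing_problems(SAMPLE_CLUSTER_CONF):
        print("%s: %s" % (node_name, problem))
```

To check a real file, read /etc/cluster/cluster.conf into a string and pass it to `find_fencing_problems` instead of the sample.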
Arnd Kohlen
Advisor

Re: RH Cluster Suite - Fence manual does not work

Hi,

I have the same problem here, but I have not yet been able to keep the whole cluster from blocking when one of the nodes crashes.

When I power off one of the nodes over iLO, GFS blocks all access. When I then try to fence that host with fence_manual or fence_ack_manual, I run into the same error messages you mentioned above.

You closed this case, but hopefully you have an idea of how I can solve this.

I attached the configuration file to this reply. Please take a quick look at it.
Ivan Ferreira
Honored Contributor

Re: RH Cluster Suite - Fence manual does not work

GFS hangs because fencing is not working. Your configuration looks fine; it's hard to troubleshoot this problem without being in front of the server.

Ensure that your /etc/hosts file is correct: each node name should resolve to the interconnect IP address.
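One way to check this is with a small Python sketch that parses hosts-file text and reports node names that do not map to the expected interconnect address. The host names, IP addresses and function names below are hypothetical. Real name resolution can also involve DNS, which this sketch ignores.

```python
def parse_hosts(hosts_text):
    """Map each host name to the first address listed for it."""
    mapping = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            # The first match wins, as it does for lookups in /etc/hosts.
            mapping.setdefault(name, address)
    return mapping


def check_node_addresses(hosts_text, expected):
    """Return {node name: address found} for nodes that do not match."""
    mapping = parse_hosts(hosts_text)
    return {
        name: mapping.get(name)
        for name, address in expected.items()
        if mapping.get(name) != address
    }


# Hypothetical example: node1 is listed on the loopback line, so it never
# resolves to its interconnect address.
SAMPLE_HOSTS = """\
127.0.0.1    localhost node1
192.168.10.1 node1
192.168.10.2 node2
"""

EXPECTED = {"node1": "192.168.10.1", "node2": "192.168.10.2"}

if __name__ == "__main__":
    for name, found in check_node_addresses(SAMPLE_HOSTS, EXPECTED).items():
        print("%s resolves to %s, expected %s" % (name, found, EXPECTED[name]))
```

On a real node, pass the contents of /etc/hosts and your own node names and interconnect addresses.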
Arnd Kohlen
Advisor

Re: RH Cluster Suite - Fence manual does not work

For the sake of completeness.

This error, "can't open /tmp/fence_manual.fifo: No such file or directory.", appears if fence_ack_manual is run without fence_manual having been run first on the same node. The fence_manual command creates this file and then waits for fence_ack_manual.
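The create, wait, acknowledge, unlink pattern described here can be sketched in Python. This is only an illustration of how a named pipe behaves and is not the actual fence_manual source. The function names and the temporary path are invented for the example, and it needs a Unix-like system because it uses `os.mkfifo`.

```python
import os
import tempfile
import threading
import time


def wait_for_ack(fifo_path, received):
    """Waiting side: create the FIFO, block until someone writes, clean up."""
    os.mkfifo(fifo_path, 0o600)
    try:
        # open() blocks until a writer opens the FIFO.
        with open(fifo_path, "r") as fifo:
            received.append(fifo.read().strip())
    finally:
        # The FIFO is removed as soon as the wait is over.
        os.unlink(fifo_path)


def send_ack(fifo_path, text="ack"):
    """Acknowledging side: fails if nobody has created the FIFO."""
    try:
        fd = os.open(fifo_path, os.O_WRONLY)
    except FileNotFoundError:
        return False  # the "No such file or directory" case
    with os.fdopen(fd, "w") as fifo:
        fifo.write(text + "\n")
    return True


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")
        early = send_ack(path)  # nobody is waiting yet
        received = []
        waiter = threading.Thread(
            target=wait_for_ack, args=(path, received), daemon=True
        )
        waiter.start()
        # Give the waiting side a moment to create the FIFO.
        deadline = time.time() + 5
        while not os.path.exists(path) and time.time() < deadline:
            time.sleep(0.01)
        late = send_ack(path)  # now a waiter exists
        waiter.join(timeout=5)
        return early, late, received, os.path.exists(path)


if __name__ == "__main__":
    print(demo())
```

The first `send_ack` fails because no waiter has created the pipe. The second succeeds, and the pipe is gone again afterwards. That would match the mknod/unlink pair seen in the strace output when the waiting side has nothing to wait for and returns immediately.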

My problem was my cluster.conf:

Wrong:
Right:

Just if someone finds this thread and needs to know how the story finished... :-)
Steven E. Protter
Exalted Contributor

Re: RH Cluster Suite - Fence manual does not work

Ivan,

I wrote a monitor script for RH4 CL that handles this problem with a fence acknowledge command.

Works well.

Good Luck.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com