
Red Hat Cluster Suite 5.4 and iLO Fence with Blades

 
cerosunos
Occasional Contributor

Hi all,



We have two clusters, one on two BL680 G5 blades and another on two BL460 G6 blades. We have noticed these problems:



- If we take down a cluster node with the `shutdown -r` command, the server stops and then rejoins the cluster.

- If we power off a cluster node by pressing and holding the power button, a few minutes later the cluster fences the node with a momentary press, and the node then rejoins.

- If we turn off the service network interfaces, the packages switch over correctly and the cluster then fences the isolated node, but with the momentary press or power off option, so the node does not start until we power it on manually through the OA interface.



I found this:

We have this iLO2 firmware version: 1.81 01/15/2010

And Red Hat lists this in the fence_ilo script:

## The Following Agent Has Been Tested On:
##
## iLO Version
## +---------------------------------------------+
## iLO / firmware 1.91 / RIBCL 2.22
## iLO2 / firmware 1.22 / RIBCL 2.22
## iLO2 / firmware 1.50 / RIBCL 2.22





Another question: we have configured two virtual IP addresses, one for service and another for backup. The service one is critical, so we monitor it, but the backup one is not, so we use:




Now, during cluster testing, we notice that if I run ifdown ethX on the interface that carries the Backup IP 172.18.172.74, the cluster package switches to the other cluster node, even though I set monitor_link=0.
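
One way to confirm what the cluster actually sees is to list every ip resource and its monitor_link value straight from the configuration. The sketch below is only an illustration: the configuration snippet from this post did not survive, so SAMPLE is a made-up fragment, 192.0.2.10 is a placeholder for the service address, and `link_monitoring` is a hypothetical helper name. Only the Backup IP 172.18.172.74 and the monitor_link attribute come from the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration only; the real file is
# /etc/cluster/cluster.conf and its contents are not shown in this thread.
SAMPLE = """
<resources>
  <ip address="192.0.2.10" monitor_link="1"/>
  <ip address="172.18.172.74" monitor_link="0"/>
</resources>
"""

def link_monitoring(xml_text):
    """Map each ip resource address to its raw monitor_link value.

    The value is None when the attribute is absent, so the caller can
    tell "not set" apart from an explicit "0" or "1".
    """
    root = ET.fromstring(xml_text)
    result = {}
    for ip in root.iter("ip"):
        address = ip.get("address")
        if address is None:
            continue  # e.g. an <ip ref="..."/> reference inside a service
        result[address] = ip.get("monitor_link")
    return result

if __name__ == "__main__":
    for address, value in sorted(link_monitoring(SAMPLE).items()):
        print(address, "monitor_link =", value)
```

Against a real node, the same function could be fed the text of /etc/cluster/cluster.conf. If the Backup IP shows anything other than "0", or appears twice with different values, the setting is not being applied where you expect.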



Has anybody seen a similar problem? Do you know the solution? Thanks a lot; we will continue investigating.



Thanks and regards,

Pablo.
6 REPLIES
Serviceguard for Linux
Honored Contributor

Re: Red Hat Cluster Suite 5.4 and iLO Fence with Blades

When other nodes in the cluster lose contact with the node you are taking down, they will use the fence mechanism. The iLO fence mechanism tries to restart the server. Check out http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1410498 on how to stop clustering on a node before halting the node.

I have less experience on the networking side, but I'll take a guess on the VIP question: I'm not sure what you mean by "Backup" in this context. If you don't want a service to fail over when the network fails, then the network should not be listed as a resource.
Joseph L. Casale
Regular Advisor

Re: Red Hat Cluster Suite 5.4 and iLO Fence with Blades

You need to understand how to operate the cluster software. When you do things outside the scope of the cluster, it can't know what you really want.

Leave the fence domain first.

See the bottom of this page:
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Cluster_Administration/ch-mgmt-scc-CA.html

This will correctly allow the node to leave the fence domain, so it is no longer fenced when it drops out of sight. Otherwise the cluster software thinks the node has gone awry and must protect your storage.
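
A minimal sketch of that shutdown order, assuming the standard RHEL 5 init scripts (rgmanager, gfs, clvmd, cman) and that gfs and clvmd are only stopped if you use them. `build_stop_sequence` and `leave_cluster` are hypothetical helper names, and by default the helper only prints the commands instead of running them.

```python
import subprocess

def build_stop_sequence(uses_gfs=True, uses_clvm=True):
    """Return the stop commands in the order they should run:
    resource manager first, cluster manager (cman) last."""
    commands = [["service", "rgmanager", "stop"]]
    if uses_gfs:
        commands.append(["service", "gfs", "stop"])
    if uses_clvm:
        commands.append(["service", "clvmd", "stop"])
    commands.append(["service", "cman", "stop"])
    return commands

def leave_cluster(dry_run=True, **kwargs):
    """Print the commands (dry_run=True) or run them, stopping at the
    first one that fails."""
    for cmd in build_stop_sequence(**kwargs):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)

if __name__ == "__main__":
    leave_cluster(dry_run=True)
```

Only after the last command has succeeded should the node be rebooted or halted.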

As a note, I haven't used that agent, but it may be possible to override what is probably the default behavior when none is specified and, instead of restarting the node, simply power it off and leave it off.
cerosunos
Occasional Contributor

Re: Red Hat Cluster Suite 5.4 and iLO Fence with Blades

Hi,

Well in the Linux HP PDL someone told me:

The default behavior of the cluster is always a decision by the system administrator. RH took the "safe" approach by just having the fenced node turn off.



If you make the edit I suggested below, the server will always cold boot (no orderly shutdown, just power off and then power back on) when the fence_ilo script is called.

The fence_ilo script can be changed to whatever behavior the customer prefers. RH took the position that if they had to shut a node down, it should stay down until an administrator decided it was OK to bring that node back into the cluster.



The most common edit is the following:

Original:

< conn.send("\r\n")

Modified to:

> conn.send("\r\n")

I tried this solution and it worked well for me: the nodes now restart instead of being powered off and staying off.
Now I'm waiting for the official Red Hat reply, because a modified RHCS script might not be supported.

Any suggestions? Thanks for the replies.
Steven E. Protter
Exalted Contributor

Re: Red Hat Cluster Suite 5.4 and iLO Fence with Blades

Shalom Pablo,

The iLO firmware needs an update, of that I am sure.

iLO is a good fence device in a blade. Make sure the IP address is accessible to all systems. I'd bring the firmware up to date and re-test after making sure network connectivity is normal.
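
One simple way to check accessibility is to try a TCP connection to the iLO from every cluster node. The sketch below assumes the iLO answers on the default HTTPS port 443, which is the port I'd expect fence_ilo to use. `ilo_reachable` is a hypothetical helper name, and the iLO addresses are whatever you pass on the command line.

```python
import socket
import sys

def ilo_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves the port is reachable from this node; it does not
    check the iLO login or the firmware version.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except OSError:
        return False
    conn.close()
    return True

if __name__ == "__main__":
    # Usage: python check_ilo.py <ilo-address> [<ilo-address> ...]
    for host in sys.argv[1:]:
        state = "reachable" if ilo_reachable(host) else "NOT reachable"
        print(host, state)
```

Run it on each node against the iLO address of every other node, since a node can only fence a peer whose iLO it can reach.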

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Joseph L. Casale
Regular Advisor

Re: Red Hat Cluster Suite 5.4 and iLO Fence with Blades

Pablo,
You ignored my reply and reposted in Linux-Cluster without results.

Again, you CAN'T take a node down with `shutdown -r` while it is still a cluster member. The remaining nodes think it has gone awry and try to fence it. It restarts because that is the behavior you have configured. Read your fence agent script and cluster.conf and work out what you are telling the agent to do. If you specify nothing, it performs the default action, which is "reboot", and that is why you see the node restart.
ibosco
New Member

Re: Red Hat Cluster Suite 5.4 and iLO Fence with Blades

Can you please tell me how to check or confirm whether the iLO IP address is accessible?