Redhat cluster start cman fence the other node

Occasional Contributor




I know this is not the first post about Red Hat Cluster Suite fencing,

BUT I read and tried a lot before starting this one.

I am running Red Hat EL5, kernel 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 GNU/Linux.

I'm trying to configure a two-node Red Hat cluster in Active/Passive mode.

I configured the cluster and ran the services on oradb1; when I halt oradb1, the services start on oradb2 normally.

BUT now, whenever either server starts the cman service, it fences (reboots) the other node.


I would appreciate it if someone could help.


- two ProLiant DL380 G7 servers (iLO 3)

- two mount points, /u01 and /backup (LVM)

- nodes oradb1, oradb2

- chkconfig settings for runlevels 2345: iptables off, acpid off; cman, rgmanager, qdiskd and ipmi on

- # rpm -qa | grep cman


# mkqdisk -L -v
mkqdisk v0.6.0
        Magic:                eb7a62c2
        Label:                qdisk
        Created:              Sun Nov 27 12:56:32 2011
        Host:                 ORADB1
        Kernel Sector Size:   512
        Recorded Sector Size: 512


# ccs_tool lsnode

Cluster name: ora_cluster, config_version: 32

Nodename                        Votes Nodeid Fencetype
oradb1                             1    1    db1ilo
oradb2                             1    2    db2ilo


[root@oradb1 ~]# clustat -l
Cluster Status for ora_cluster @ Mon Nov 28 18:10:52 2011
Member Status: Quorate

 Member Name                                                   ID   Status
 ------ ----                                                   ---- ------
 oradb1                                                            1 Online, Local, rgmanager
 oradb2                                                            2 Offline  
 /dev/dm-4                                                         0 Online, Quorum Disk

Service Information
------- -----------

Service Name      : service:orasrv
  Current State   : started (112)
  Flags           : none (0)
  Owner           : oradb1
  Last Owner      : none
  Last Transition : Mon Nov 28 18:04:44 2011


This is my configuration (/etc/cluster/cluster.conf):


<?xml version="1.0"?>
<cluster alias="ora_cluster" config_version="32" name="ora_cluster">
  <quorumd interval="3" label="qdisk" min_score="1" tko="9" votes="1">
    <heuristic interval="3" program="ping -c1 -t1" score="1"/>
  </quorumd>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="oradb1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device lanplus="1" name="db1ilo"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="oradb2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device lanplus="1" name="db2ilo"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="3"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="" lanplus="1" login="admin" name="db1ilo" passwd="admin123"/>
    <fencedevice agent="fence_ipmilan" ipaddr="" lanplus="1" login="admin" name="db2ilo" passwd="admin123"/>
  </fencedevices>
  <rm log_facility="local4" log_level="7">
    <failoverdomains>
      <failoverdomain name="oradb" ordered="1" restricted="0">
        <failoverdomainnode name="oradb1" priority="1"/>
        <failoverdomainnode name="oradb2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="oradb" name="orasrv" recovery="relocate">
      <fs device="/dev/mapper/vg02-lvol1" force_fsck="0" force_unmount="0" fsid="64360" fstype="ext3" mountpoint="/backup" name="orabackup" options="" self_fence="0"/>
      <fs device="/dev/mapper/vg01-lvol1" force_fsck="0" force_unmount="0" fsid="49801" fstype="ext3" mountpoint="/u01" name="orahome" options="" self_fence="0"/>
      <ip ref=""/>
    </service>
  </rm>
</cluster>
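As a side note on the vote arithmetic in this configuration, the following is a sketch (not cman's actual code) of how quorum works out with the values from the cluster.conf above:

```python
# Quorum arithmetic for this two-node + quorum-disk cluster, using the
# values from the posted cluster.conf (illustrative sketch only).
node_votes = {"oradb1": 1, "oradb2": 1}   # votes="1" on each <clusternode>
qdisk_votes = 1                           # votes="1" on <quorumd>

expected_votes = sum(node_votes.values()) + qdisk_votes  # matches <cman expected_votes="3"/>
quorum = expected_votes // 2 + 1                         # majority of expected votes

# A single node plus the qdisk reaches quorum, which is why a lone node
# that holds the quorum disk can legitimately fence its peer at startup:
lone_node = node_votes["oradb1"] + qdisk_votes
print(expected_votes, quorum, lone_node >= quorum)   # 3 2 True
```

So either node, alone with the qdisk, is quorate and allowed to fence -- the symptom described above is consistent with each node forming its own one-node cluster instead of joining the other's.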






Reiner Rottmann
Frequent Advisor

Re: Redhat cluster start cman fence the other node

Are there any cluster related log entries in /var/log/messages on both machines?


You should consider enabling cman debug messages:


<logging to_file="yes" logfile="/tmp/openais.log" fileline="on">
   <logger ident="CMAN" debug="on"/>
</logging>

The qdisk is shared storage, right?

Honored Contributor

Re: Redhat cluster start cman fence the other node

I've seen cases where a Red Hat cluster starts fine when the Conga configuration tool brings up all nodes simultaneously during initial setup, but when a node later leaves the cluster, it fails to rejoin and instead fences the other node. As the fenced node reboots, it tries to rejoin the cluster... and if it fails too, you get a see-saw effect: each node in turn reboots and then fences the other.


In the cases I've seen, this was caused by problems in IP multicast propagation.


The implementation of IGMP snooping in a large group of Cisco switches has a known issue: if there is no multicast router or IGMP querier (or any other source of IGMP queries) in the network segment, the switches will fail to detect systems attempting to rejoin existing multicast groups.
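As an illustration of one such fix (my own example, not necessarily one of the five in the Tech Note, and the exact syntax depends on your platform and IOS version), you can enable the switch's own IGMP snooping querier on the cluster VLAN so membership reports keep flowing even without a multicast router:

```
! Enable the switch's IGMP snooping querier (Catalyst IOS; verify the
! syntax for your platform/version -- this is an illustrative example):
Switch(config)# ip igmp snooping querier
```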


This Cisco Tech Note includes 5 ways to address this issue: