One node is always fenced after startup once it has been rebooted (shutdown -ry 0)

SOLVED
louisji2008
Frequent Advisor

Dear experts,

I have a two-node cluster without a quorum disk.

With the cman and rgmanager services configured to start automatically and the cluster running well, I ran "shutdown -ry 0" on the node that was running the service.

The service fails over to the other node as expected.

But when the first node starts up again, it is fenced, so it reboots. When it comes back up, it is fenced again, so it is stuck in a loop.

[root@node1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="24" name="db">
        <clusternodes>
                <clusternode name="node1" nodeid="1">
                        <fence>
                                <method name="node1-fence">
                                        <device name="node1-fence"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node2" nodeid="2">
                        <fence>
                                <method name="node2-fence">
                                        <device name="node2-fence"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" ipaddr="node1ilo" lanplus="on" login="Administrator" name="node1-fence" passwd="as1a1nf0"/>
                <fencedevice agent="fence_ipmilan" ipaddr="node2ilo" lanplus="on" login="Administrator" name="node2-fence" passwd="as1a1nf0"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="oracle">
                                <failoverdomainnode name="node1" priority="1"/>
                                <failoverdomainnode name="node2" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <lvm lv_name="lvol1" name="vg_db" vg_name="vg_db"/>
                        <fs device="/dev/vg_db/lvol1" force_fsck="0" force_unmount="0" fsid="62307" mountpoint="/data" name="data" self_fence="0"/>
                        <ip address="192.168.10.20" monitor_link="1" sleeptime="10"/>
                        <script file="/etc/init.d/db.sh" name="script_db"/>
                </resources>
                <service autostart="1" domain="oracle" name="oracle" recovery="relocate">
                        <ip ref="192.168.10.20">
                                <lvm ref="vg_db">
                                        <fs ref="data">
                                                <script ref="script_db"/>
                                        </fs>
                                </lvm>
                        </ip>
                </service>
        </rm>
        <fence_daemon post_fail_delay="10" post_join_delay="300"/>
</cluster>

 

How time flies~~~
2 REPLIES
louisji2008
Frequent Advisor

Re: One node is always fenced after startup once it has been rebooted (shutdown -ry 0)

I found that the heartbeat communication between the two nodes is very poor. Could this be the root cause?

I have now added <totem token="60000"/> to the cluster.conf file, and the node is no longer fenced.

[root@xsl-rms-database1 ~]# tcpdump -i eth2 host 239.192.145.113
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
17:35:34.631409 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:35:46.059042 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:35:57.487167 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:36:08.915044 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:36:20.342966 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:36:31.770601 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:36:43.198173 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:36:54.626517 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:37:06.054125 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:37:17.481742 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:37:28.909577 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:37:40.337407 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:37:51.765033 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:02.840907 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:02.841297 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 409
17:38:02.841773 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:02.842385 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:02.842727 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:03.052251 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:14.486827 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:24.587642 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:24.588271 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 409
17:38:24.588677 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:24.588971 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:24.589276 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:24.611727 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:24.611963 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 305
17:38:24.612547 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:24.613513 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 318
17:38:24.614384 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 410
17:38:24.813329 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:24.813629 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 409
17:38:24.813974 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:24.814254 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:24.814385 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:25.015455 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:26.816852 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:26.817204 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 171
17:38:27.027382 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:38.461505 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:41.757652 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:41.758521 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 409
17:38:41.759030 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:41.759272 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 225
17:38:41.798073 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:41.798319 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 305
17:38:42.008687 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:49.957300 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:49.957776 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 318
17:38:49.958556 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 410
17:38:49.962342 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 171
17:38:50.172244 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:50.738525 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:50.739035 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 318
17:38:50.739407 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 486
17:38:50.741500 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 486
17:38:50.951407 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:51.795250 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:51.798245 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 493
17:38:52.016073 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:52.319380 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:52.322057 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 493
17:38:52.324214 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 765
17:38:52.533662 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:52.822723 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:52.824784 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 493
17:38:53.034707 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:38:53.358578 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:38:53.358859 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 173
17:38:53.362947 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 193
17:38:53.363007 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 237
17:38:53.576170 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:39:05.005301 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:39:16.433318 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:39:27.861204 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:39:39.288462 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:39:50.715988 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:40:02.144176 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:40:13.572024 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:40:24.999620 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:40:36.427201 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:40:47.855210 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:40:59.283356 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:41:10.710816 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:41:22.129658 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:41:33.559540 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:41:44.987399 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:41:56.414766 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:42:07.842229 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:42:19.269947 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:42:30.697412 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:42:42.125144 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:42:53.552967 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:43:04.980178 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:43:16.202314 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 149
17:43:16.202440 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 193
17:43:16.271570 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 193
17:43:16.271761 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 193
17:43:16.275447 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 352
17:43:16.276730 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 352
17:43:16.277089 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 542
17:43:16.277317 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 542
17:43:16.277487 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
17:43:16.277674 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
17:43:16.278265 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
17:43:16.278389 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 228
17:43:16.278590 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 202
17:43:16.278717 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 542
17:43:16.278931 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 542
17:43:16.279114 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279128 IP node1 > 239.192.145.113: udp
17:43:16.279159 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 315
17:43:16.279502 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279515 IP node1 > 239.192.145.113: udp
17:43:16.279557 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279561 IP node1 > 239.192.145.113: udp
17:43:16.279598 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279603 IP node1 > 239.192.145.113: udp
17:43:16.279639 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279643 IP node1 > 239.192.145.113: udp
17:43:16.279679 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279686 IP node1 > 239.192.145.113: udp
17:43:16.279724 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279728 IP node1 > 239.192.145.113: udp
17:43:16.279765 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279769 IP node1 > 239.192.145.113: udp
17:43:16.279807 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279810 IP node1 > 239.192.145.113: udp
17:43:16.279846 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.279850 IP node1 > 239.192.145.113: udp
17:43:16.279884 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1133
17:43:16.280327 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
17:43:16.280515 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
17:43:16.280797 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 174
17:43:16.280912 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 174
17:43:16.281224 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 174
17:43:16.281400 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 174
17:43:16.281894 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
17:43:16.282035 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
17:43:16.282346 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.282359 IP node1 > 239.192.145.113: udp
17:43:16.282398 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1285
17:43:16.282675 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 1473
17:43:16.282697 IP node2 > 239.192.145.113: udp
17:43:16.282720 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 390
17:43:16.282882 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
17:43:16.283206 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
17:43:16.283509 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 723
17:43:16.283740 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 723
17:43:16.494046 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:43:20.495765 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:43:20.496097 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 318
17:43:20.496441 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 390
17:43:20.496589 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 390
17:43:20.497044 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 390
17:43:20.707062 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:43:20.826634 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:43:20.826885 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 318
17:43:20.827256 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 406
17:43:20.827433 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 406
17:43:20.827870 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 406
17:43:21.038175 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:43:21.980353 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:43:21.981444 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 318
17:43:21.982645 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 406
17:43:21.982818 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 406
17:43:21.983312 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 406
17:43:22.042346 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
17:43:22.042622 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 318
17:43:22.043116 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 486
17:43:22.043361 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 486
17:43:22.043770 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 486
17:43:22.253335 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
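
To put a number on how bad the heartbeat is, a capture like the one above can be summarized instead of read by eye. The sketch below is not part of the original post: `max_gap` is a hypothetical helper name, it expects tcpdump's default text output on stdin with timestamps in HH:MM:SS.ffffff form, and it assumes the capture does not cross midnight. It prints the longest silence, in seconds, between consecutive packets from one sender, which can then be compared with the totem token timeout.

```shell
#!/bin/sh
# max_gap [SENDER]
# Reads tcpdump text output on stdin and prints the largest gap (seconds)
# between consecutive packets whose source field starts with SENDER.
# With no SENDER, every packet in the capture is considered.
# Assumption: timestamps look like 17:35:34.631409 and stay within one day.
max_gap() {
    awk -v src="${1:-}" '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+$/ &&
        (src == "" || index($3, src) == 1) {
            split($1, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (seen && now - prev > max) max = now - prev
            prev = now
            seen = 1
        }
        END { printf "%.3f\n", max }
    '
}
```

For example, with the capture saved to a file (the name capture.txt is only an illustration), `max_gap node2 < capture.txt` reports the longest period in which nothing from node2 was seen. A long gap only shows that no packets from that sender arrived on this interface; the sender may simply have been down or rebooting at that time.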

 

louisji2008
Frequent Advisor
Solution

Re: One node is always fenced after startup once it has been rebooted (shutdown -ry 0)

I finally found the root cause: it is rc.local.

The network admin configured bond0 and modified rc.local like this:

[root@xsl-rms-database1 ~]# cat /etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

 

/etc/init.d/network restart
touch /var/lock/subsys/local

 

rc.local is executed after the cman service and before rgmanager:

# ll /etc/rc*d/S*

...

lrwxrwxrwx  1 root root 14 Aug  2 14:49 /etc/rc5.d/S21cman -> ../init.d/cman

...

lrwxrwxrwx. 1 root root 11 Mar 19 15:45 rc5.d/S99local -> ../rc.local
lrwxrwxrwx. 1 root root 14 Jul 11 08:31 rc5.d/S99luci -> ../init.d/luci
lrwxrwxrwx. 1 root root 19 Jul 11 10:06 rc5.d/S99rgmanager -> ../init.d/rgmanager


 

So the heartbeat network goes down while rc.local is running "/etc/init.d/network restart", after cman has already joined the cluster, and the node is fenced.
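
The ordering described above can be checked directly from the runlevel directory. The following is a minimal sketch, assuming a SysV-style rc directory such as /etc/rc5.d; `start_order` is a hypothetical helper name, not an existing tool. It prints the start links of the named services sorted by link name in the C locale, which is assumed here to match the order in which init runs them.

```shell
#!/bin/sh
# start_order RCDIR SERVICE...
# Prints the SNN<service> start links found in RCDIR for the given services,
# sorted by link name (assumed to be the order in which init starts them).
start_order() {
    rcdir=$1
    shift
    for svc in "$@"; do
        for link in "$rcdir"/S[0-9][0-9]"$svc"; do
            # Skip the unexpanded pattern when no link exists for this service;
            # -L keeps dangling symlinks in the listing.
            if [ -e "$link" ] || [ -L "$link" ]; then
                basename "$link"
            fi
        done
    done | LC_ALL=C sort
}
```

On the node in question, `start_order /etc/rc5.d cman local rgmanager` should list S21cman first, then S99local, then S99rgmanager, which is the window in which the network restart interrupts an already running cman.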

 

Solution:

1. Add the totem token option to the cluster.conf file, like this: <totem token="60000"/>. The value is in milliseconds, so 60000 is 60 seconds; it should be longer than the time the heartbeat network is down.

2. Disable cluster service autostart: chkconfig cman off; chkconfig rgmanager off; chkconfig clvmd off ...

3. (Needs testing) Remove "/etc/init.d/network restart" from the rc.local file.
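
For option 3, which is marked as untested, one cautious way to try it is to comment the line out rather than delete it. This is a minimal sketch, assuming the line appears in the file exactly as in the rc.local listing above; `comment_out_network_restart` is a hypothetical helper name, and it is best tried on a copy or a test node first.

```shell
#!/bin/sh
# comment_out_network_restart FILE
# Comments out any "/etc/init.d/network restart" line in FILE.
# The original content is kept in FILE.bak; FILE itself is rewritten in place,
# so its owner and mode (rc.local must stay executable) are unchanged.
comment_out_network_restart() {
    file=$1
    cp -p "$file" "$file.bak" || return 1
    sed 's|^\([[:space:]]*\)\(/etc/init\.d/network[[:space:]][[:space:]]*restart\)|\1# \2|' \
        "$file.bak" > "$file"
}
```

Usage would be `comment_out_network_restart /etc/rc.local`, followed by a reboot of that node. Whether bond0 still comes up correctly without the restart is exactly the part that needs testing.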
