
Quorum dissolved!!

 
rhel6_cluster
Advisor


I have a two-node cluster running RHEL 6.3. Both nodes were in the cluster, but when I checked the shell I saw the following error:

 

Message from syslogd@PDC-PIC-PL-02 at Sep  5 15:26:44 ...

 rgmanager[32279]: #1: Quorum Dissolved

 

Here is my cluster.conf:

 

[root@PDC-PIC-PL-02 opt]# less /etc/cluster/cluster.conf

                                </method>
                        </fence>
                </clusternode>
                <clusternode name="PDC-PIC-PL-02" nodeid="2" votes="1">
                        <fence>
                                <method name="Fencing">
                                        <device name="PDC-PIC-FN"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="192.168.25.11" login="hpadmin" name="PDC-PIC-FN" passwd="hpinvent"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
        <totem/>
        <quorumd label="PDC-PIC-CL"/>
</cluster>

4 REPLIES
rhel6_cluster
Advisor

Re: Quorum dissolved!!

Please help!

 

I found the following log:

 

Sep 05 15:26:44 corosync [CMAN  ] quorum lost, blocking activity
Sep 05 15:26:44 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services.
Sep 05 15:26:44 corosync [QUORUM] Members[1]: 2
Sep 05 15:26:44 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 05 15:26:44 corosync [CPG   ] downlist received left_list: 1
Sep 05 15:26:44 corosync [CPG   ] chosen downlist from node r(0) ip(192.168.24.33)
Sep 05 15:26:44 corosync [MAIN  ] Completed service synchronization, ready to provide service.
Sep 05 15:26:46 corosync [CMAN  ] quorum device re-registered
Sep 05 15:26:46 corosync [CMAN  ] quorum regained, resuming activity
Sep 05 15:26:46 corosync [QUORUM] This node is within the primary component and will provide service.
Sep 05 15:26:46 corosync [QUORUM] Members[1]: 2
Sep 05 15:30:33 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 05 15:30:33 corosync [QUORUM] Members[2]: 1 2
Sep 05 15:30:33 corosync [QUORUM] Members[2]: 1 2
Sep 05 15:30:33 corosync [CPG   ] downlist received left_list: 0
Sep 05 15:30:33 corosync [CPG   ] downlist received left_list: 0
Sep 05 15:30:33 corosync [CPG   ] chosen downlist from node r(0) ip(192.168.24.32)
Sep 05 15:30:33 corosync [MAIN  ] Completed service synchronization, ready to provide service.

 

And here is my complete cluster.conf:


<?xml version="1.0"?>
<cluster config_version="23" name="PDC-PIC-CL">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="PDC-PIC-PL-01" nodeid="1" votes="1">
                        <fence>
                                <method name="Fencing">
                                        <device name="PDC-PIC-FN"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="PDC-PIC-PL-02" nodeid="2" votes="1">
                        <fence>
                                <method name="Fencing">
                                        <device name="PDC-PIC-FN"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="192.168.25.11" login="hpadmin" name="PDC-PIC-FN" passwd="hpinvent"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
        <totem/>
        <quorumd label="PDC-PIC-CL"/>
</cluster>



Matti_Kurkela
Honored Contributor

Re: Quorum dissolved!!

Your copy/paste did not produce a complete cluster.conf: the beginning, including the opening <cluster> tag, is missing. (Scrolling backwards with the terminal emulator's scroll bar while the display is controlled by "less" or another full-screen program can sometimes cause this.)

 

"Quorum dissolved" probably means that there was a problem with the cluster heartbeat messages. Check your syslog (/var/log/messages): it probably contains a lot more messages describing what was happening at that time.
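
As a rough sketch of how to pull the relevant lines out of a log, the helper below filters for the quorum and membership messages that appear in this thread. The function name quorum_events is made up for illustration, and the patterns are taken only from the messages quoted here, so other cluster versions may word them differently.

```shell
# Hypothetical helper: print only quorum and membership events from the
# given log files, or from standard input when no file is given.
# The patterns match the message texts quoted in this thread.
quorum_events() {
    grep -E 'quorum (lost|regained)|Quorum Dissolved|Members\[[0-9]+\]' "$@"
}

# Example call (the log path is the one mentioned above):
# quorum_events /var/log/messages
```

Running it against the logs of both nodes for the same time window makes it easier to see which node lost sight of the other first.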

 

The loss of quorum causes the cluster to start determining which cluster nodes are still reachable. Since this is a two-node cluster, if the network heartbeat has failed, the node status will be a tie: node 1 thinks node 2 has failed, and vice versa. The extra quorum vote(s) from qdiskd are needed to break the tie.
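
The vote arithmetic behind that tie can be sketched as follows. The node votes and expected_votes come from the posted cluster.conf; the simple-majority rule and the single quorum disk vote are assumptions for illustration (the posted <quorumd> element sets no votes attribute).

```shell
#!/bin/sh
# Vote arithmetic for the posted cluster.conf (illustrative only).
node_votes=1        # votes="1" on each <clusternode>
expected_votes=3    # <cman expected_votes="3"/>
qdisk_votes=1       # assumption: the quorum disk contributes one vote

# Assumption: quorum is a simple majority of expected_votes.
quorum=$(( expected_votes / 2 + 1 ))

# Votes held by one node that has lost contact with its peer,
# without and with the quorum disk vote.
alone=$node_votes
alone_with_qdisk=$(( node_votes + qdisk_votes ))

echo "quorum=$quorum alone=$alone alone_with_qdisk=$alone_with_qdisk"
```

Under these assumptions an isolated node holds 1 vote against a quorum of 2, so it stays inquorate unless it also receives the quorum disk vote.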

 

Your quorumd configuration does not appear to have any heuristics defined: this is an error. "man qdisk" says:

2.2. Scoring & Heuristics
The administrator can configure up to 10 purely arbitrary heuristics,
and must exercise caution in doing so. At least one administrator-defined heuristic is required for operation, [...]

 

With no heuristics configured, qdiskd may not be able to decide which node is OK and which one is not. If qdiskd does not cast its extra vote, the cluster can start neither failover nor fencing: it can only wait forever for the cluster heartbeat to come back. Or, if the other node gets the needed vote first, it will fence this node and then fail over the services.
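
According to "man qdisk", a heuristic is a command run through /bin/sh, and an exit status of zero counts as success. A minimal sketch for trying out a candidate command by hand before putting it into cluster.conf is shown below; the function name heuristic_status and the address 192.0.2.1 are placeholders, not values from this thread.

```shell
# Hypothetical helper: run a candidate heuristic command, discard its
# output, and report only whether it exited with status 0.
heuristic_status() {
    if "$@" >/dev/null 2>&1; then
        echo "pass"
    else
        echo "fail"
    fi
}

# Example candidate: a single ping with a one-second deadline to a router
# on the cluster network. 192.0.2.1 is a placeholder address.
# heuristic_status ping -c1 -w1 192.0.2.1
```

A command that passes reliably on a healthy node and fails on an isolated one could then be referenced from a <heuristic program="..."/> element inside <quorumd>; the score, interval, and tko attributes are described in "man qdisk" and need to be chosen for the site.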

 

If you have Cisco network hardware, make sure you are not having this network configuration issue:

http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a008059a9df.shtml

 

See also this RHKB article for ways to verify the multicast capability of your network (Red Hat Network access required):

https://access.redhat.com/knowledge/articles/22304

 

MK
rhel6_cluster
Advisor

Re: Quorum dissolved!!

Thanks for your suggestion!

Can you tell me about the heuristics, and what values I need to set for them?

 

 

Thanks!

Matti_Kurkela
Honored Contributor

Re: Quorum dissolved!!

Apparently your log was from node 2, i.e. PDC-PIC-PL-02.


This node lost contact with node 1 at 15:26:44.

The quorum disk did not give it an extra vote, so this node could only wait: if node 1 was functional at that time, it should have tried to fence this node. So this node performed as expected, and you should be investigating why node 1 did not succeed in fencing this node. Start with checking the logs of the other node from this same time period.

 

At 15:30:33, contact with node 1 returned and this node rejoined the cluster.

 

Your other thread indicates you've noticed that fencing does not work. That is because you have not added the options required with iLO3, as indicated on the RHEL 6.3 fence_ipmilan man page: please see your other thread.

http://h30499.www3.hp.com/t5/System-Administration/Fenceing-in-RHEL6/m-p/5794233

 

MK