04-16-2013 12:02 AM
hp-health and Centos 6.2 Cluster
Hi
We have set up a CentOS 6.4 cluster based on luci and two ricci nodes.
We set up ricci on the nodes, joined them to the cluster via luci, and rebooted.
Service groups have been set up, and failover works perfectly.
When we install the hp-health RPM on both nodes, it appears that communication between the nodes is affected, and the second node in the cluster cannot get the updated configuration. This breaks the cluster.
Apr 15 17:56:35 ted corosync[2556]: [CMAN ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
Apr 15 17:56:35 ted corosync[2556]: [CMAN ] Can't get updated config version 42: New configuration version has to be newer than current running configuration#012.
Apr 15 17:56:35 ted corosync[2556]: [CMAN ] Activity suspended on this node
Apr 15 17:56:35 ted corosync[2556]: [CMAN ] Error reloading the configuration, will retry every second
Apr 15 17:56:35 ted corosync[2556]: [CMAN ] Node 1 conflict, remote config version id=42, local=41
Are there any ideas on this matter?
Kind of ironic that hp-health is making my cluster sick....
05-07-2013 12:11 AM
Re: hp-health and Centos 6.2 Cluster
Given that none of the utilities in the hp-health RPM are even capable of network communication, your theory seems very strange indeed. I would think it more likely that there was a completely separate problem in your cluster that became apparent after you rebooted.
Please run
cman_tool status | grep "Config Version:"
on both cluster nodes, to see what each node assumes the current cluster configuration version number to be. Do the versions match the version listed in the beginning of the /etc/cluster/cluster.conf file? Are the version numbers the same on all the nodes?
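If it helps, the two version numbers can be pulled out with a small shell sketch. The cluster.conf path and the cman_tool output format are the stock CentOS 6 / RHEL 6 ones, and the function names are just for illustration:

```shell
#!/bin/sh
# Sketch: compare the version in cluster.conf with the one cman is running.
# Paths and output formats assume a stock CentOS 6 / RHEL 6 cluster.

conf_file_version() {
    # Extract the config_version attribute from a cluster.conf file.
    sed -n 's/.*config_version="\([0-9]*\)".*/\1/p' "$1"
}

running_version() {
    # Version cman/corosync is actually running (empty if cman is down).
    cman_tool status 2>/dev/null | awk -F': *' '/Config Version/ {print $2}'
}
```

Run both on each node, e.g. `conf_file_version /etc/cluster/cluster.conf`, and compare the results across the cluster.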
I once saw a case where ricci initially failed to perform a task and left the task file in its queue directory, /var/lib/ricci/queue. When the system was later rebooted, ricci tried to run the task again, causing the cluster configuration to go out of sync.
If the cluster configuration version number is out of sync, you should first stop the ricci agents on both nodes, remove any existing ricci job files from /var/lib/ricci/queue, then restart the ricci agents.
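A minimal sketch of that reset, assuming the CentOS 6 init script name (`service ricci ...`) and the queue path from above; the `|| true` guards are only there so the sketch degrades gracefully on a machine without the init script:

```shell
#!/bin/sh
# Sketch: reset a stuck ricci queue (run as root on each affected node).
# Service name and queue path follow CentOS 6 defaults.

clear_ricci_queue() {
    queue=${1:-/var/lib/ricci/queue}
    service ricci stop 2>/dev/null || true   # stop the agent first
    rm -f "$queue"/*                         # drop any half-finished job files
    service ricci start 2>/dev/null || true  # then bring it back
}
```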
After that, if the nodes have different versions of /etc/cluster/cluster.conf, compare them to find the differences. Pick the one that seems most correct, and increase its version number to a value higher than what
cman_tool status | grep "Config Version:"
reports on any node.
For example, if one node reports version 41 and the other 42, you should edit the chosen cluster.conf file to have version 43.
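The bump itself is a one-line substitution on the config_version attribute. A sketch (GNU sed in-place edit, as shipped with CentOS 6; back up cluster.conf before editing it in place):

```shell
#!/bin/sh
# Sketch: set config_version in a cluster.conf to a given value, in place.

bump_config_version() {
    # usage: bump_config_version <cluster.conf> <new_version>
    sed -i "s/config_version=\"[0-9]*\"/config_version=\"$2\"/" "$1"
}
```

In the scenario above, that would be something like `bump_config_version /etc/cluster/cluster.conf 43`.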
Then run
cman_tool version -r
on the node that has the updated cluster.conf with the highest version number (though I think it will work on any cluster node). This command will propagate the updated cluster.conf to all nodes through ricci automatically. This should clear the configuration version conflict in your cluster.
Anyway, after doing any major operations through luci, and especially if some luci operation has failed, you should check the /var/lib/ricci/queue directories on your cluster nodes. If they contain any files, it means ricci thinks some operation has not been performed to completion yet.
If you reboot at this point, ricci will retry the operation after the reboot (possibly after a small delay), which may cause nasty surprises and/or confusion. If ricci seems to be unable to complete some task on some node, you should stop ricci on that node, clear the queue directory, and restart ricci before doing anything else.
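As a quick check for that state, listing leftover job files is enough; the path is the one mentioned above, and the function name is just illustrative:

```shell
#!/bin/sh
# Sketch: list unfinished ricci jobs on this node. Empty output means
# ricci has nothing queued to retry after the next restart or reboot.

ricci_queue_pending() {
    queue=${1:-/var/lib/ricci/queue}
    find "$queue" -mindepth 1 2>/dev/null
}
```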