09-13-2011 09:17 AM - edited 09-14-2011 02:33 AM
Redhat cluster is not working properly
I configured a two-node cluster (without a quorum disk) for the services httpd and vsftpd. That is, httpd will run on node1 all the time and ftp will run on node2 all the time; if node1 fails, node2 will host both http and ftp, and if node2 fails, node1 will host both http and ftp.
Hardware details of nodes:
Both nodes are “ProLiant BL460c G7”
I have configured direct iLO login on both nodes.
I used following method to configure cluster:
1) First installed the OS and assigned an IP address and hostname for each node:
Node names:
Node1: emdlagpbw01 (10.250.1.97)
Node2: emdlagpbw02 (10.250.1.98)
Each node can ping the other by hostname and IP address.
2) Configured the cluster through "system-config-cluster" on node1.
My cluster configuration:
# more /etc/cluster/cluster.conf
<?xml version="1.0" ?>
<cluster alias="clu" config_version="14" name="clu">
<fence_daemon post_fail_delay="0" post_join_delay="20"/>
<clusternodes>
<clusternode name="emdlagpbw01.emdna.emdiesels.com" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW01R"/>
</method>
</fence>
</clusternode>
<clusternode name="emdlagpbw02.emdna.emdiesels.com" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW02R"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_ilo" hostname="emdlagpbw01R" login="xxx" name="EMDLAGPBW01R" passwd="xxxxx"/>
<fencedevice agent="fence_ilo" hostname="emdlagpbw02R" login="xxx" name="EMDLAGPBW02R" passwd="xxxxx"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="EMDLAGPBWCL1" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="1"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="2"/>
</failoverdomain>
<failoverdomain name="EMDLAGPBWCL2" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="2"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<ip address="10.250.1.107/22" monitor_link="1"/>
<script file="/etc/init.d/httpd" name="httpd"/>
<ip address="10.250.1.108/22" monitor_link="1"/>
<script file="/etc/init.d/vsftpd" name="vsftpd"/>
</resources>
<service autostart="1" domain="EMDLAGPBWCL1" name="httpd">
<ip ref="10.250.1.107/22"/>
<script ref="httpd"/>
</service>
<service autostart="1" domain="EMDLAGPBWCL2" name="vsftpd">
<ip ref="10.250.1.108/22"/>
<script ref="vsftpd"/>
</service>
</rm>
</cluster>
3) Copied the file /etc/cluster/cluster.conf to node2.
4) Started the "cman" and "rgmanager" services on node1 and node2.
But it is taking a lot of time to start fencing on both nodes.
I am seeing the following messages in the log file "/var/log/messages" on both nodes:
Sep 13 10:44:10 emdlagpbw02 fenced[32371]: agent "fence_ilo" reports: Unable to connect/login to fencing device
Sep 13 10:44:10 emdlagpbw02 fenced[32371]: fence "emdlagpbw01.emdna.emdiesels.com" failed
My questions:
1) Is my cluster configuration correct?
2) Where is the issue?
3) Why is fencing not starting, and how do I resolve this?
4) Why are the http and ftp services not started on node1 and node2 respectively?
09-14-2011 06:30 AM
Re: Redhat cluster is not working properly
09-14-2011 07:38 AM
Re: Redhat cluster is not working properly
Knowing what model of servers you're working with would help. If your servers have iLO3, you need to use fence_ipmilan as the fencing agent, and the fence user you create in iLO will need to have administrator privileges.
A patch for fence_ilo is being worked on to enable iLO3 support.
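You can also test the fencing agent by hand before wiring it into cluster.conf. A rough sketch (the iLO address, user and password below are placeholders, not values from this thread):
fence_ipmilan -a <iLO-IP-of-peer-node> -l <fence-user> -p <password> -P -o status -v
# "-o status" only queries the power state, so it is safe to run; "-P" selects IPMI lanplus
# if this manual test fails, fencing from the cluster will fail the same way
If the manual test works but the cluster still cannot fence, the problem is more likely in cluster.conf than in the iLO setup.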
09-14-2011 10:03 AM
Re: Redhat cluster is not working properly
Hi Senthil,
Try the Red Hat KB article below; it might be helpful.
https://access.redhat.com/kb/docs/DOC-57676
Regards,
Chhaya
I am an HP employee.
Was this post useful? - You may click the KUDOS! star to say thank you.
09-19-2011 09:46 AM
Re: Redhat cluster is not working properly
I changed the fence device configuration as per the following link: "https://access.redhat.com/kb/docs/DOC-56880".
Right now my cluster.conf file on both nodes is:
# vi /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="clu" config_version="14" name="clu">
<fence_daemon post_fail_delay="0" post_join_delay="20"/>
<clusternodes>
<clusternode name="emdlagpbw01.emdna.emdiesels.com" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW01R"/>
</method>
</fence>
</clusternode>
<clusternode name="emdlagpbw02.emdna.emdiesels.com" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW02R"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" power_wait="10" lanplus="1" ipaddr="10.254.1.113" login="xxx" name="EMDLAGPBW01R" passwd="xxxxxx"/>
<fencedevice agent="fence_ipmilan" power_wait="10" lanplus="1" ipaddr="10.254.1.143" login="xxx" name="EMDLAGPBW02R" passwd="xxxxxx"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="EMDLAGPBWCL1" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="1"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="2"/>
</failoverdomain>
<failoverdomain name="EMDLAGPBWCL2" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="2"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<ip address="10.250.1.107/22" monitor_link="1"/>
<script file="/etc/init.d/httpd" name="httpd"/>
<ip address="10.250.1.108/22" monitor_link="1"/>
<script file="/etc/init.d/vsftpd" name="vsftpd"/>
</resources>
<service autostart="1" domain="EMDLAGPBWCL1" name="httpd" recovery="relocate">
<ip ref="10.250.1.107/22"/>
<script ref="httpd"/>
</service>
<service autostart="1" domain="EMDLAGPBWCL2" name="vsftpd" recovery="relocate">
<ip ref="10.250.1.107/22"/>
<script ref="httpd"/>
</service>
<service autostart="1" domain="EMDLAGPBWCL2" name="vsftpd" recovery="relocate">
<ip ref="10.250.1.108/22"/>
<script ref="vsftpd"/>
</service>
</rm>
</cluster>
With the same cluster.conf file on both nodes, I started the "cman" service on node1 "emdlagpbw01.emdna.emdiesels.com":
Example:
# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... done
[ OK ]
Output from the log file "/var/log/messages" on node1:
Sep 19 11:23:13 emdlagpbw01 openais[11315]: [CLM ] got nodejoin message 10.250.1.97
Sep 19 11:23:14 emdlagpbw01 ccsd[11306]: Initial status:: Quorate
Sep 19 11:24:21 emdlagpbw01 fenced[11335]: emdlagpbw02.emdna.emdiesels.com not a cluster member after 20 sec post_join_delay
Sep 19 11:24:21 emdlagpbw01 fenced[11335]: fencing node "emdlagpbw02.emdna.emdiesels.com"
Sep 19 11:24:37 emdlagpbw01 fenced[11335]: fence "emdlagpbw02.emdna.emdiesels.com" success
Now node2 "emdlagpbw02.emdna.emdiesels.com" has been fenced (rebooted automatically).
Once node2 came back up, I started the "cman" service on node2, and now node1 has been fenced.
Output from the log file "/var/log/messages" on node2:
Sep 19 11:31:14 emdlagpbw02 ccsd[7559]: Initial status:: Quorate
Sep 19 11:32:21 emdlagpbw02 fenced[7587]: emdlagpbw01.emdna.emdiesels.com not a cluster member after 20 sec post_join_delay
Sep 19 11:32:21 emdlagpbw02 fenced[7587]: fencing node "emdlagpbw01.emdna.emdiesels.com"
Sep 19 11:32:37 emdlagpbw02 fenced[7587]: fence "emdlagpbw01.emdna.emdiesels.com" success
How can I solve this issue? Please help me.
Thanks a lot in advance.
09-19-2011 10:57 AM
Re: Redhat cluster is not working properly
Apparently your cluster nodes aren't receiving the multicast sent by each other.
The steps listed in
https://access.redhat.com/kb/docs/DOC-57237
could be useful in confirming the problem. (Note: when you ping the cluster multicast address with a two-node cluster, you should get two responses for each outgoing ping message. If you get just one, multicast is not working in your network.)
This is a known issue with Cisco switches (and possibly other switches with a similar IGMP snooping implementation). Please see this document for the root cause analysis and links to Cisco documents with suggested solutions:
https://access.redhat.com/kb/docs/DOC-57238
Direct link to the most relevant Cisco document:
http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a008059a9df.shtml
In a nutshell: in Cisco switches, the IGMP snooping feature is enabled by default, but it will only work correctly if there is a multicast-enabled router (mrouter) or some other source of IGMP queries in the same network segment/VLAN. If you don't need full multicast routing functionality (i.e. if you only need multicast to work within a particular network segment/VLAN), you can use an IGMP querier function that is built into some Cisco switches. The IGMP querier will send IGMP queries just like an mrouter, but won't actually route any multicast packets at all. The queries, and the nodes' responses to them, allow the IGMP snooping feature of the switches to work as designed.
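For reference, a quick way to run that check from one of the nodes (assuming a RHEL 5 cman cluster; the multicast address below is just an example, use the one your cluster reports):
cman_tool status | grep -i multicast   # shows the multicast address cman is using
ping -c 5 239.192.122.47               # replace with your cluster's multicast address
# with both nodes up you should see two replies per outgoing ping; add "-I <interface>"
# if the nodes have more than one network interface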
09-20-2011 07:54 AM
Re: Redhat cluster is not working properly
I think there is some problem in your cluster.conf file. I can't understand why you have mentioned "failoverdomains" two times.
<failoverdomains>
<failoverdomain name="EMDLAGPBWCL1" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="1"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="2"/>
</failoverdomain>
<failoverdomain name="EMDLAGPBWCL2" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="2"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="1"/>
</failoverdomain>
</failoverdomains>
Can you please replace that section of your cluster.conf file with the following entry:
<failoverdomains>
<failoverdomain name="EMDLAGPBWCL1" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="1"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="2"/>
</failoverdomain>
</failoverdomains>
Then stop and start the cman and rgmanager services:
[root@emdlagpbw01 ~]# service rgmanager stop
[root@emdlagpbw01 ~]# service cman stop
[root@emdlagpbw01 ~]# service cman start
[root@emdlagpbw01 ~]# service rgmanager start
[root@emdlagpbw01 ~]# clustat
09-20-2011 12:40 PM
Re: Redhat cluster is not working properly
> I think there is some problem in your cluster.conf file. I can't understand why you have mentioned "failoverdomains" two times.
This is not a problem.
Senthil's configuration file suggests he wants to run 3 cluster services, so that the httpd service normally runs on node 1 and the two vsftpd instances on node 2. To make this happen, Senthil needs two failover domain definitions: the first failover domain EMDLAGPBWCL1 has node 1 as the top priority, while the second domain has the node priorities in the reverse order. Then he assigns the httpd service to failover domain EMDLAGPBWCL1, and the vsftpd services to EMDLAGPBWCL2.
Each service has autostart enabled: together with the failover domain definitions, this makes the cluster automatically start each service on the node Senthil prefers for it, unless there is a problem that requires the service to fail over elsewhere.
If Senthil had only one failover domain as you suggest, all the services would normally run on the first node.
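As an aside, you can see and override this placement at runtime; a small sketch using the service and node names from the configuration above:
clustat                                                  # shows which node currently owns each service
clusvcadm -r httpd -m emdlagpbw02.emdna.emdiesels.com    # manually relocate httpd to node 2, if you ever need to
clusvcadm -r httpd -m emdlagpbw01.emdna.emdiesels.com    # and move it back to its preferred node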
This is actually a great philosophical question in failover clustering:
- do you want to keep the spare node(s) idle, so that you know for sure the response times of your services won't degrade if you have to failover?
- or do you want to provide the best possible response time you can (by distributing the services over the whole set of nodes) in the normal situation, but are willing to accept some degradation when a failover happens?
With a two-node cluster, the first option is expensive: you have the computing power of one server, but the hardware costs of two. The second option allows you to get more use of your hardware, but you must carefully track your workloads and remember that if one node fails, all the workload will be moved to the remaining node, which may become overloaded. But if you understand and accept that risk, and have a plan to mitigate the risk, that's OK. (Perhaps one of the services is less critical than the others and can be shut down in case of overload? Or perhaps Senthil's boss has some estimates of the expected usage of the cluster services, and figures he can authorize the purchase of more hardware well before the workload will become so heavy it cannot all be run on a single node any more.)
09-27-2011 09:25 AM
Re: Redhat cluster is not working properly
I've seen this ping-pong fencing happen if the boxes' time is slightly off. I recommend trying the following:
I'd bring each box up to single-user mode (if you have the cluster starting when the box comes up). Then:
ifup mynic
ntpdate mytime.server.foo # using your time server
hwclock --systohc
date; hwclock --show # verify they are mated
reboot
That works for me e.g. if I move a blade and time gets fracked.
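To keep the clocks from drifting apart again afterwards, it is probably worth running ntpd permanently on both nodes (standard RHEL 5 commands; this assumes /etc/ntp.conf already points at your time server):
chkconfig ntpd on     # start the NTP daemon at every boot
service ntpd start    # start it now so both nodes stay synced to the same source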
10-04-2011 08:37 AM
Re: Redhat cluster is not working properly
Thanks a lot for your support in resolving this issue.
Right now my cluster is working fine with the following configuration.
# vi /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="clu" config_version="14" name="clu">
<fence_daemon post_fail_delay="0" post_join_delay="20"/>
<clusternodes>
<clusternode name="emdlagpbw01.emdna.emdiesels.com" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW01R"/>
</method>
</fence>
</clusternode>
<clusternode name="emdlagpbw02.emdna.emdiesels.com" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW02R"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1" broadcast="yes"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" power_wait="10" lanplus="1" ipaddr="10.254.1.113" login="tcs" name="EMDLAGPBW01R" passwd="tCs12345"/>
<fencedevice agent="fence_ipmilan" power_wait="10" lanplus="1" ipaddr="10.254.1.143" login="tcs" name="EMDLAGPBW02R" passwd="tCs12345"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="EMDLAGPBWCL1" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="1"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="2"/>
</failoverdomain>
<failoverdomain name="EMDLAGPBWCL2" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="2"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<ip address="10.250.1.107/22" monitor_link="1"/>
<script file="/etc/init.d/httpd" name="httpd"/>
<ip address="10.250.1.108/22" monitor_link="1"/>
<script file="/etc/init.d/vsftpd" name="vsftpd"/>
</resources>
<service autostart="1" domain="EMDLAGPBWCL1" name="httpd" recovery="relocate">
<ip ref="10.250.1.107/22"/>
<script ref="httpd"/>
</service>
<service autostart="1" domain="EMDLAGPBWCL2" name="vsftpd" recovery="relocate">
<ip ref="10.250.1.108/22"/>
<script ref="vsftpd"/>
</service>
</rm>
</cluster>
So it is working fine without a file system configured.
Now I have configured a file system as below.
# vi /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="clu" config_version="14" name="clu">
<fence_daemon post_fail_delay="0" post_join_delay="20"/>
<clusternodes>
<clusternode name="emdlagpbw01.emdna.emdiesels.com" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW01R"/>
</method>
</fence>
</clusternode>
<clusternode name="emdlagpbw02.emdna.emdiesels.com" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW02R"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1" broadcast="yes"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" power_wait="10" lanplus="1" ipaddr="10.254.1.113" login="tcs" name="EMDLAGPBW01R" passwd="tCs12345"/>
<fencedevice agent="fence_ipmilan" power_wait="10" lanplus="1" ipaddr="10.254.1.143" login="tcs" name="EMDLAGPBW02R" passwd="tCs12345"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="EMDLAGPBWCL1" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="1"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="2"/>
</failoverdomain>
<failoverdomain name="EMDLAGPBWCL2" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="2"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<ip address="10.250.1.107/22" monitor_link="1"/>
<script file="/etc/init.d/httpd" name="httpd"/>
<fs device="/dev/sda2" force_fsck="0" force_unmount="1" fsid="33611" fstype="ext3" mountpoint="/test_node1_filesystem" name="test_node
1" options="" self_fence="0"/>
<ip address="10.250.1.108/22" monitor_link="1"/>
<script file="/etc/init.d/vsftpd" name="vsftpd"/>
<fs device="/dev/sda3" force_fsck="0" force_unmount="1" fsid="54001" fstype="ext3" mountpoint="/test_node2_filesystem" name="test_node
2" options="" self_fence="0"/>
</resources>
<service autostart="1" domain="EMDLAGPBWCL1" name="httpd" recovery="relocate">
<ip ref="10.250.1.107/22"/>
<script ref="httpd"/>
<fs ref="test_node1"/>
</service>
<service autostart="1" domain="EMDLAGPBWCL2" name="vsftpd" recovery="relocate">
<ip ref="10.250.1.108/22"/>
<script ref="vsftpd"/>
<fs ref="test_node2"/>
</service>
</rm>
</cluster>
For your information:
The cluster services "httpd" and "vsftpd" started on the two servers, but only the script and IP resources are started, not the file systems.
The file systems are not mounted on either node.
I have assigned the same LUN (/dev/sda) to both nodes, so both nodes can see the partitions "/dev/sda2" and "/dev/sda3".
And I have created the empty directories "/test_node1_filesystem" and "/test_node2_filesystem" on both nodes.
But I cannot see the mount point "/test_node1_filesystem" on node1 or "/test_node2_filesystem" on node2, while the rest of the resources are started: the script "httpd" and IP "10.250.1.107" on node1, and the script "vsftpd" and IP "10.250.1.108" on node2.
How do I troubleshoot this issue?
10-07-2011 03:25 AM
Re: Redhat cluster is not working properly
10-07-2011 03:48 AM
Re: Redhat cluster is not working properly
When changing the configuration, you are supposed to increase the config_version value on the first line each time. Otherwise your changes may not be recognized by the running cluster.
The configuration you copied and pasted indicates you did not update the config_version. Before your change:
<cluster alias="clu" config_version="14" name="clu">
After your configuration change:
<cluster alias="clu" config_version="14" name="clu">
Increase the config_version and try again.
The proper procedure for modifying the configuration while the cluster is running depends on the RHEL version.
With RHEL 5, you should not modify /etc/cluster/cluster.conf directly: instead, you should make a copy of it, make changes to the copy (remember to increase config_version!), and then use "ccs_tool update <modified copy of cluster.conf>" to make the changes effective. The ccs_tool will automatically verify the configuration file, propagate it to all the cluster nodes, and then all the cluster nodes can update their configuration files in a synchronized fashion.
With RHEL 6, you can apparently edit /etc/cluster/cluster.conf while the cluster is running, but the changes will take effect only after you run "cman_tool version -r". Before you do that, you should run "ccs_config_validate" to verify the configuration syntax is OK.
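So on RHEL 5 the whole update would look roughly like this (the working-copy file name is arbitrary):
cp /etc/cluster/cluster.conf /root/cluster.conf.new
vi /root/cluster.conf.new                 # make the changes and bump config_version from 14 to 15
ccs_tool update /root/cluster.conf.new    # validates and pushes the new config to all nodes
cman_tool version                         # confirm the running cluster now reports the new version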
10-10-2011 12:01 PM
Re: Redhat cluster is not working properly
Hi All,
Now the file system is working fine with the following configuration.
# more /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="BIOSDB" config_version="15" name="BIOSDB">
<quorumd interval="1" label="osdb_qdisk" min_score="1" tko="10" votes="1">
<heuristic interval="2" program="10.250.0.1" score="1"/>
</quorumd>
<fence_daemon post_fail_delay="0" post_join_delay="20"/>
<clusternodes>
<clusternode name="emdlagpbw01.emdna.emdiesels.com" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW01R"/>
</method>
</fence>
</clusternode>
<clusternode name="emdlagpbw02.emdna.emdiesels.com" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="EMDLAGPBW02R"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1" broadcast="yes"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" power_wait="10" lanplus="1" ipaddr="10.254.1.113" login="tcs" name="EMDLAGPBW01R" passwd="tCs12345"/>
<fencedevice agent="fence_ipmilan" power_wait="10" lanplus="1" ipaddr="10.254.1.143" login="tcs" name="EMDLAGPBW02R" passwd="tCs12345"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="EMDLAGPBWCL1" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="1"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="2"/>
</failoverdomain>
<failoverdomain name="EMDLAGPBWCL2" ordered="1" restricted="1">
<failoverdomainnode name="emdlagpbw01.emdna.emdiesels.com" priority="2"/>
<failoverdomainnode name="emdlagpbw02.emdna.emdiesels.com" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<ip address="10.250.1.107/22" monitor_link="1"/>
<script file="/etc/init.d/httpd" name="httpd"/>
<lvm name="lvm1" vg_name="test_vg" lv_name="test_lvol1"/>
<fs device="/dev/test_vg/test_lvol1" force_fsck="0" force_unmount="1" fsid="33611" fstype="ext3" mountpoint="/test_node1_filesystem" n
ame="test_node1" options="" self_fence="0"/>
<ip address="10.250.1.108/22" monitor_link="1"/>
<script file="/etc/init.d/vsftpd" name="vsftpd"/>
<lvm name="lvm2" vg_name="test_new_vg" lv_name="test_new_vg_lvol1"/>
<fs device="/dev/test_new_vg/test_new_vg_lvol1" force_fsck="0" force_unmount="1" fsid="54001" fstype="ext3" mountpoint="/test_node2_fi
lesystem" name="test_node2" options="" self_fence="0"/>
</resources>
<service autostart="1" domain="EMDLAGPBWCL1" name="httpd" recovery="relocate">
<ip ref="10.250.1.107/22"/>
<script ref="httpd"/>
<lvm ref="lvm1"/>
<fs ref="test_node1"/>
</service>
<service autostart="1" domain="EMDLAGPBWCL2" name="vsftpd" recovery="relocate">
<ip ref="10.250.1.108/22"/>
<script ref="vsftpd"/>
<lvm ref="lvm2"/>
<fs ref="test_node2"/>
</service>
</rm>
</cluster>
Now I would like to configure a quorum disk (qdisk) with this configuration.
For that, I have configured the disk "/dev/sda1" as the qdisk, and it is visible on both nodes.
Now how can I configure the qdisk in the above configuration file?
11-02-2011 07:06 AM
Re: Redhat cluster is not working properly
11-02-2011 08:43 AM
Re: Redhat cluster is not working properly
This line in your cluster configuration does not seem to be correct:
<heuristic interval="2" program="10.250.0.1" score="1"/>
The program= field must be a complete command, not just an IP address. You probably meant something like:
<heuristic interval="2" program="ping -c1 -t1 10.250.0.1" score="1"/>
When you fix this, remember again to increase the config_version value.
Otherwise your quorum disk configuration looks OK to me.
The next step is to prepare the quorum disk with "mkqdisk".
On one node, run:
mkqdisk -c /dev/sda1 -l osdb_qdisk
On the other node, run this command to verify the node sees the newly-created quorum disk:
mkqdisk -L
The output should look like this:
# mkqdisk -L
mkqdisk v0.6.0
/dev/sda1:
        Magic:                eb7a62c2
        Label:                osdb_qdisk
        Created:              Thu Mar 18 15:29:49 2010
        Host:                 emdlagpbw01
        Kernel Sector Size:   512
        Recorded Sector Size: 512
Then you can start the quorum disk daemon on both nodes:
service qdiskd start
chkconfig qdiskd on
After starting the quorum disk daemon, wait a few minutes, then run "clustat". The quorum disk should appear in the cluster member listing, typically with a node ID 0 and with Status "Online, Quorum Disk".
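For example, a quick sanity check after qdiskd has been running for a few minutes might look like this (output details vary between versions, so treat it as a sketch rather than exact output):
clustat                               # the quorum disk should be listed with status "Online, Quorum Disk"
cman_tool status | grep -i quorum     # the vote counts should now include the quorum disk's vote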
11-07-2011 07:03 AM
Re: Redhat cluster is not working properly
I have configured the qdisk successfully. Thanks a lot!
Now I have some questions:
1) Can we see the data stored on the qdisk?
2) What is the default deadnode_timeout value for RHEL 5.7?
3) How can I see the deadnode_timeout value that is set right now?
4) How do I set the deadnode_timeout value in RHEL 5.7?
12-30-2012 10:05 PM
Re: Redhat cluster is not working properly
Hi Jimmy,
I think you are right. I created a user for fence_ipmilan and executed the command:
fence_ipmilan -A password -a XX.XXX.XXX.XX -l RHCS_USER -p 1qaz2wsx -o status -v -P
It failed. After reading your reply, I noticed that I hadn't given the user administrator privileges, only a user privilege with the right to reset the server. Once I granted my user administrator privileges, it succeeded.
Thank you very much. It was really helpful for me.