mcs fails on night time
07-21-2004 05:44 PM
On my servers, MC/ServiceGuard has been failing at night for the last two days. The syslog entries are below. Please tell me what to do.
first server
---------------
Jul 15 03:25:28 ram cmcld: timers delayed 3.24 seconds
Jul 15 03:25:28 ram cmcld: Warning: cmcld process was unable to run for the last 3 seconds
Jul 20 00:27:51 ram cmcld: timers delayed 2.74 seconds
Jul 21 02:03:23 ram cmcld: Timed out node laxman. It may have failed.
Jul 21 02:03:23 ram cmcld: Attempting to adjust cluster membership
Jul 21 02:03:32 ram cmcld: Obtaining First Dual Cluster Lock
Jul 21 02:03:33 ram cmcld: Obtaining Second Dual Cluster Lock
Jul 21 02:03:33 ram cmcld: Communication attempt to node laxman did not succeed
Jul 21 02:03:34 ram cmcld: Turning off safety time protection since the cluster
Jul 21 02:03:34 ram cmcld: may now consist of a single node. If ServiceGuard
Jul 21 02:03:34 ram cmcld: fails, this node will not automatically halt
Jul 21 02:03:49 ram cmcld: Enabling safety time protection
Jul 21 02:03:49 ram cmcld: Attempting to adjust cluster membership
Jul 21 02:03:49 ram cmcld: Clearing First Dual Cluster Lock
Jul 21 02:03:50 ram cmcld: Clearing Second Dual Cluster Lock
Jul 21 02:03:51 ram cmcld: Resumed updating safety time
Jul 21 02:03:51 ram cmcld: 2 nodes have formed a new cluster, sequence #3
Jul 21 02:03:51 ram cmcld: The new active cluster membership is: ram(id=1), laxman(id=2)
Jul 22 03:29:52 ram cmcld: Warning: cmcld process was unable to run for the last 16 seconds,
Jul 22 03:29:52 ram cmcld: which is longer than the node timeout (8 seconds)
Jul 22 03:29:52 ram cmcld: Timed out node laxman. It may have failed.
Jul 22 03:29:52 ram cmcld: Attempting to adjust cluster membership
Jul 22 03:30:01 ram cmcld: Obtaining First Dual Cluster Lock
Jul 22 03:30:02 ram cmcld: First Cluster lock was denied. Lock was obtained by another node.
Jul 22 03:30:02 ram cmcld: Attempting to form a new cluster
Jul 22 03:30:03 ram cmcld: Resumed updating safety time
Jul 22 03:30:04 ram cmcld: 2 nodes have formed a new cluster, sequence #5
Jul 22 03:30:04 ram cmcld: The new active cluster membership is: laxman(id=2), ram(id=1)
Jul 22 03:30:02 ram cmcld: Attempting to adjust cluster membership
second server
---------------
Jul 21 02:04:58 laxman cmcld: Communication to node ram has been interrupted
Jul 21 02:04:58 laxman cmcld: Node ram may have died
Jul 21 02:04:58 laxman cmcld: Attempting to form a new cluster
Jul 21 02:05:07 laxman cmcld: Obtaining First Dual Cluster Lock
Jul 21 02:05:08 laxman cmcld: First Cluster lock was denied. Lock was obtained by another node.
Jul 21 02:05:24 laxman cmcld: Heartbeat connection attempt to node ram timed out
Jul 21 02:05:24 laxman cmcld: Attempting to adjust cluster membership
Jul 21 02:05:25 laxman cmcld: Resumed updating safety time
Jul 21 02:05:26 laxman cmcld: 2 nodes have formed a new cluster, sequence #3
Jul 21 02:05:26 laxman cmcld: The new active cluster membership is: ram(id=1), laxman(id=2)
Jul 21 02:05:24 laxman cmcld: Attempting to form a new cluster
Jul 22 03:31:21 laxman cmcld: Timed out node ram. It may have failed.
Jul 22 03:31:21 laxman cmcld: Attempting to form a new cluster
Jul 22 03:31:30 laxman cmcld: Obtaining First Dual Cluster Lock
Jul 22 03:31:31 laxman cmcld: Obtaining Second Dual Cluster Lock
Jul 22 03:31:32 laxman cmcld: Turning off safety time protection since the cluster
Jul 22 03:31:32 laxman cmcld: may now consist of a single node. If ServiceGuard
Jul 22 03:31:32 laxman cmcld: fails, this node will not automatically halt
Jul 22 03:31:39 laxman cmcld: Enabling safety time protection
Jul 22 03:31:39 laxman cmcld: Attempting to adjust cluster membership
Jul 22 03:31:39 laxman cmcld: Clearing First Dual Cluster Lock
Jul 22 03:31:40 laxman cmcld: Clearing Second Dual Cluster Lock
Jul 22 03:31:41 laxman cmcld: Resumed updating safety time
Jul 22 03:31:41 laxman cmcld: 2 nodes have formed a new cluster, sequence #5
Jul 22 03:31:41 laxman cmcld: The new active cluster membership is: laxman(id=2), ram(id=1)
Please tell me how I can trace the problem.
Regards,
07-21-2004 07:46 PM
Re: mcs fails on night time
(ram and laxman -- where is sita, then??)
The two nodes are not able to talk to each other; ram tries to get the cluster lock disk and fails. Finally laxman gets the lock disk and the cluster is re-formed.
Check whether you have any problems with the heartbeat network.
Anil
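To check the heartbeat network, a few standard HP-UX commands are useful. These are real HP-UX/ServiceGuard tools, but the PPA number and MAC address below are placeholders; take the actual values from `lanscan` output on each node:

```shell
# List LAN interfaces, their PPA numbers, and hardware state
lanscan

# Link-level loopback test from this node to the other node's
# heartbeat NIC (PPA after -i, then the remote MAC from lanscan)
linkloop -i 0 0x00306E0A1B2C

# Interface error counters -- look for growing Ierrs/Oerrs
netstat -i

# ServiceGuard's own view of the heartbeat networks
cmviewcl -v
```

If `linkloop` fails intermittently at night, that points at the cabling, switch, or provider line rather than at ServiceGuard itself.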
07-22-2004 05:51 PM
Re: mcs fails on night time
regds
07-22-2004 08:11 PM
Re: mcs fails on night time
I once saw a problem where the network provider kept taking lines down in the middle of the night without telling anyone about it.
If the heartbeat runs over only one LAN interface, add it to a second interface in your cluster ASCII file, then re-check and apply the cluster configuration.
If you have only one LAN interface, that is a single point of failure (SPOF), and running ServiceGuard is pointless anyway, unless you use a serial crossover heartbeat.
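For reference, a two-interface heartbeat section of the cluster ASCII file looks roughly like this. The keywords are standard ServiceGuard syntax, but the interface names and IP addresses are examples, not taken from the poster's configuration:

```
NODE_NAME             ram
  NETWORK_INTERFACE   lan0
    HEARTBEAT_IP      192.168.1.1
  NETWORK_INTERFACE   lan1
    HEARTBEAT_IP      192.168.2.1
```

After adding the second HEARTBEAT_IP on each node, validate with `cmcheckconf -C cluster.ascii` and apply with `cmapplyconf -C cluster.ascii`.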
07-23-2004 12:17 AM
Re: mcs fails on night time
14. The cmcld daemon may log the message "timers delayed x.x seconds" due to kernel latency issues, or a network partition may separate nodes in the cluster. A ServiceGuard cluster of more than 2 nodes with a cluster lock, after experiencing such a hang or partition, may result in the formation of 2 clusters. This is a corner case where the hang or partition happens while a node is joining a previously formed 2-node cluster. The joining node forms a cluster with the original coordinator node, while the non-coordinator node forms a cluster by itself.
The current version of the patch is PHSS_31015.
The problem might also be due to a default NODE_TIMEOUT value.
Use this command to determine what the NODE_TIMEOUT value is set to:
# cmviewconf | grep "node timeout"
If it is 2 seconds, adjust it to 6-8 seconds!
You can do that by editing the cluster configuration ASCII file, normally saved by the admin on one of the servers in /etc/cmcluster. If you can't find it, you can reconstitute it from the cluster binary with this command:
# cmgetconf cluster.ascii
Once it's recreated, check its validity against the current cluster environment:
# cmcheckconf -C cluster.ascii
If this succeeds, edit the NODE_TIMEOUT value up to 6-8 seconds, then halt the cluster when you have the opportunity (cmhaltcl -f) and update the cluster binary:
# cmapplyconf -f -C cluster.ascii
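Put together, the procedure sketches out as below. One caveat to be careful with: in the cluster ASCII file, NODE_TIMEOUT is specified in microseconds, so 8 seconds is written as 8000000; treat the exact value as an example to tune for your environment:

```shell
# Recreate the ASCII file from the cluster binary if it is missing
cmgetconf cluster.ascii

# Confirm the current timeout
cmviewconf | grep "node timeout"

# Edit cluster.ascii and raise the timeout, e.g.:
#   NODE_TIMEOUT    8000000    # 8 seconds, value is in microseconds

# Validate, halt the cluster, apply the new binary, restart
cmcheckconf -C cluster.ascii
cmhaltcl -f
cmapplyconf -f -C cluster.ascii
cmruncl
```

A larger NODE_TIMEOUT makes the cluster more tolerant of the "timers delayed" stalls seen in the syslog, at the cost of slower failure detection.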
-StephenD.