MC/SG messages..
06-19-2001 05:27 PM
Hi all,
I got the following messages from my systems:
=========SYSTEM1=========
cmcld: Communication to node SYSTEM2 has been interrupted
cmcld: node SYSTEM2 may have died
cmcld: Attempting to form a new cluster
cmcld: timers delayed 295.18 seconds
cmcld: Warning: cmcld process was unable to run for the last 295 seconds
cmcld: Resumed updating safety time
cmcld: 2 nodes have formed a new cluster, sequence #2
cmcld: timers delayed 295.18 seconds
cmcld: The new active cluster membership is: SYSTEM2(id=1), SYSTEM1(id=2)
=========SYSTEM2=========
cmcld: Timed out node SYSTEM1. It may have failed.
cmcld: Attempting to adjust cluster membership
cmcld: Clearing Cluster Lock
cmcld: Resumed updating safety time
cmcld: 2 nodes have formed a new cluster, sequence #2
cmcld: The new active cluster membership is: SYSTEM2(id=1), SYSTEM1(id=2)
What happened to the systems?
Thank you.
06-19-2001 06:17 PM
Solution
Hello,
My guess is that the cmcld daemon on SYSTEM1 got blocked and couldn't run for 295 seconds, which is a very long time! Possibly SYSTEM1 was extremely busy for a few minutes? The other system didn't attempt to take over the cluster, so it must have been seeing the heartbeat packets from the first system.
HP recommends raising the cluster's node timeout to 6 to 8 seconds; the default is (or used to be) 2 seconds. I'm not sure that would have made a difference in this case. I used to have lots of problems with a three-node cluster re-forming many times each day, and raising the node timeout value solved that problem. I don't recall seeing the warning about cmcld being unable to run, though.
I'd check SYSTEM1 and try to figure out what it was doing during the time that the cmcld daemon was not responding.
JP
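To narrow down when SYSTEM1 was stalled, one option is to scan its syslog for the cmcld delay messages quoted in the question. The following is only a minimal Python sketch: the function name `find_cmcld_stalls` and the two patterns are illustrative and match only the message wording shown in this thread. Real syslog lines normally carry a timestamp prefix, which was stripped in the post; the patterns use a substring search, so the prefix would not get in the way, and it tells you which time window to investigate.

```python
import re

# Patterns built from the two message forms quoted in the original post.
DELAY_RE = re.compile(r"cmcld: timers delayed (\d+(?:\.\d+)?) seconds")
STALL_RE = re.compile(
    r"cmcld: Warning: cmcld process was unable to run for the last (\d+) seconds"
)


def find_cmcld_stalls(lines):
    """Return (line_number, seconds) for every cmcld delay or stall message."""
    stalls = []
    for number, line in enumerate(lines, start=1):
        match = DELAY_RE.search(line) or STALL_RE.search(line)
        if match:
            stalls.append((number, float(match.group(1))))
    return stalls


# The SYSTEM1 messages exactly as posted (timestamps were not included).
sample = [
    "cmcld: Communication to node SYSTEM2 has been interrupted",
    "cmcld: node SYSTEM2 may have died",
    "cmcld: Attempting to form a new cluster",
    "cmcld: timers delayed 295.18 seconds",
    "cmcld: Warning: cmcld process was unable to run for the last 295 seconds",
    "cmcld: Resumed updating safety time",
    "cmcld: 2 nodes have formed a new cluster, sequence #2",
    "cmcld: timers delayed 295.18 seconds",
    "cmcld: The new active cluster membership is: SYSTEM2(id=1), SYSTEM1(id=2)",
]

for number, seconds in find_cmcld_stalls(sample):
    print(f"line {number}: cmcld delayed {seconds} seconds")
```

Against a real log you would pass the open syslog file instead of `sample`, then compare the reported lines' timestamps with whatever else was running on SYSTEM1 at that time.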
06-19-2001 10:28 PM
Re: MC/SG messages..
You do not say which version of ServiceGuard you are using, but I would recommend making sure the latest patch for that version is installed on your systems.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
06-19-2001 10:44 PM
Re: MC/SG messages..
You definitely had a system hang on SYSTEM1. You need to contact your local HP Response Center to check your patch level. It is likely that some important kernel patches are outdated or missing on your machine. It is also possible to TOC (Transfer of Control) the machine while it is hung and analyze the resulting dump afterwards. In the majority of cases we are able to deduce the root cause from a hung system's crash dump.
From the timing I see in the syslog, it also appears that the NODE_TIMEOUT parameter is set much higher than the recommended value (5-8 seconds). Once the patches are applied, you should change this parameter as well.
Carsten
-------------------------------------------------------------------------------------------------
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- HhGttG
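As a rough illustration of the NODE_TIMEOUT advice in this thread, here is a small Python sketch that checks a candidate value against the 5-8 second range quoted above and formats the corresponding configuration entry. It assumes the cluster configuration ASCII file expresses NODE_TIMEOUT in microseconds; verify that against the manual for your ServiceGuard version. The helper names `in_recommended_range` and `node_timeout_line` are made up for this example.

```python
# Range quoted in this thread; not taken from any official document here.
RECOMMENDED_RANGE_S = (5.0, 8.0)


def in_recommended_range(seconds, low=RECOMMENDED_RANGE_S[0], high=RECOMMENDED_RANGE_S[1]):
    """Return True if the timeout falls inside the range quoted in the thread."""
    return low <= seconds <= high


def node_timeout_line(seconds):
    """Format a NODE_TIMEOUT entry, ASSUMING the file uses microseconds."""
    microseconds = int(round(seconds * 1_000_000))
    return f"NODE_TIMEOUT {microseconds}"


for candidate in (2, 6, 8):
    status = "within" if in_recommended_range(candidate) else "outside"
    print(f"{node_timeout_line(candidate)}  ({candidate} s, {status} the quoted range)")
```

The edited configuration file would still have to be validated and applied with the usual ServiceGuard tools (cmcheckconf and cmapplyconf); this sketch only does the arithmetic.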