01-21-2004 01:50 PM
Cluster problem
I have a two-node cluster with three packages (billing_pkg, ratig_pkg and ob2cm_pkg) running on the two nodes. Package switching was disabled for the rating package. I got the following error messages in syslog, but there was no package interruption (halt or start) during that time.
Billing syslog
Jan 21 17:22:46 billing cmcld: Timed out node rating. It may have failed.
Jan 21 17:22:46 billing cmcld: Attempting to adjust cluster membership
Jan 21 17:22:55 billing cmcld: Obtaining Cluster Lock
Jan 21 17:22:56 billing cmcld: Turning off safety time protection since the cluster
Jan 21 17:22:56 billing cmcld: may now consist of a single node. If ServiceGuard
Jan 21 17:22:56 billing cmcld: fails, this node will not automatically halt
Jan 21 17:22:56 billing cmcld: This will not affect the behavior of Package Failfast
Jan 21 17:22:56 billing cmcld: or Service Failfast. If such a package or service fail,
Jan 21 17:22:56 billing cmcld: this node will automatically halt.
Jan 21 17:23:04 billing cmcld: Enabling safety time protection
Jan 21 17:23:04 billing cmcld: Attempting to adjust cluster membership
Jan 21 17:23:04 billing cmcld: Clearing Cluster Lock
Jan 21 17:23:04 billing cmcld: Resumed updating safety time
Jan 21 17:23:05 billing cmcld: 2 nodes have formed a new cluster, sequence #3
Jan 21 17:23:05 billing cmcld: The new active cluster membership is: billing(id=1), rating(id=2)
Rating syslog
Jan 21 17:23:02 rating cmcld: Warning: cmcld process was unable to run for the last 23 seconds,
Jan 21 17:23:02 rating cmcld: which is longer than the node timeout (8 seconds)
Jan 21 17:23:02 rating cmcld: Communication to node billing has been interrupted
Jan 21 17:23:02 rating cmcld: Node billing may have died
Jan 21 17:23:02 rating cmcld: Attempting to form a new cluster
Jan 21 17:23:04 rating cmcld: Attempting to adjust cluster membership
Jan 21 17:23:05 rating cmcld: Resumed updating safety time
Jan 21 17:23:05 rating cmcld: 2 nodes have formed a new cluster, sequence #3
Jan 21 17:23:05 rating cmcld: The new active cluster membership is: billing(id=1), rating(id=2)
What could be the cause?
01-21-2004 02:02 PM
Re: Cluster problem
What is your HEARTBEAT_INTERVAL? Do you have the heartbeat configured on all available networks?
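For reference, a minimal sketch of the check being asked about. It assumes the cluster ASCII configuration file (for example, one retrieved with cmgetconf) stores HEARTBEAT_INTERVAL and NODE_TIMEOUT in microseconds, and it assumes the guideline that NODE_TIMEOUT should be at least twice HEARTBEAT_INTERVAL; verify both against the ServiceGuard manual for your release. The sample configuration, including the cluster name, is hypothetical; only the 8-second NODE_TIMEOUT comes from the rating syslog.

```python
import re

# Hypothetical excerpt of a cluster ASCII configuration file.
# Only NODE_TIMEOUT (8 s) matches the syslog above; the other values are made up.
SAMPLE_CONFIG = """
CLUSTER_NAME            billing_cluster
HEARTBEAT_INTERVAL      1000000
NODE_TIMEOUT            8000000
"""

def read_timing(config_text):
    """Return (heartbeat_s, node_timeout_s) parsed from the config text.

    Assumes both values are in microseconds on lines of the form 'KEY value',
    with '#' starting a comment."""
    values = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()
        match = re.match(r"^(HEARTBEAT_INTERVAL|NODE_TIMEOUT)\s+(\d+)$", line)
        if match:
            values[match.group(1)] = int(match.group(2)) / 1_000_000
    return values["HEARTBEAT_INTERVAL"], values["NODE_TIMEOUT"]

def check_timing(heartbeat_s, node_timeout_s, observed_stall_s):
    """List possible problems; the 2x ratio is an assumed guideline."""
    problems = []
    if node_timeout_s < 2 * heartbeat_s:
        problems.append("NODE_TIMEOUT is less than twice HEARTBEAT_INTERVAL")
    if observed_stall_s > node_timeout_s:
        problems.append(
            f"a {observed_stall_s:g} s stall exceeds NODE_TIMEOUT ({node_timeout_s:g} s)"
        )
    return problems

heartbeat, node_timeout = read_timing(SAMPLE_CONFIG)
for problem in check_timing(heartbeat, node_timeout, observed_stall_s=23):
    print(problem)
```

With these sample values the ratio check passes, but the 23-second stall reported on rating is still far longer than the 8-second timeout, which is why tuning the timers alone may not be enough.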
Rgds...Geoff
01-21-2004 02:06 PM
Re: Cluster problem
01-21-2004 04:41 PM
Re: Cluster problem
I would look at the rating server first. The message that cmcld was unable to run for 23 seconds means that communication from the rating server to the billing server was interrupted for longer than the NODE_TIMEOUT value (8 seconds here).
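The timestamps are consistent with this reading. As a rough check, assuming the two nodes' clocks are closely synchronized and the 23-second figure is rounded: if the stall on rating ended when the warning was logged at 17:23:02, it began around 17:22:39, and billing would declare rating failed roughly NODE_TIMEOUT (8 s) later, near 17:22:47. That is within a second of billing's "Timed out node rating" message at 17:22:46. The year 2004 is supplied from the post date because syslog omits it.

```python
from datetime import datetime, timedelta

def parse_stamp(stamp, year=2004):
    """Parse a syslog timestamp such as 'Jan 21 17:23:02'.

    syslog omits the year, so one is supplied (assumed 2004, from the post date)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

# Figures taken from the two syslog excerpts.
warning_logged = parse_stamp("Jan 21 17:23:02")   # rating: cmcld unable to run...
stall = timedelta(seconds=23)                     # "...for the last 23 seconds"
node_timeout = timedelta(seconds=8)               # "the node timeout (8 seconds)"
billing_timeout = parse_stamp("Jan 21 17:22:46")  # billing: "Timed out node rating"

# Approximate: assumes the stall ended exactly when the warning was logged.
stall_start = warning_logged - stall
expected_timeout = stall_start + node_timeout

print("stall began around", stall_start.time())               # 17:22:39
print("expected timeout around", expected_timeout.time())     # 17:22:47
print("billing logged timeout at", billing_timeout.time())    # 17:22:46
print("difference:", abs(expected_timeout - billing_timeout)) # 0:00:01
```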
When this happens, the cluster tries to re-form and a notice is sent to all nodes. Any node that fails to respond to that notice will TOC itself if it does not hold the cluster lock.
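The survival rule during re-formation can be sketched as a simplified model. It assumes the usual ServiceGuard quorum rule: a group of nodes may re-form the cluster if it holds more than half of the previous membership, or exactly half and it obtains the cluster lock; any other node halts with a TOC. The function name and arguments are hypothetical, and real behavior also depends on the cluster configuration.

```python
def survives_reformation(group_size, previous_members, got_cluster_lock):
    """Simplified quorum rule: True if this group of nodes may re-form the cluster.

    group_size       -- nodes that can still communicate with each other
    previous_members -- cluster size before the failure
    got_cluster_lock -- whether this group won the cluster lock (tie-breaker)
    """
    if 2 * group_size > previous_members:   # strict majority
        return True
    if 2 * group_size == previous_members:  # exactly half: the lock breaks the tie
        return got_cluster_lock
    return False                            # minority: the node halts (TOC)

# Two-node cluster split into two groups of one node each:
print(survives_reformation(1, 2, got_cluster_lock=True))   # True
print(survives_reformation(1, 2, got_cluster_lock=False))  # False (would TOC)
# Both nodes answer the re-formation notice in time:
print(survives_reformation(2, 2, got_cluster_lock=False))  # True
```

In this thread, billing logged "Obtaining Cluster Lock", but rating appears to have resumed in time, so the last case applied and both nodes stayed in the cluster.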
The timestamps of the cmcld messages in your syslog.log show this sequence.
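One way to see that sequence is to merge the two excerpts into a single timeline. A minimal sketch, assuming the standard syslog layout "Mon DD HH:MM:SS host process: message", a year of 2004 (syslog omits it), and reasonably synchronized clocks on both nodes; only a few lines from each excerpt are included.

```python
from datetime import datetime

BILLING = """\
Jan 21 17:22:46 billing cmcld: Timed out node rating. It may have failed.
Jan 21 17:22:55 billing cmcld: Obtaining Cluster Lock
Jan 21 17:23:04 billing cmcld: Clearing Cluster Lock
Jan 21 17:23:05 billing cmcld: 2 nodes have formed a new cluster, sequence #3
"""

RATING = """\
Jan 21 17:23:02 rating cmcld: Warning: cmcld process was unable to run for the last 23 seconds,
Jan 21 17:23:02 rating cmcld: Attempting to form a new cluster
Jan 21 17:23:05 rating cmcld: 2 nodes have formed a new cluster, sequence #3
"""

def parse_line(line, year=2004):
    """Split a syslog line into (timestamp, host, message)."""
    stamp, rest = line[:15], line[16:]  # 'Jan 21 17:22:46' is 15 characters
    host, message = rest.split(" ", 1)
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, message

def merge_timeline(*logs):
    """Merge several syslog excerpts into one list sorted by time.

    sorted() is stable, so lines with equal timestamps keep their input order."""
    entries = [parse_line(line) for log in logs for line in log.splitlines() if line]
    return sorted(entries, key=lambda entry: entry[0])

for when, host, message in merge_timeline(BILLING, RATING):
    print(when.strftime("%H:%M:%S"), f"{host:8}", message)
```

The merged output puts billing's timeout (17:22:46) and lock attempt (17:22:55) before rating's warning (17:23:02), followed by both nodes reporting the new two-node cluster at 17:23:05.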
I would pull some statistics from the rating server for 17:21 to 17:24 and check for abnormal activity such as high system load. Even buffer cache flushes can make the system hang temporarily if your buffer cache is too large.
-Sri
01-21-2004 04:51 PM
Re: Cluster problem
To answer your second question: during the re-formation both nodes responded in time, so the cluster re-formed without any package interruption. This is common during temporary hangs. However, if the underlying cause is not addressed, it may lead to longer timeouts later and cause the nodes to fail (depending on your configuration).
-Sri
01-21-2004 07:03 PM
Re: Cluster problem
Luckily for you, heartbeat communication was restored just before the second node would have TOC'ed, and the cluster then re-formed as a two-node cluster.
I would suggest reviewing the cluster configuration settings, but more importantly, investigate why the node was unable to run cmcld; patches may need to be updated.