<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cluster problem in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170475#M161192</link>
    <description>You have had what is often referred to as a mini-hang on the second node, resulting in a loss of heartbeat communications between the nodes. The first node then attempted to reform as a single-node cluster, obtaining the cluster lock disc in order to do this.&lt;BR /&gt;Luckily for you, heartbeat communications were restored just before the second node would have TOC'ed, and the cluster then reformed as a 2-node cluster.&lt;BR /&gt;I would suggest you look at the cluster configuration settings, but more importantly investigate why the node was unable to run cmcld; maybe patches need to be updated.&lt;BR /&gt;</description>
    <pubDate>Thu, 22 Jan 2004 03:03:49 GMT</pubDate>
    <dc:creator>melvyn burnard</dc:creator>
    <dc:date>2004-01-22T03:03:49Z</dc:date>
    <item>
      <title>Cluster problem</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170470#M161187</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I have a two-node cluster. There are 3 packages (billing_pkg, ratig_pkg and ob2cm_pkg) running on these two nodes. Package switching for the rating package was disabled. I got the following error messages in syslog, but there was no package interruption (halt or start) during that time.&lt;BR /&gt;&lt;BR /&gt;Billing syslog&lt;BR /&gt;&lt;BR /&gt;Jan 21 17:22:46 billing cmcld: Timed out node rating. It may have failed.&lt;BR /&gt;Jan 21 17:22:46 billing cmcld: Attempting to adjust cluster membership&lt;BR /&gt;Jan 21 17:22:55 billing cmcld: Obtaining Cluster Lock&lt;BR /&gt;Jan 21 17:22:56 billing cmcld: Turning off safety time protection since the cluster&lt;BR /&gt;Jan 21 17:22:56 billing cmcld: may now consist of a single node.  If ServiceGuard&lt;BR /&gt;Jan 21 17:22:56 billing cmcld: fails, this node will not automatically halt&lt;BR /&gt;Jan 21 17:22:56 billing cmcld: This will not affect the behavior of Package Failfast&lt;BR /&gt;Jan 21 17:22:56 billing cmcld: or Service Failfast. If such a package or service fail,&lt;BR /&gt;Jan 21 17:22:56 billing cmcld: this node will automatically halt.&lt;BR /&gt;Jan 21 17:23:04 billing cmcld: Enabling safety time protection&lt;BR /&gt;Jan 21 17:23:04 billing cmcld: Attempting to adjust cluster membership&lt;BR /&gt;Jan 21 17:23:04 billing cmcld: Clearing Cluster Lock&lt;BR /&gt;Jan 21 17:23:04 billing cmcld: Resumed updating safety time&lt;BR /&gt;Jan 21 17:23:05 billing cmcld: 2 nodes have formed a new cluster, sequence #3&lt;BR /&gt;Jan 21 17:23:05 billing cmcld: The new active cluster membership is: billing(id=1), rating(id=2)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Rating syslog&lt;BR /&gt;&lt;BR /&gt;Jan 21 17:23:02 rating cmcld: Warning: cmcld process was unable to run for the last 23 seconds,&lt;BR /&gt;Jan 21 17:23:02 rating cmcld: which is longer than the node timeout (8 seconds)&lt;BR /&gt;Jan 21 17:23:02 rating cmcld: Communication to node billing has been interrupted&lt;BR /&gt;Jan 21 17:23:02 rating cmcld: Node billing may have died&lt;BR /&gt;Jan 21 17:23:02 rating cmcld: Attempting to form a new cluster&lt;BR /&gt;Jan 21 17:23:04 rating cmcld: Attempting to adjust cluster membership&lt;BR /&gt;Jan 21 17:23:05 rating cmcld: Resumed updating safety time&lt;BR /&gt;Jan 21 17:23:02 rating cmcld: Communication to node billing has been interrupted&lt;BR /&gt;Jan 21 17:23:05 rating cmcld: 2 nodes have formed a new cluster, sequence #3&lt;BR /&gt;Jan 21 17:23:02 rating cmcld: Attempting to form a new cluster&lt;BR /&gt;Jan 21 17:23:05 rating cmcld: The new active cluster membership is: billing(id=1), rating(id=2)&lt;BR /&gt;&lt;BR /&gt;What might be the reason?&lt;BR /&gt;</description>
      <pubDate>Wed, 21 Jan 2004 21:50:19 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170470#M161187</guid>
      <dc:creator>M. Tariq Ayub</dc:creator>
      <dc:date>2004-01-21T21:50:19Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster problem</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170471#M161188</link>
      <description>Looks like maybe your heartbeat LAN is timing out... Do you have a dedicated heartbeat LAN?&lt;BR /&gt;&lt;BR /&gt;What is your HEARTBEAT_INTERVAL?&lt;BR /&gt;&lt;BR /&gt;Do you have the HEARTBEAT set across all available networks?&lt;BR /&gt;&lt;BR /&gt;Rgds...Geoff&lt;BR /&gt;</description>
      <pubDate>Wed, 21 Jan 2004 22:02:49 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170471#M161188</guid>
      <dc:creator>Geoff Wild</dc:creator>
      <dc:date>2004-01-21T22:02:49Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster problem</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170472#M161189</link>
      <description>We have a dedicated heartbeat. But my question is: if there was a problem with the heartbeat, then a new cluster should have formed on the billing node, as it holds the cluster lock disk. There was no problem with the packages.</description>
      <pubDate>Wed, 21 Jan 2004 22:06:41 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170472#M161189</guid>
      <dc:creator>M. Tariq Ayub</dc:creator>
      <dc:date>2004-01-21T22:06:41Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster problem</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170473#M161190</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I would first look at the rating server. It reported that the cmcld process was unable to run for 23 seconds, which means communication from the rating server to the billing server was interrupted for longer than the node_timeout value.&lt;BR /&gt;&lt;BR /&gt;When this happens, the cluster will try to reform and a notice will be sent to all the nodes. Any node that fails to respond to that notice will TOC itself if it doesn't have the cluster lock.&lt;BR /&gt;&lt;BR /&gt;The timestamps of the cmcld entries in your syslog.log indicate the above.&lt;BR /&gt;&lt;BR /&gt;I would pull some stats from the rating server for 17:21 - 17:24 and see if there was any abnormal activity such as high system load. Even buffer flushes may cause the system to hang temporarily if your buffer cache is too large.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;-Sri</description>
      <pubDate>Thu, 22 Jan 2004 00:41:00 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170473#M161190</guid>
      <dc:creator>Sridhar Bhaskarla</dc:creator>
      <dc:date>2004-01-22T00:41:00Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster problem</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170474#M161191</link>
      <description>Hi (Again),&lt;BR /&gt;&lt;BR /&gt;To answer your second question: during the reformation, both nodes responded, hence the cluster reformed just in time, without package interruptions. This is common during temporary hangs. However, if this symptom is not treated, it may cause extended timeouts later and may cause the nodes to fail (depending on your configuration).&lt;BR /&gt;&lt;BR /&gt;-Sri</description>
      <pubDate>Thu, 22 Jan 2004 00:51:52 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170474#M161191</guid>
      <dc:creator>Sridhar Bhaskarla</dc:creator>
      <dc:date>2004-01-22T00:51:52Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster problem</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170475#M161192</link>
      <description>You have had what is often referred to as a mini-hang on the second node, resulting in a loss of heartbeat communications between the nodes. The first node then attempted to reform as a single-node cluster, obtaining the cluster lock disc in order to do this.&lt;BR /&gt;Luckily for you, heartbeat communications were restored just before the second node would have TOC'ed, and the cluster then reformed as a 2-node cluster.&lt;BR /&gt;I would suggest you look at the cluster configuration settings, but more importantly investigate why the node was unable to run cmcld; maybe patches need to be updated.&lt;BR /&gt;</description>
      <pubDate>Thu, 22 Jan 2004 03:03:49 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/cluster-problem/m-p/3170475#M161192</guid>
      <dc:creator>melvyn burnard</dc:creator>
      <dc:date>2004-01-22T03:03:49Z</dc:date>
    </item>
  </channel>
</rss>

