<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cluster suspended while one member had a defect in Operating System - OpenVMS</title>
    <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022474#M84483</link>
    <description>Hi,&lt;BR /&gt;&lt;BR /&gt;The system disk is reachable from all machines in the cluster, so that can't be the problem.&lt;BR /&gt;It is a bit of a mystery to me. The moment one cluster member failed, all the other machines hung. There is no entry in the operator log giving a reason. They started working again when the failed cluster member came back. I can't understand this.&lt;BR /&gt;&lt;BR /&gt;Short summary: 4 nodes in a cluster, no quorum disk, expected votes 3, each machine 1 vote.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;&lt;BR /&gt;Kirsten</description>
    <pubDate>Tue, 19 Jun 2007 02:39:26 GMT</pubDate>
    <dc:creator>Kirsten Knüttel</dc:creator>
    <dc:date>2007-06-19T02:39:26Z</dc:date>
    <item>
      <title>Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022471#M84480</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;I have a problem I can't find the solution to, so I hope one of you can help me.&lt;BR /&gt;&lt;BR /&gt;I have a cluster with 4 members (each member has 1 vote, expected votes=3, no quorum disk). Last week, the CPU fan of one cluster member failed, so that machine shut down.&lt;BR /&gt;In my opinion, the rest of the cluster should have kept running normally. But it seemed as if the other cluster members hung. They couldn't be reached, and even on the console you couldn't do anything. They worked again (without a reboot) when the failed cluster member was back.&lt;BR /&gt;So, what could be the cause?&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;&lt;BR /&gt;Kirsten</description>
      <pubDate>Tue, 19 Jun 2007 02:06:16 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022471#M84480</guid>
      <dc:creator>Kirsten Knüttel</dc:creator>
      <dc:date>2007-06-19T02:06:16Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022472#M84481</link>
      <description>A four-member cluster with one vote for each member should have EXPECTED_VOTES set to 4, which yields a quorum of 3. So in your case the remaining nodes should be able to continue if one node fails.&lt;BR /&gt;How is storage organized in your cluster? Do the remaining nodes have access to vital disks (e.g. the system disk), or are some disks served by the failing node?&lt;BR /&gt;If quorum was lost, there should be messages on the consoles or in the OPERATOR.LOG.&lt;BR /&gt;&lt;BR /&gt;regards Kalle</description>
      <pubDate>Tue, 19 Jun 2007 02:15:39 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022472#M84481</guid>
      <dc:creator>Karl Rohwedder</dc:creator>
      <dc:date>2007-06-19T02:15:39Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022473#M84482</link>
      <description>Hello&lt;BR /&gt;&lt;BR /&gt;Do you have a quorum disk?&lt;BR /&gt;Can you post the votes of all the members?&lt;BR /&gt;&lt;BR /&gt;It seems the number of votes dropped below the quorum, which may explain why the cluster hung.&lt;BR /&gt;&lt;BR /&gt;It is a pity that you do not have AMDS or Availability Manager, as it tells you when quorum is not reached and lets you force a new quorum value so that the cluster works again.</description>
      <pubDate>Tue, 19 Jun 2007 02:17:27 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022473#M84482</guid>
      <dc:creator>labadie_1</dc:creator>
      <dc:date>2007-06-19T02:17:27Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022474#M84483</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;The system disk is reachable from all machines in the cluster, so that can't be the problem.&lt;BR /&gt;It is a bit of a mystery to me. The moment one cluster member failed, all the other machines hung. There is no entry in the operator log giving a reason. They started working again when the failed cluster member came back. I can't understand this.&lt;BR /&gt;&lt;BR /&gt;Short summary: 4 nodes in a cluster, no quorum disk, expected votes 3, each machine 1 vote.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;&lt;BR /&gt;Kirsten</description>
      <pubDate>Tue, 19 Jun 2007 02:39:26 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022474#M84483</guid>
      <dc:creator>Kirsten Knüttel</dc:creator>
      <dc:date>2007-06-19T02:39:26Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022475#M84484</link>
      <description>Kirsten,&lt;BR /&gt;&lt;BR /&gt;at least some lost-connection... messages should be in the OPERATOR.LOG.&lt;BR /&gt;Can you give some more background on your configuration, e.g. storage and interconnects?&lt;BR /&gt;&lt;BR /&gt;regards Kalle</description>
      <pubDate>Tue, 19 Jun 2007 03:12:31 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022475#M84484</guid>
      <dc:creator>Karl Rohwedder</dc:creator>
      <dc:date>2007-06-19T03:12:31Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022476#M84485</link>
      <description>For a cluster, the formula for the quorum value is&lt;BR /&gt;=(expected_votes/2 + 1) rounded down.&lt;BR /&gt;So the calculated quorum in your scenario is 3/2 + 1 = 2.5, rounded down, i.e. 2.&lt;BR /&gt;So as long as at least two nodes in your cluster are alive, the cluster should stay up. I think you should check the SYSGEN parameters (VOTES, EXPECTED_VOTES, QDSKVOTES) and also the MODPARAMS.DAT file.</description>
      <pubDate>Tue, 19 Jun 2007 04:28:14 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022476#M84485</guid>
      <dc:creator>Mrityunjoy Kundu</dc:creator>
      <dc:date>2007-06-19T04:28:14Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022477#M84486</link>
      <description>EXPECTED_VOTES is used during initial boot to determine the QUORUM needed for the cluster to function. If this value is not correct, especially if it is too low, you risk cluster fragmentation. If, during normal operation, the number of votes present exceeds EXPECTED_VOTES, the quorum is raised accordingly. So in this case, when 4 nodes are contributing 4 votes, the QUORUM rises to 3.&lt;BR /&gt;&lt;BR /&gt;regards Kalle</description>
      <pubDate>Tue, 19 Jun 2007 04:37:41 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022477#M84486</guid>
      <dc:creator>Karl Rohwedder</dc:creator>
      <dc:date>2007-06-19T04:37:41Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022478#M84487</link>
      <description>Kirsten,&lt;BR /&gt;&lt;BR /&gt;your votes configuration is correct. For 4 nodes, the majority (i.e. QUORUM) is 3, so the cluster should continue if only one node is lost.&lt;BR /&gt;&lt;BR /&gt;It may be too late to find out why the cluster apparently hung. Do you capture your console data with some console manager application? If not, there should be at least some messages in OPERATOR.LOG, written once the 4th system came back again.&lt;BR /&gt;&lt;BR /&gt;If this happens again, and if it really has something to do with lost quorum, you could try the IPC interrupt on the console to recalculate quorum.&lt;BR /&gt;&lt;BR /&gt;Volker.</description>
      <pubDate>Tue, 19 Jun 2007 06:26:13 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022478#M84487</guid>
      <dc:creator>Volker Halle</dc:creator>
      <dc:date>2007-06-19T06:26:13Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022479#M84488</link>
      <description>mrityunjoy kundu,&lt;BR /&gt;&lt;BR /&gt;your formula is wrong.&lt;BR /&gt;&lt;BR /&gt;The correct formula for calculating quorum is:&lt;BR /&gt;&lt;BR /&gt;quorum = (expected_votes+2)/2&lt;BR /&gt;&lt;BR /&gt;In this case, (4+2)/2 gives 3, which is the correct quorum value for 4 votes.&lt;BR /&gt;&lt;BR /&gt;Volker.</description>
      <pubDate>Tue, 19 Jun 2007 11:54:41 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022479#M84488</guid>
      <dc:creator>Volker Halle</dc:creator>
      <dc:date>2007-06-19T11:54:41Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022480#M84489</link>
      <description>hi Kirsten,&lt;BR /&gt;      check and see if the votes/expected votes etc. are what you think they are. When it's running, do a..&lt;BR /&gt;&lt;BR /&gt;$ show cluster/continuous&lt;BR /&gt;add vote&lt;BR /&gt;add quorum&lt;BR /&gt;add cluster&lt;BR /&gt;&lt;BR /&gt;That will show what the running cluster has.&lt;BR /&gt;See if that makes sense.  Dean</description>
      <pubDate>Tue, 19 Jun 2007 12:10:08 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022480#M84489</guid>
      <dc:creator>Dean McGorrill</dc:creator>
      <dc:date>2007-06-19T12:10:08Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster suspended while one member had a defect</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022481#M84490</link>
      <description>Quorum here should be 3, but the running value could be as low as 2, which would be bad.&lt;BR /&gt;&lt;BR /&gt;Expected_Votes -- per the original posting -- is set incorrectly.    If connectivity is not available (due to a console configuration error or due to a partial communications disconnection), then Expected_Votes set to 3 will result in Quorum being calculated as 2, which could then allow two disjoint partitions to operate in parallel, with the data corruption that typically ensues.&lt;BR /&gt;&lt;BR /&gt;If you wish to preserve the integrity of your disk data, Expected_Votes should be set to 4, and not to 3.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://64.223.189.234/node/153" target="_blank"&gt;http://64.223.189.234/node/153&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Personally, I view the existing quorum mechanism implemented with system parameters as a design mistake.  Far too often, somebody either sets the values incorrectly by mistake, or sets the values "creatively", deliberately choosing an incorrect configuration.&lt;BR /&gt;&lt;BR /&gt;The central rationale for the existence of the cluster quorum scheme is to prevent your data from getting stomped on.  It's not something you want to mis-set, lest you allow your data to get stomped on.  And by "stomped on", I here mean "massively corrupted; how current is your BACKUP?", or such.&lt;BR /&gt;&lt;BR /&gt;Stephen Hoffman&lt;BR /&gt;HoffmanLabs LLC</description>
      <pubDate>Tue, 19 Jun 2007 12:16:51 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/cluster-suspended-while-one-member-had-a-defect/m-p/4022481#M84490</guid>
      <dc:creator>Hoff</dc:creator>
      <dc:date>2007-06-19T12:16:51Z</dc:date>
    </item>
  </channel>
</rss>

