<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: network problem starting cluster in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582861#M698587</link>
    <description>MC Serviceguard A.11.15.00&lt;BR /&gt;Serviceguard Extension for RAC A.11.15.00&lt;BR /&gt;&lt;BR /&gt;scancl.out is attached&lt;BR /&gt;&lt;BR /&gt;thanks .. rob&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Thu, 14 Jul 2005 10:26:56 GMT</pubDate>
    <dc:creator>Rob Payne</dc:creator>
    <dc:date>2005-07-14T10:26:56Z</dc:date>
    <item>
      <title>network problem starting cluster</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582859#M698585</link>
      <description>got past my other problem, now attempting to start the cluster, it fails with the following info from /var/adm/syslog/syslog.log:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Jul 14 12:27:06 jmar1 SAM cl adm[6569]: Start cluster jmar_cluster1 on all nodes&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmclconfd[6576]: Executing "/usr/lbin/cmcld" for node jmar1&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Logging level changed to level 0.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Daemon Initialization - Maximum number of packages supported for this incarnation is 10.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Global Cluster Information:&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Heartbeat Interval is 1 seconds.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Logging level changed to level 0.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Node Timeout is 2 seconds.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Network Polling Interval is 2 seconds.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Auto Start Timeout is 600 seconds.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Information Specific to node jmar1:&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Cluster lock disk: /dev/dsk/c9t0d0.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: lan0  0x00306e0960b2  140.139.46.121  bridged net:1&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: lan1  0x00306e08171b  10.1.1.1  bridged net:2&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Heartbeat Subnet: 10.0.0.0&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 1014.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Lookup of link /nodes/jmar1/networks/lan/lan1/peers failed.&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Unable to send DLPI info request, Bad file number&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: cl_abort: abort cl_kepd_printf failed: Invalid argument&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: cl_kepd_printf, fstat: kepd_fd=8, st_dev=1073741827, st_ino=446, st_rdev=-486539264&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Aborting! 
Failed to communicate with DLPI&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmlvmd: init_cdb_callback: starting&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: Waiting for connection request from CMGMSD&lt;BR /&gt;Jul 14 12:27:09 jmar1 cmcld: CMGMSD successfully started&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmsrvassistd[6580]: The cluster daemon aborted our connection.&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmsrvassistd[6580]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection&lt;BR /&gt; abort&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmlvmd: callback_thread: Calling  process callback&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmlvmd: Could not read messages from /usr/lbin/cmcld: Software caused connection abort&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmlvmd: CLVMD exiting&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmgmsd[6587]: The cluster daemon aborted our connection.&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmgmsd[6587]: Unable to send 92 bytes (Software caused connection abort).&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmclconfd[6578]: The cluster daemon aborted our connection.&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmclconfd[6576]: The ServiceGuard daemon, /usr/lbin/cmcld[6577], died upon receiving signal number 6.&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmclconfd[6589]: Failed to open connection to cmcld: No such file or directory&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmtaped[6588]: cmtaped - failed to set up sdb callback. (ATS 1.8)&lt;BR /&gt;Jul 14 12:27:12 jmar1 cmtaped[6588]: Failed to set callback: 6004&lt;BR /&gt;Jul 14 12:28:04 jmar1 SAM cl adm[6569]: Fail to form and start cluster jmar_cluster1&lt;BR /&gt;&lt;BR /&gt;It appears to be a network problem of some sort.&lt;BR /&gt;I have attached the cluster config file.&lt;BR /&gt;&lt;BR /&gt;cheers .. rob</description>
      <pubDate>Thu, 14 Jul 2005 09:40:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582859#M698585</guid>
      <dc:creator>Rob Payne</dc:creator>
      <dc:date>2005-07-14T09:40:36Z</dc:date>
    </item>
    <item>
      <title>Re: network problem starting cluster</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582860#M698586</link>
      <description>OK, a few questions.&lt;BR /&gt;What version of SG and what version of SGeRAC are you running?&lt;BR /&gt;What SG and SGeRAC patches are installed?&lt;BR /&gt;Please run cmscancl and post the contents of the /tmp/scancl.out file.</description>
      <pubDate>Thu, 14 Jul 2005 09:56:42 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582860#M698586</guid>
      <dc:creator>melvyn burnard</dc:creator>
      <dc:date>2005-07-14T09:56:42Z</dc:date>
    </item>
    <item>
      <title>Re: network problem starting cluster</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582861#M698587</link>
      <description>MC Serviceguard A.11.15.00&lt;BR /&gt;Serviceguard Extension for RAC A.11.15.00&lt;BR /&gt;&lt;BR /&gt;scancl.out is attached&lt;BR /&gt;&lt;BR /&gt;thanks .. rob&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 14 Jul 2005 10:26:56 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582861#M698587</guid>
      <dc:creator>Rob Payne</dc:creator>
      <dc:date>2005-07-14T10:26:56Z</dc:date>
    </item>
    <item>
      <title>Re: network problem starting cluster</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582862#M698588</link>
      <description>Well, from the scancl.out, it seems you have two LAN cards configured on the same subnet, which is not allowed for SG:&lt;BR /&gt;lan2*     1500 10.0.0.0        10.1.1.2&lt;BR /&gt;lan1      1500 10.0.0.0        10.1.1.1&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I would also suggest that you change the heartbeat interval and node timeout before this goes into production, as the default settings are normally insufficient:&lt;BR /&gt;heartbeat interval:   1.00    (seconds)&lt;BR /&gt;   node timeout:   2.00    (seconds)&lt;BR /&gt;&lt;BR /&gt;I would suggest changing these to 2 and 4 seconds respectively.&lt;BR /&gt;&lt;BR /&gt;Another possibility I have seen before is that the CDB has become corrupted.&lt;BR /&gt;If sorting out the above network config does not fix it, you may be forced to try deleting the config, using cmdeleteconf, and then recreating it.&lt;BR /&gt;&lt;BR /&gt;As a final comment, you appear not to have either SG or SGeRAC patched.&lt;BR /&gt;&lt;BR /&gt;To check, run:&lt;BR /&gt;what /usr/lbin/cmcld | grep PHSS&lt;BR /&gt;&lt;BR /&gt;and &lt;BR /&gt;&lt;BR /&gt;what /usr/lbin/cmgmsd | grep PHSS&lt;BR /&gt;&lt;BR /&gt;If they are not patched, obtain the patches from the ITRC.</description>
      <pubDate>Thu, 14 Jul 2005 11:44:42 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582862#M698588</guid>
      <dc:creator>melvyn burnard</dc:creator>
      <dc:date>2005-07-14T11:44:42Z</dc:date>
    </item>
    <item>
      <title>Re: network problem starting cluster</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582863#M698589</link>
      <description># what /usr/lbin/cmcld |grep PHSS&lt;BR /&gt;         A.11.15.00 Date: 09/16/03 Patch: PHSS_29053&lt;BR /&gt;# what /usr/lbin/cmgmsd |grep PHSS&lt;BR /&gt;         A.11.15.00 Date: 03/09/05 Patch: PHSS_32859&lt;BR /&gt;&lt;BR /&gt;This is what I got. Are these the patches that are already installed, or patches that I need to install?&lt;BR /&gt;&lt;BR /&gt;cheers ... rob</description>
      <pubDate>Fri, 15 Jul 2005 07:42:02 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582863#M698589</guid>
      <dc:creator>Rob Payne</dc:creator>
      <dc:date>2005-07-15T07:42:02Z</dc:date>
    </item>
    <item>
      <title>Re: network problem starting cluster</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582864#M698590</link>
      <description>OK ... added another patch to Serviceguard (PHSS_32660); here is the output of what:&lt;BR /&gt;&lt;BR /&gt;# what /usr/lbin/cmcld |grep PHSS&lt;BR /&gt;         A.11.15.00 Date: 03/09/05 Patch: PHSS_32660&lt;BR /&gt;# what /usr/lbin/cmgmsd |grep PHSS&lt;BR /&gt;         A.11.15.00 Date: 03/09/05 Patch: PHSS_32859&lt;BR /&gt;#</description>
      <pubDate>Fri, 15 Jul 2005 08:04:29 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582864#M698590</guid>
      <dc:creator>Rob Payne</dc:creator>
      <dc:date>2005-07-15T08:04:29Z</dc:date>
    </item>
    <item>
      <title>Re: network problem starting cluster</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582865#M698591</link>
      <description>Hi Rob,&lt;BR /&gt;&lt;BR /&gt;Since the cmcld process aborted, there should be a core file in /var/adm/cmcluster.&lt;BR /&gt;     A. Verify that the core file creation time matches the time of the dump.&lt;BR /&gt;     B. Use adb to obtain the stack from the core file:&lt;BR /&gt;&lt;BR /&gt;       # adb cmcld core&lt;BR /&gt;&lt;BR /&gt;Then attach the output.&lt;BR /&gt;&lt;BR /&gt;regards&lt;BR /&gt;&lt;BR /&gt;vinod</description>
      <pubDate>Sun, 24 Jul 2005 23:31:14 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/network-problem-starting-cluster/m-p/3582865#M698591</guid>
      <dc:creator>vinod_25</dc:creator>
      <dc:date>2005-07-24T23:31:14Z</dc:date>
    </item>
  </channel>
</rss>

