<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Help determining cause of reboot in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814064#M827369</link>
    <description>Thanks Tom, that was what I needed to know.&lt;BR /&gt;&lt;BR /&gt;I had forgotten that SG will reboot if it doesn't get the lock.&lt;BR /&gt;&lt;BR /&gt;As a side question, is there a way to give one node priority on the lock over the other?  This company would prefer that one of the two machines be the primary node virtually all the time.  And ALL of the failovers that they have had were the result of network problems.  So the primary server was always working, and always available to run the package.&lt;BR /&gt;&lt;BR /&gt;But it seems that on every failover the alternate machine gets the lock first and we end up halting the package on that node and bringing it back up on the primary machine.&lt;BR /&gt;&lt;BR /&gt;It would be nice if we could set some type of priority to give the primary the first shot at the lock.  Say a 10-second delay on the alternate or something like that.&lt;BR /&gt;</description>
    <pubDate>Thu, 26 Sep 2002 13:02:40 GMT</pubDate>
    <dc:creator>Sean OB_1</dc:creator>
    <dc:date>2002-09-26T13:02:40Z</dc:date>
    <item>
      <title>Help determining cause of reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814060#M827365</link>
      <description>Hello.&lt;BR /&gt;&lt;BR /&gt;One of our client sites had some power outages yesterday, approximately 4 in 5 minutes.  The servers are all supposed to be on UPS-supplied circuits.&lt;BR /&gt;&lt;BR /&gt;After the outages the servers did remain running.  However, about 1 minute after the last noticeable outage one of the servers rebooted.&lt;BR /&gt;&lt;BR /&gt;This server is part of a ServiceGuard two-node cluster.  It was the primary node at the time of the outage.&lt;BR /&gt;&lt;BR /&gt;Can someone take a look at the log file entries below and help me determine why the machine rebooted?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Sep 25 14:40:12 cosmo0 : su : + tmc voecksl-root&lt;BR /&gt;Sep 25 16:39:16 cosmo0 telnetd[20105]: getpid: peer died: Connection timed out&lt;BR /&gt;Sep 25 16:40:12 cosmo0 telnetd[20163]: getpid: peer died: Connection timed out&lt;BR /&gt;Sep 25 16:40:12 cosmo0 telnetd[20164]: getpid: peer died: Connection timed out&lt;BR /&gt;Sep 25 16:40:12 cosmo0 telnetd[20165]: getpid: peer died: Connection timed out&lt;BR /&gt;Sep 25 16:40:29 cosmo0 vmunix: btlan: NOTE: MII Link Status Not OK - Check Cable Connection to Hub/Switch at 0/2/0/0/5/0....&lt;BR /&gt;Sep 25 16:40:29 cosmo0 vmunix: btlan: NOTE: MII Link Status Not OK - Check Cable Connection to Hub/Switch at 0/5/0/0/5/0....&lt;BR /&gt;Sep 25 16:40:29 cosmo0 cmcld: lan2 failed&lt;BR /&gt;Sep 25 16:40:29 cosmo0 cmcld: Subnet 148.8.70.0 switched from lan2 to lan3&lt;BR /&gt;Sep 25 16:40:29 cosmo0 cmcld: lan2 switched to lan3&lt;BR /&gt;Sep 25 16:40:29 cosmo0 cmcld: lan6 failed&lt;BR /&gt;Sep 25 16:40:29 cosmo0 cmcld: Package unidata cannot run on this node because switching has been disabled for this node.&lt;BR /&gt;Sep 25 16:40:31 cosmo0 vmunix: btlan: NOTE: MII Link Status Not OK - Check Cable Connection to Hub/Switch at 0/2/0/0/6/0....&lt;BR /&gt;Sep 25 16:40:31 cosmo0 cmcld: lan3 failed&lt;BR /&gt;Sep 25 16:40:31 cosmo0 cmcld: Subnet 148.8.70.0 down&lt;BR /&gt;Sep 25 16:41:39 cosmo0 cmcld: Timed out node cosmo1. It may have failed.&lt;BR /&gt;Sep 25 16:41:39 cosmo0 cmcld: Attempting to form a new cluster&lt;BR /&gt;Sep 25 16:45:01 cosmo0 cmcld: lan2 recovered&lt;BR /&gt;Sep 25 16:45:01 cosmo0 cmcld: Subnet 148.8.70.0 switched from lan3 to lan2&lt;BR /&gt;Sep 25 16:45:01 cosmo0 cmcld: lan3 switched to lan2&lt;BR /&gt;Sep 25 16:45:01 cosmo0 cmcld: Subnet 148.8.70.0 up&lt;BR /&gt;Sep 25 16:45:01 cosmo0 cmcld: Package unidata cannot run on this node because switching has been disabled for this node.&lt;BR /&gt;Sep 25 16:45:03 cosmo0 cmcld: lan6 recovered&lt;BR /&gt;Sep 25 16:46:41 cosmo0 cmcld: Obtaining Cluster Lock&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Cluster lock was denied. Lock was obtained by another node.&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Attempting to form a new cluster&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Daemon exiting due to halt message from node cosmo1&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Halting cosmo0 to preserve data integrity&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Reason: Impossibly long daemon hang detected&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: cl_abort: abort cl_kepd_printf failed: Invalid argument&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Aborting! Impossibly long daemon hang detected (file: utils.c, line: 155)&lt;BR /&gt;Sep 25 16:46:46 cosmo0 cmclconfd[2596]: The ServiceGuard daemon, /usr/lbin/cmcld[2597], died upon receiving the signal 6.&lt;BR /&gt;Sep 25 16:46:53 cosmo0 vmunix:&lt;BR /&gt;Sep 25 16:46:53 cosmo0 vmunix: sync'ing disks (15 buffers to flush): 15 4 1&lt;BR /&gt;Sep 25 16:46:53 cosmo0 vmunix: 0 buffers not flushed&lt;BR /&gt;Sep 25 16:46:53 cosmo0 vmunix: 0 buffers still dirty&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Sep 2002 12:50:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814060#M827365</guid>
      <dc:creator>Sean OB_1</dc:creator>
      <dc:date>2002-09-26T12:50:36Z</dc:date>
    </item>
    <item>
      <title>Re: Help determining cause of reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814061#M827366</link>
      <description>Sounds like the switch or hub the LAN cards are attached to lost power, resulting in a loss of LAN connectivity.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Sep 2002 12:54:22 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814061#M827366</guid>
      <dc:creator>Tom Danzig</dc:creator>
      <dc:date>2002-09-26T12:54:22Z</dc:date>
    </item>
    <item>
      <title>Re: Help determining cause of reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814062#M827367</link>
      <description>I should have added that in a two-node cluster, if connectivity between the nodes stops while they are both up, whichever node gets the cluster lock VG will stay up.  The other node will panic and reboot.&lt;BR /&gt;&lt;BR /&gt;Sounds like that's what happened here.  This node lost the race to the lock VG.</description>
      <pubDate>Thu, 26 Sep 2002 12:57:12 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814062#M827367</guid>
      <dc:creator>Tom Danzig</dc:creator>
      <dc:date>2002-09-26T12:57:12Z</dc:date>
    </item>
    <item>
      <title>Re: Help determining cause of reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814063#M827368</link>
      <description>Sorry, I forgot to add the following.&lt;BR /&gt;&lt;BR /&gt;The datacenter and main switches are UPS powered.  However, the switches in the closets throughout the campus are not.&lt;BR /&gt;&lt;BR /&gt;So when we lost power, all of the external switches rebooted and tried to re-establish connectivity to the main bridge switches.&lt;BR /&gt;&lt;BR /&gt;The way they have things set up, if there are successive outages like this in a short period, the main bridges get overloaded and fail, requiring a reboot of them.&lt;BR /&gt;&lt;BR /&gt;So while the center is UPS'd, this type of failure does cause the servers to lose their LAN while the main bridges are rebooting.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Would ServiceGuard for any reason reboot the server when it sees a LAN failure?&lt;BR /&gt;&lt;BR /&gt;TIA,&lt;BR /&gt;&lt;BR /&gt;Sean</description>
      <pubDate>Thu, 26 Sep 2002 12:58:18 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814063#M827368</guid>
      <dc:creator>Sean OB_1</dc:creator>
      <dc:date>2002-09-26T12:58:18Z</dc:date>
    </item>
    <item>
      <title>Re: Help determining cause of reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814064#M827369</link>
      <description>Thanks Tom, that was what I needed to know.&lt;BR /&gt;&lt;BR /&gt;I had forgotten that SG will reboot if it doesn't get the lock.&lt;BR /&gt;&lt;BR /&gt;As a side question, is there a way to give one node priority on the lock over the other?  This company would prefer that one of the two machines be the primary node virtually all the time.  And ALL of the failovers that they have had were the result of network problems.  So the primary server was always working, and always available to run the package.&lt;BR /&gt;&lt;BR /&gt;But it seems that on every failover the alternate machine gets the lock first and we end up halting the package on that node and bringing it back up on the primary machine.&lt;BR /&gt;&lt;BR /&gt;It would be nice if we could set some type of priority to give the primary the first shot at the lock.  Say a 10-second delay on the alternate or something like that.&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Sep 2002 13:02:40 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814064#M827369</guid>
      <dc:creator>Sean OB_1</dc:creator>
      <dc:date>2002-09-26T13:02:40Z</dc:date>
    </item>
    <item>
      <title>Re: Help determining cause of reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814065#M827370</link>
      <description>Hi Sean,&lt;BR /&gt;&lt;BR /&gt;Yes. MC/ServiceGuard TOCs the node that does not hold the cluster lock but has its volume groups activated during the cluster reformation. Your cosmo0 lost the cluster lock to cosmo1 during the outage.&lt;BR /&gt;&lt;BR /&gt;Go through the messages and it will become crystal clear:&lt;BR /&gt;&lt;BR /&gt;Sep 25 16:46:41 cosmo0 cmcld: Obtaining Cluster Lock&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Cluster lock was denied. Lock was obtained by another node.&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Attempting to form a new cluster&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Daemon exiting due to halt message from node cosmo1&lt;BR /&gt;Sep 25 16:46:42 cosmo0 cmcld: Halting cosmo0 to preserve data integrity&lt;BR /&gt;&lt;BR /&gt;-Sri</description>
      <pubDate>Thu, 26 Sep 2002 13:08:13 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814065#M827370</guid>
      <dc:creator>Sridhar Bhaskarla</dc:creator>
      <dc:date>2002-09-26T13:08:13Z</dc:date>
    </item>
    <item>
      <title>Re: Help determining cause of reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814066#M827371</link>
      <description>Sean,&lt;BR /&gt;&lt;BR /&gt;Look at NODE_TIMEOUT and NETWORK_POLLING_INTERVAL in the cluster's ASCII configuration file. The first determines how long to wait before reforming the cluster when the other node has timed out; the second sets how often the network is polled to decide when to call it a network outage, and is particularly helpful for local LAN failovers.&lt;BR /&gt;&lt;BR /&gt;You can increase these values. My settings are 12 secs for both.&lt;BR /&gt;&lt;BR /&gt;-Sri</description>
      <pubDate>Thu, 26 Sep 2002 13:14:06 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814066#M827371</guid>
      <dc:creator>Sridhar Bhaskarla</dc:creator>
      <dc:date>2002-09-26T13:14:06Z</dc:date>
    </item>
    <item>
      <title>Re: Help determining cause of reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814067#M827372</link>
      <description>Sean,&lt;BR /&gt;&lt;BR /&gt;NO!  There is no way to force one node to have any advantage!  I was a bit peeved about this myself when I brought it up in the MC/SG class I attended about 2 months ago.  Seems like HP could put some delay mechanism in place to give one node an advantage.  Alas, there is nothing you can do (at least that's what my instructor said).</description>
      <pubDate>Thu, 26 Sep 2002 13:16:29 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/help-determining-cause-of-reboot/m-p/2814067#M827372</guid>
      <dc:creator>Tom Danzig</dc:creator>
      <dc:date>2002-09-26T13:16:29Z</dc:date>
    </item>
  </channel>
</rss>

