<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: one node reboot in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989285#M707902</link>
    <description>Change the kernel parameter nflocks to &lt;BR /&gt;&lt;BR /&gt;10*maxusers/2&lt;BR /&gt;&lt;BR /&gt;Then you should not run into this situation.&lt;BR /&gt;&lt;BR /&gt;Regards&lt;BR /&gt;&lt;BR /&gt;Rainer</description>
    <pubDate>Thu, 05 Jun 2003 04:50:55 GMT</pubDate>
    <dc:creator>Rainer von Bongartz</dc:creator>
    <dc:date>2003-06-05T04:50:55Z</dc:date>
    <item>
      <title>one node reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989281#M707898</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I have a two-node cluster.&lt;BR /&gt;2* N4000/HP-UX 11.00&lt;BR /&gt;B3935DA  A.11.12        MC / Service Guard &lt;BR /&gt;&lt;BR /&gt;Yesterday at about 19:45 one of the nodes rebooted and all the packages running on it moved to the other node.&lt;BR /&gt;&lt;BR /&gt;From the OLDsyslog.log I gather that there was some problem with Samba.&lt;BR /&gt;&lt;BR /&gt;What is this error and what needs to be done?&lt;BR /&gt;OLDsyslog.log attached.</description>
      <pubDate>Thu, 05 Jun 2003 00:42:26 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989281#M707898</guid>
      <dc:creator>Sanjiv Sharma_1</dc:creator>
      <dc:date>2003-06-05T00:42:26Z</dc:date>
    </item>
    <item>
      <title>Re: one node reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989282#M707899</link>
      <description>Hi Sanjiv,&lt;BR /&gt;&lt;BR /&gt;Man that's UGLY. &lt;BR /&gt;I see that as a cascade failure. &lt;BR /&gt;The first messages indicate timeouts, hinting at network trouble.&lt;BR /&gt;Then the first set of errors shows that Samba couldn't open its DB file, which looks for all the world like a connection problem. That's reinforced by the inability to create network sockets. Then you seem to exhaust file locks - game over. &lt;BR /&gt;That's a classic "reboot or it ain't gonna recover" scenario - hence the system panicked.&lt;BR /&gt;I'd start by asking for network logs &amp;amp; system logs from the *other* end of those connections, because I see no errors for the local NIC. By that I mean this system could well have been the "victim" of severe trouble elsewhere, but of the sort where the NIC-to-switch link never dropped.&lt;BR /&gt;&lt;BR /&gt;My $0.02,&lt;BR /&gt;Jeff&lt;BR /&gt;</description>
      <pubDate>Thu, 05 Jun 2003 01:31:04 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989282#M707899</guid>
      <dc:creator>Jeff Schussele</dc:creator>
      <dc:date>2003-06-05T01:31:04Z</dc:date>
    </item>
    <item>
      <title>Re: one node reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989283#M707900</link>
      <description>Hi Jeff,&lt;BR /&gt;&lt;BR /&gt;Enclosed is the syslog.log of the 2nd node.</description>
      <pubDate>Thu, 05 Jun 2003 01:44:48 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989283#M707900</guid>
      <dc:creator>Sanjiv Sharma_1</dc:creator>
      <dc:date>2003-06-05T01:44:48Z</dc:date>
    </item>
    <item>
      <title>Re: one node reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989284#M707901</link>
      <description>Match up the times on those logs. I'm even more convinced that you had a BIG connection problem going on.&lt;BR /&gt;To *where* were these Samba connections? I'd bet that system or a network device in its subnet lunched.&lt;BR /&gt;I strongly advise you to also look at the Service Guard package logs on both systems for further clues. They are usually located in /etc/cmcluster/pkg_name.&lt;BR /&gt;&lt;BR /&gt;Rgds,&lt;BR /&gt;Jeff</description>
      <pubDate>Thu, 05 Jun 2003 01:57:30 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989284#M707901</guid>
      <dc:creator>Jeff Schussele</dc:creator>
      <dc:date>2003-06-05T01:57:30Z</dc:date>
    </item>
    <item>
      <title>Re: one node reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989285#M707902</link>
      <description>Change the kernel parameter nflocks to &lt;BR /&gt;&lt;BR /&gt;10*maxusers/2&lt;BR /&gt;&lt;BR /&gt;Then you should not run into this situation.&lt;BR /&gt;&lt;BR /&gt;Regards&lt;BR /&gt;&lt;BR /&gt;Rainer</description>
      <pubDate>Thu, 05 Jun 2003 04:50:55 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989285#M707902</guid>
      <dc:creator>Rainer von Bongartz</dc:creator>
      <dc:date>2003-06-05T04:50:55Z</dc:date>
    </item>
    <item>
      <title>Re: one node reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989286#M707903</link>
      <description>Here's the critical part of your syslog on the 2nd node:&lt;BR /&gt;&lt;BR /&gt;Jun  4 19:30:24 ijmsia02 cmcld: Timed out node ijmsia01. It may have failed.&lt;BR /&gt;Jun  4 19:30:24 ijmsia02 cmcld: Attempting to form a new cluster&lt;BR /&gt;Jun  4 19:30:37 ijmsia02 nmbd[2331]: [2003/06/04 19:30:37, 0] nmbd/nmbd_become_lmb.c:(404)&lt;BR /&gt;Jun  4 19:30:37 ijmsia02 nmbd[2331]:   *****&lt;BR /&gt;Jun  4 19:30:37 ijmsia02 nmbd[2331]:   &lt;BR /&gt;Jun  4 19:30:37 ijmsia02 nmbd[2331]:   Samba name server IJMSIAFS01 is now a local master browser for workgroup SGP.HP.COM on subnet 15.85.28.36&lt;BR /&gt;Jun  4 19:30:45 ijmsia02 cmcld: Obtaining Cluster Lock&lt;BR /&gt;Jun  4 19:30:46 ijmsia02 cmcld: Turning off safety time protection since the cluster&lt;BR /&gt;&lt;BR /&gt;This is telling us that the second node was unable to communicate with the first via any of its heartbeat networks - therefore it didn't know the state of the first node, and a race for the cluster lock occurred. The second node won this race, so the first node was TOC'd.&lt;BR /&gt;&lt;BR /&gt;As the others have indicated, you seem to have some kind of network issue - this may be in the network itself, or on either node. My advice would be to ensure you are bang up-to-date with all network-related patches on both nodes, and see if the problem persists.&lt;BR /&gt;&lt;BR /&gt;HTH&lt;BR /&gt;&lt;BR /&gt;Duncan</description>
      <pubDate>Thu, 05 Jun 2003 07:57:41 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989286#M707903</guid>
      <dc:creator>Duncan Edmonstone</dc:creator>
      <dc:date>2003-06-05T07:57:41Z</dc:date>
    </item>
    <item>
      <title>Re: one node reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989287#M707904</link>
      <description>Do you have a separate heartbeat LAN? If not, consider buying an additional NIC for each node, and make the heartbeats travel on the separate LAN too. &lt;BR /&gt;&lt;BR /&gt;A short-term workaround may be to increase the heartbeat timeout and heartbeat interval in the cluster config.&lt;BR /&gt;&lt;BR /&gt;Rgds Jarle&lt;BR /&gt;</description>
      <pubDate>Thu, 05 Jun 2003 08:29:41 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989287#M707904</guid>
      <dc:creator>Jarle Bjorgeengen</dc:creator>
      <dc:date>2003-06-05T08:29:41Z</dc:date>
    </item>
    <item>
      <title>Re: one node reboot</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989288#M707905</link>
      <description>Hi Sanjiv,&lt;BR /&gt;&lt;BR /&gt;It's not exactly clear to me whether you got the answers you needed - but as I'm just crawling out from the smoking ruins of exactly the same experience (down to the tdb nagging about failing locks) I'm more than happy to share..!&lt;BR /&gt;&lt;BR /&gt;Do follow the CIFS/9000 (HP's name for Samba) installation guide - and pay *SPECIAL* attention to the newish kernel-parameter requirements for the newer versions! (You can find the guides that correspond to your version of "CIFS/9000"/Samba on &lt;A href="http://www.docs.hp.com/hpux/netcom/index.html#CIFS/9000" target="_blank"&gt;http://www.docs.hp.com/hpux/netcom/index.html#CIFS/9000&lt;/A&gt;.) The 'rule of thumb' seems to be something like "10 times as many 'nflocks' as users and 23 times as many 'nfiles' as users".&lt;BR /&gt;&lt;BR /&gt;I'm sorry to say that our CIFS/9000 server failed miserably even though it was well inside these boundaries and had more than 35 file locks per user at the time of the crash - it is, however, catering to software developers, which could possibly translate into "LOTS of open files at any one point in time", and maybe the figures above (the factor 10 part) need to be adjusted according to the load type..?! (Jury's still out on that one! :-)&lt;BR /&gt;&lt;BR /&gt;Br.&lt;BR /&gt;Claus</description>
      <pubDate>Wed, 23 Jul 2003 14:27:11 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/one-node-reboot/m-p/2989288#M707905</guid>
      <dc:creator>Claus Nymann</dc:creator>
      <dc:date>2003-07-23T14:27:11Z</dc:date>
    </item>
  </channel>
</rss>

