<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Server Hangs every 3 months in Operating System - Linux</title>
    <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149208#M31708</link>
    <description>Does the server have any scheduled job at that interval?</description>
    <pubDate>Fri, 29 Feb 2008 15:55:31 GMT</pubDate>
    <dc:creator>Avijit Patra</dc:creator>
    <dc:date>2008-02-29T15:55:31Z</dc:date>
    <item>
      <title>Server Hangs every 3 months</title>
      <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149203#M31703</link>
      <description>We have a few servers (one of them highly visible) that lock up every 3 months. The entire system becomes unresponsive and we must reboot. I have seen an issue in the past with Dell hardware and DRAC cards, where the card does a firmware restart and this causes the system to hang (it was fixed with a firmware upgrade / kernel upgrade).&lt;BR /&gt;&lt;BR /&gt;We are using iLO cards in these servers and I am not sure whether this is the culprit or not. There is nothing in the logs that shows any sort of problem.&lt;BR /&gt;&lt;BR /&gt;uname -a output&lt;BR /&gt;&lt;BR /&gt;Linux ########## 2.6.5-7.286-bigsmp #1 SMP Thu May 31 10:12:58 UTC 2007 i686 athlon i386 GNU/Linux&lt;BR /&gt;&lt;BR /&gt;Does anyone have any ideas? Both Novell and HP have been unable to come up with anything.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Larry</description>
      <pubDate>Fri, 22 Feb 2008 19:55:23 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149203#M31703</guid>
      <dc:creator>Larry UofM</dc:creator>
      <dc:date>2008-02-22T19:55:23Z</dc:date>
    </item>
    <item>
      <title>Re: Server Hangs every 3 months</title>
      <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149204#M31704</link>
      <description>You should enable the magic SysRq key and try to force a memory dump.&lt;BR /&gt;&lt;BR /&gt;When I had a similar problem, I configured a remote syslog server. The hung system could not write to disk, but it was still able to send messages over the network, which gave us more information to troubleshoot the problem.&lt;BR /&gt;&lt;BR /&gt;Install collectl and enable performance logging. That should give you an idea of what was going on in the system at the time of the hang.</description>
      <pubDate>Sat, 23 Feb 2008 23:08:08 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149204#M31704</guid>
      <dc:creator>Ivan Ferreira</dc:creator>
      <dc:date>2008-02-23T23:08:08Z</dc:date>
    </item>
    <item>
      <title>Re: Server Hangs every 3 months</title>
      <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149205#M31705</link>
      <description>As a test, you can simulate a crash with the SysRq facility. Enable sysrq and follow this article:&lt;BR /&gt;&lt;A href="http://kbase.redhat.com/faq/FAQ_80_5559.shtm" target="_blank"&gt;http://kbase.redhat.com/faq/FAQ_80_5559.shtm&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;The 'c' character will simulate a crash. Time how long the core takes to be created, so that you know how long to wait if the system crashes again. Do not manually reboot until AFTER the core is created.</description>
      <pubDate>Mon, 25 Feb 2008 01:40:48 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149205#M31705</guid>
      <dc:creator>skt_skt</dc:creator>
      <dc:date>2008-02-25T01:40:48Z</dc:date>
    </item>
    <item>
      <title>Re: Server Hangs every 3 months</title>
      <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149206#M31706</link>
      <description>Aside from the answers provided - have you looked over the sar data?  Could it be something as simple as a filesystem filling up with temp data and killing the host?</description>
      <pubDate>Tue, 26 Feb 2008 12:34:20 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149206#M31706</guid>
      <dc:creator>Don Vanco - Linux Ninja</dc:creator>
      <dc:date>2008-02-26T12:34:20Z</dc:date>
    </item>
    <item>
      <title>Re: Server Hangs every 3 months</title>
      <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149207#M31707</link>
      <description>Here is the stack I got yesterday... oddly enough I am getting crashes a lot more often now.&lt;BR /&gt;&lt;BR /&gt; Unable to handle kernel NULL pointer dereference at virtual address 00000174&lt;BR /&gt; printing eip:&lt;BR /&gt; c015ff0f&lt;BR /&gt;*pde = 21d58001&lt;BR /&gt;Oops: 0000 [#1]&lt;BR /&gt;SMP&lt;BR /&gt;CPU:    0&lt;BR /&gt;EIP:    0060:[&amp;lt;c015ff0f&amp;gt;]    Tainted: PF  U&lt;BR /&gt;EFLAGS: 00010286   (2.6.5-7.287.3-bigsmp SLES9_SP3_BRANCH-20071002073136)&lt;BR /&gt;EIP is at blk_queue_bounce+0xf/0x310&lt;BR /&gt;eax: 00000000   ebx: f510dc68   ecx: 00000000   edx: d1bb5b44&lt;BR /&gt;esi: 00000000   edi: 00000000   ebp: 00000008   esp: d1bb5af0&lt;BR /&gt;ds: 007b   es: 007b   ss: 0068&lt;BR /&gt;Process novell-zislnxd (pid: 22349, threadinfo=d1bb4000 task=f2769980)&lt;BR /&gt;Stack: 00000001 00000001 cdfd2c50 ca852720 00000003 dabbee48 d1bb5b44 00000000&lt;BR /&gt;       f510dc68 00000000 00000000 00000008 c026e21b f510de04 00000046 00000000&lt;BR /&gt;       00000000 00000000 00000008 00000008 faa570a0 f510dc68 f510dc68 f510dc04&lt;BR /&gt;Call Trace:&lt;BR /&gt; [&amp;lt;c026e21b&amp;gt;] __make_request+0x4b/0x530&lt;BR /&gt; [&amp;lt;faa570a0&amp;gt;] MpcPathWeightForAdaptive+0x0/0x130 [emcpmpc]&lt;BR /&gt; [&amp;lt;fa98c073&amp;gt;] PowerPlatformBottomDispatch+0x3b3/0x470 [emcp]&lt;BR /&gt; [&amp;lt;faa68680&amp;gt;] MpcDispatchGuts+0xb0/0xc0 [emcpmpc]&lt;BR /&gt; [&amp;lt;fa98dbcb&amp;gt;] PowerTopDispatch+0x10b/0x320 [emcp]&lt;BR /&gt; [&amp;lt;fa982772&amp;gt;] allocPio+0x12/0x20 [emcp]&lt;BR /&gt; [&amp;lt;fa98de36&amp;gt;] emcp_native_mrf+0x56/0x90 [emcp]&lt;BR /&gt; [&amp;lt;c026d05d&amp;gt;] generic_make_request+0x11d/0x200&lt;BR /&gt; [&amp;lt;c0154614&amp;gt;] mempool_alloc+0x74/0x130&lt;BR /&gt; [&amp;lt;c012a810&amp;gt;] autoremove_wake_function+0x0/0x40&lt;BR /&gt; [&amp;lt;c026d1a3&amp;gt;] submit_bio+0x63/0x120&lt;BR /&gt; [&amp;lt;c012a810&amp;gt;] autoremove_wake_function+0x0/0x40&lt;BR /&gt; [&amp;lt;c017ba52&amp;gt;] bio_alloc+0xe2/0x1d0&lt;BR /&gt; [&amp;lt;c0177b2d&amp;gt;] submit_bh+0x17d/0x220&lt;BR /&gt; 
[&amp;lt;c0179ca7&amp;gt;] block_read_full_page+0x367/0x370&lt;BR /&gt; [&amp;lt;c017dfb0&amp;gt;] blkdev_get_block+0x0/0x80&lt;BR /&gt; [&amp;lt;c0150037&amp;gt;] add_to_page_cache+0x57/0x180&lt;BR /&gt; [&amp;lt;c01583d0&amp;gt;] read_pages+0x130/0x1b0&lt;BR /&gt; [&amp;lt;c0156ba4&amp;gt;] __alloc_pages+0xb4/0x430&lt;BR /&gt; [&amp;lt;c015870b&amp;gt;] blockable_page_cache_readahead+0x12b/0x1a0&lt;BR /&gt; [&amp;lt;c01589c3&amp;gt;] page_cache_readahead+0x243/0x300&lt;BR /&gt; [&amp;lt;c015143c&amp;gt;] do_generic_mapping_read+0x41c/0x7d0&lt;BR /&gt; [&amp;lt;c011a529&amp;gt;] flush_tlb_page+0x59/0xe0&lt;BR /&gt; [&amp;lt;c0152462&amp;gt;] __generic_file_aio_read+0x1e2/0x220&lt;BR /&gt; [&amp;lt;c014f920&amp;gt;] file_read_actor+0x0/0xf0&lt;BR /&gt; [&amp;lt;c01525e8&amp;gt;] generic_file_read+0x98/0xc0&lt;BR /&gt; [&amp;lt;c012a810&amp;gt;] autoremove_wake_function+0x0/0x40&lt;BR /&gt; [&amp;lt;c012d315&amp;gt;] sys_wait4+0x195/0x5c0&lt;BR /&gt; [&amp;lt;c018b420&amp;gt;] __pollwait+0x0/0x120&lt;BR /&gt; [&amp;lt;c0176536&amp;gt;] vfs_read+0xc6/0x160&lt;BR /&gt; [&amp;lt;c01767e1&amp;gt;] sys_read+0x91/0xf0&lt;BR /&gt; [&amp;lt;c01091b9&amp;gt;] sysenter_past_esp+0x52/0x71&lt;BR /&gt;&lt;BR /&gt;Code: f6 80 74 01 00 00 01 0f 85 e4 01 00 00 8b 54 24 1c a1 c8 8f&lt;BR /&gt;done waiting: 3 cpus not responding&lt;BR /&gt;Dumping to block device (104,1) on CPU 0 ...</description>
      <pubDate>Wed, 27 Feb 2008 18:43:14 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149207#M31707</guid>
      <dc:creator>Larry UofM</dc:creator>
      <dc:date>2008-02-27T18:43:14Z</dc:date>
    </item>
    <item>
      <title>Re: Server Hangs every 3 months</title>
      <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149208#M31708</link>
      <description>Does the server have any scheduled job at that interval?</description>
      <pubDate>Fri, 29 Feb 2008 15:55:31 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149208#M31708</guid>
      <dc:creator>Avijit Patra</dc:creator>
      <dc:date>2008-02-29T15:55:31Z</dc:date>
    </item>
    <item>
      <title>Re: Server Hangs every 3 months</title>
      <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149209#M31709</link>
      <description>I agree with the previous reply about running collectl - probably because I wrote it.  9-)&lt;BR /&gt;&lt;BR /&gt;The important thing to remember if you run collectl, or even sar, is to use a fairly high monitoring frequency. I know most sar users monitor once every 10 minutes. By default collectl monitors once every 10 seconds, but even at that frequency it typically uses &amp;lt;0.1% of the CPU.&lt;BR /&gt;&lt;BR /&gt;Once you've collected a pile of data with it, you can play it back and look at a variety of data in a variety of formats, showing most of the types of things sar shows and then some. The key things I'd look for are system resources whose consumption keeps going up, as well as what was going on at the time of the 'lock up', assuming you know the approximate time.&lt;BR /&gt;&lt;BR /&gt;One resource people often miss (probably because there aren't any other utilities I know of that will log its usage) is 'slabs'. Collectl will show you the amount of memory allocated to slabs when you show memory usage, but if you think you are seeing an issue, you can also look at changes over time to individual slabs. Since slab monitoring (and process monitoring too, for that matter) is more expensive than monitoring the other types of data, those subsystems are monitored once a minute in order to stay within that &amp;lt;0.1% overhead window.&lt;BR /&gt;&lt;BR /&gt;Just keep in mind that by default collectl will write its data to a log in /var/log/collectl, creating a new log every day and retaining the 7 previous ones. If you need to keep more, you can modify the number in /etc/collectl.conf.&lt;BR /&gt;&lt;BR /&gt;Check it out at &lt;A href="http://collectl.sourceforge.net/" target="_blank"&gt;http://collectl.sourceforge.net/&lt;/A&gt; and enjoy.&lt;BR /&gt;&lt;BR /&gt;-mark</description>
      <pubDate>Sun, 02 Mar 2008 12:16:40 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149209#M31709</guid>
      <dc:creator>MarkSeger</dc:creator>
      <dc:date>2008-03-02T12:16:40Z</dc:date>
    </item>
    <item>
      <title>Re: Server Hangs every 3 months</title>
      <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149210#M31710</link>
      <description>&amp;gt;&amp;gt;&amp;gt;&amp;gt; Process novell-zislnxd (pid: 22349, threadinfo=d1bb4000 task=f2769980)&lt;BR /&gt;&lt;BR /&gt;Contact Novell support. It looks like a problem in Novell ZENworks Linux Management.</description>
      <pubDate>Tue, 04 Mar 2008 12:59:16 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149210#M31710</guid>
      <dc:creator>Ivan Ferreira</dc:creator>
      <dc:date>2008-03-04T12:59:16Z</dc:date>
    </item>
    <item>
      <title>Re: Server Hangs every 3 months</title>
      <link>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149211#M31711</link>
      <description>According to Novell this is a problem with EMC PowerPath, not ZLM... Oddly enough, this also happens on servers that are not running PowerPath. I have yet to get a core dump from those servers (they crash less often than the others).</description>
      <pubDate>Tue, 04 Mar 2008 13:42:22 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/server-hangs-every-3-months/m-p/4149211#M31711</guid>
      <dc:creator>Larry UofM</dc:creator>
      <dc:date>2008-03-04T13:42:22Z</dc:date>
    </item>
  </channel>
</rss>

