<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: rhel as 4 update 1 or 2 random crashes in Operating System - Linux</title>
    <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651466#M20154</link>
    <description>Hplog -v output contains only strings:&lt;BR /&gt;ASR Detected by System ROM</description>
    <pubDate>Mon, 31 Oct 2005 02:44:24 GMT</pubDate>
    <dc:creator>Boris Kulikov</dc:creator>
    <dc:date>2005-10-31T02:44:24Z</dc:date>
    <item>
      <title>rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651464#M20152</link>
      <description>Hello!&lt;BR /&gt;&lt;BR /&gt;I am seeing random crashes on a few x86_64 ProLiant DL380 G4 servers running RHEL AS 4 with Update 1 or Update 2. There are no messages in syslog and no crash dumps (a crash dump is not possible).&lt;BR /&gt;&lt;BR /&gt;The latest ProLiant Support Pack is installed on both machines. I use the bcm5700 network driver instead of tg3.&lt;BR /&gt;&lt;BR /&gt;See the crash log from the iLO console in the attachment.&lt;BR /&gt;&lt;BR /&gt;While investigating this issue I found:&lt;BR /&gt;&lt;BR /&gt;1) A strange entry in dmesg when hpasm starts&lt;BR /&gt;(service hpasm start or service hpasm start hpasmd):&lt;BR /&gt;&lt;BR /&gt;Losing some ticks... checking if CPU frequency changed.&lt;BR /&gt;warning: many lost ticks.&lt;BR /&gt;Your time source seems to be instable or some driver is hogging interupts&lt;BR /&gt;rip __do_softirq+0x4d/0xd0&lt;BR /&gt;&lt;BR /&gt;It looks similar to the do_softirq entry in the saved iLO crash log.&lt;BR /&gt;&lt;BR /&gt;These messages appear in dmesg after the hpasmd daemon starts, with or without the "notaint" option in /opt/compaq/cma.conf.&lt;BR /&gt;&lt;BR /&gt;2) It seems strange to me that a user-level daemon can do such a bad thing, but when I run "file" on it, it looks very old and is not a 64-bit binary.&lt;BR /&gt;&lt;BR /&gt;file /opt/compaq/hpasmd/bin/hpasmd&lt;BR /&gt;/opt/compaq/hpasmd/bin/hpasmd: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), stripped&lt;BR /&gt;&lt;BR /&gt;This issue must be investigated and fixed.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Boris</description>
      <pubDate>Tue, 18 Oct 2005 04:08:51 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651464#M20152</guid>
      <dc:creator>Boris Kulikov</dc:creator>
      <dc:date>2005-10-18T04:08:51Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651465#M20153</link>
      <description>Your number 1 is well understood, and a report has been sent to Red Hat for them to address. It is not really a problem, because the kernel catches itself up. There is nothing wrong with the HW; Red Hat is just making a bad guess.&lt;BR /&gt;&lt;BR /&gt;Your number 2 is correct. Even on an x86-64 system, hpasmd and all the other agents in the hpasm package are 32-bit apps.&lt;BR /&gt;&lt;BR /&gt;As for the random crashes, please provide a little more info. It might be helpful to provide the output of "hplog -v".</description>
      <pubDate>Wed, 26 Oct 2005 13:19:07 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651465#M20153</guid>
      <dc:creator>Shannon_44</dc:creator>
      <dc:date>2005-10-26T13:19:07Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651466#M20154</link>
      <description>Hplog -v output contains only strings:&lt;BR /&gt;ASR Detected by System ROM</description>
      <pubDate>Mon, 31 Oct 2005 02:44:24 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651466#M20154</guid>
      <dc:creator>Boris Kulikov</dc:creator>
      <dc:date>2005-10-31T02:44:24Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651467#M20155</link>
      <description>Have you tried booting the system without APIC support? Add that option to lilo or grub and reboot.&lt;BR /&gt;&lt;BR /&gt;My guess is that APIC is screwing up your system.</description>
      <pubDate>Mon, 31 Oct 2005 08:54:47 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651467#M20155</guid>
      <dc:creator>dirk dierickx</dc:creator>
      <dc:date>2005-10-31T08:54:47Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651468#M20156</link>
      <description>The ASR (Automatic Server Recovery) event means that hpasmd for some reason cannot update the countdown timer. Either hpasmd is being killed or the system is hanging. This feature can be turned off in the ROM-Based Setup Utility (RBSU) or with "hplog -a DISABLE" at the OS level.&lt;BR /&gt;&lt;BR /&gt;Try turning ASR off and see if you notice any system hangs or hpasmd dying. If you don't see either of those situations, then maybe try increasing the timeout to ten minutes. It might be possible that another process is taking up so much CPU time that hpasmd doesn't get a chance to update the counter.</description>
      <pubDate>Mon, 31 Oct 2005 10:25:38 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651468#M20156</guid>
      <dc:creator>Shannon_44</dc:creator>
      <dc:date>2005-10-31T10:25:38Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651469#M20157</link>
      <description>No success.&lt;BR /&gt;Server crashed with or without ASR.&lt;BR /&gt;hpasmd works fine in all cases.</description>
      <pubDate>Wed, 02 Nov 2005 09:50:40 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651469#M20157</guid>
      <dc:creator>Boris Kulikov</dc:creator>
      <dc:date>2005-11-02T09:50:40Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651470#M20158</link>
      <description>Do you get the panic when the hpasm package is not installed? What HW devices do you have in the system (i.e. HBAs, NICs, etc.)? What SW are you running? Is this a standard Red Hat kernel?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Shannon</description>
      <pubDate>Wed, 02 Nov 2005 10:46:31 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651470#M20158</guid>
      <dc:creator>Shannon_44</dc:creator>
      <dc:date>2005-11-02T10:46:31Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651471#M20159</link>
      <description>- Do you get the panic when the hpasm package is not installed?&lt;BR /&gt;Yes (when not running the hpasm packages).&lt;BR /&gt;&lt;BR /&gt;- What HW devices do you have in the system (i.e. HBAs, NICs, etc.)?&lt;BR /&gt;DL380 G4 devices and an FCA2214 HBA. Two CPUs, 4 GB memory.&lt;BR /&gt;&lt;BR /&gt;- What SW are you running?&lt;BR /&gt;RHEL 4 AS Update 1&lt;BR /&gt;HP ProLiant Support Pack 7.40&lt;BR /&gt;Cyrus IMAP server&lt;BR /&gt;DHCP server&lt;BR /&gt;Apache web server&lt;BR /&gt;Squid proxy server&lt;BR /&gt;OpenAFS client and server&lt;BR /&gt;&lt;BR /&gt;- Is this a standard Red Hat kernel?&lt;BR /&gt;Yes.</description>
      <pubDate>Thu, 03 Nov 2005 05:50:06 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651471#M20159</guid>
      <dc:creator>Boris Kulikov</dc:creator>
      <dc:date>2005-11-03T05:50:06Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651472#M20160</link>
      <description>I have similar crashes on one of my DL385s.&lt;BR /&gt;&lt;BR /&gt;Details:&lt;BR /&gt;Server: ProLiant DL385 (G1)&lt;BR /&gt;CPU: 1x Opteron 275 dual-core&lt;BR /&gt;Memory: 16 GB&lt;BR /&gt;OS: RedHat ES4 Update 3 (64-bit x86_64 version)&lt;BR /&gt;Kernel version: 2.6.9-34.0.1.ELsmp&lt;BR /&gt;&lt;BR /&gt;After a restart, the server generally runs just fine for a day, then it is usually found hung the next morning. (Actually not quite hung: it still responds to pings, but does not accept network connections. The console is frozen too.)&lt;BR /&gt;&lt;BR /&gt;Our application development guys seem to be running performance tests at night, so server load might be a factor.&lt;BR /&gt;&lt;BR /&gt;The 32-bit versions of RedHat ES4 don't suffer from this problem: for application support reasons, we are running some DL385s with 64-bit RHES4 and some with 32-bit RHES4.&lt;BR /&gt;&lt;BR /&gt;I did some googling on this: based on comments on the linux-kernel mailing list, it looks like the problem might be with the AMD 8111 chipset (inaccurate timer implementation?) and/or the fact that the Opteron dynamically changes its CPU frequency as needed. The hpasmd daemon might be only marginally related.&lt;BR /&gt;&lt;BR /&gt;In the 32-bit Linux kernel, there are several options for the OS real-time clock, because of the history of PC hardware evolution. There are fallback mechanisms in case the "best" available method is found unreliable.&lt;BR /&gt;&lt;BR /&gt;In the x86_64 Linux kernel, some of these fallback mechanisms are different or not implemented... probably because it was assumed that a server with a 64-bit CPU would not need to fall all the way back to the original IBM PC/AT timer technology :-)&lt;BR /&gt;&lt;BR /&gt;The code in question can be found in the Linux kernel source:&lt;BR /&gt;&amp;lt;kernel source&amp;gt;/arch/i386/kernel/time.c&lt;BR /&gt;and&lt;BR /&gt;&amp;lt;kernel source&amp;gt;/arch/x86_64/kernel/time.c&lt;BR /&gt;&lt;BR /&gt;Some of those timing methods need to be aware of CPU speed changes; some use a hardware timer in the Power Management subsystem or something similar.</description>
      <pubDate>Wed, 07 Jun 2006 06:20:43 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651472#M20160</guid>
      <dc:creator>Matti_Kurkela</dc:creator>
      <dc:date>2006-06-07T06:20:43Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651473#M20161</link>
      <description>Update: I tried running my DL385 with kernel 2.6.16.20, which should fix the "warning: many lost ticks" problem... and sure enough, I did not see that message anymore.&lt;BR /&gt;&lt;BR /&gt;However, my DL385 still keeps crashing, and the crashes seem to be getting more frequent, regardless of which kernel I'm using.&lt;BR /&gt;&lt;BR /&gt;Now I'm beginning to suspect a faulty CPU.&lt;BR /&gt;&lt;BR /&gt;One of our ProLiant DL385s with a 32-bit RHES4 had a similar situation: the CPU seemed fine under low load, but running a performance test brought the machine down consistently within 10 minutes of starting the test. There was no message at all in the hardware log, nor on the console: the system just froze. Changing the CPU fixed the problem.&lt;BR /&gt;&lt;BR /&gt;I'm beginning to think that at least in my case the "many lost ticks" message and the crashes are two separate problems, perhaps completely unrelated to each other.&lt;BR /&gt;&lt;BR /&gt;Based on the similarity with the 32-bit RHES4 case, I've opened a hardware call for my troubled 64-bit RHES4 server. Tomorrow I should have the server up with a new CPU, and then we'll see whether it helps or not...</description>
      <pubDate>Thu, 08 Jun 2006 07:37:10 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651473#M20161</guid>
      <dc:creator>Matti_Kurkela</dc:creator>
      <dc:date>2006-06-08T07:37:10Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651474#M20162</link>
      <description>Another update: replacing the CPU did not help in this case, but replacing the motherboard seems to have fixed the problem. &lt;BR /&gt;&lt;BR /&gt;The server has been working for almost a week now. Our developers ran some stress tests on it (while I was busy on other projects), including running a "cpuburn" utility over a weekend. &lt;BR /&gt;</description>
      <pubDate>Tue, 20 Jun 2006 13:42:57 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651474#M20162</guid>
      <dc:creator>Matti_Kurkela</dc:creator>
      <dc:date>2006-06-20T13:42:57Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651475#M20163</link>
      <description>Hi Boris,&lt;BR /&gt;&lt;BR /&gt;I've checked your attachment and here are some suggestions:&lt;BR /&gt;&lt;BR /&gt;1. Work with HP service to double-check the server's hardware status, especially the processors, memory and System ROM settings.&lt;BR /&gt;&lt;BR /&gt;2. You didn't mention whether your Linux is x86-64. If it is not, use it instead. RHEL4U2 (x86-64) is very stable on our HP servers; it should be the same on yours.&lt;BR /&gt;&lt;BR /&gt;3. Don't install anything except the Linux OS: no HP PSP, no additional drivers or applications.&lt;BR /&gt;&lt;BR /&gt;4. Stress test this "clean" server with a tool such as the LTP kit, or run stress tests with individual tools (processor, I/O, memory subsystem, disk system...).&lt;BR /&gt;&lt;BR /&gt;I've handled lots of similar cases during the past two years and used the above steps to quickly isolate potential issues and troubleshoot the problem.&lt;BR /&gt;&lt;BR /&gt;Finally, before you test the OS on those servers, verifying hardware health should always be the first priority. Most such issues are hardware related.</description>
      <pubDate>Tue, 20 Jun 2006 19:59:40 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651475#M20163</guid>
      <dc:creator>Jun Yu</dc:creator>
      <dc:date>2006-06-20T19:59:40Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651476#M20164</link>
      <description>Did anyone ever get a resolution to the random reboots?&lt;BR /&gt;&lt;BR /&gt;0002 Critical       15:03  08/09/2006 15:03  08/09/2006 0001&lt;BR /&gt;LOG: ASR Detected by System ROM&lt;BR /&gt;&lt;BR /&gt;Redhat ES4 / HP320.&lt;BR /&gt;&lt;BR /&gt;Thanks.</description>
      <pubDate>Wed, 09 Aug 2006 15:40:53 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651476#M20164</guid>
      <dc:creator>Thomas Vertetis</dc:creator>
      <dc:date>2006-08-09T15:40:53Z</dc:date>
    </item>
    <item>
      <title>Re: rhel as 4 update 1 or 2 random crashes</title>
      <link>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651477#M20165</link>
      <description>My case turned out to be faulty hardware. Replacing the CPU did not fix it, but after replacing the motherboard the machine worked reliably again.&lt;BR /&gt;&lt;BR /&gt;Sorry I could not respond earlier: I had a very busy time and then my summer vacation immediately after that.</description>
      <pubDate>Fri, 11 Aug 2006 07:43:53 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-linux/rhel-as-4-update-1-or-2-random-crashes/m-p/3651477#M20165</guid>
      <dc:creator>Matti_Kurkela</dc:creator>
      <dc:date>2006-08-11T07:43:53Z</dc:date>
    </item>
  </channel>
</rss>

