ProLiant Servers (ML,DL,SL)

Re: DL360G2 + 2 CPUs + RedHat AS2.1 + random reboots

 
Aaron Bos
New Member

DL360G2 + 2 CPUs + RedHat AS2.1 + random reboots

I have about 60 DL360G2 servers running RedHat AS2.1. Most are single CPU, running 2.4.9-e.9. About 10 are dual CPU, running the 2.4.9-e.9enterprise kernel. Between them, those 10 boxes average one reboot every two days (including servers with almost no load on them). I get messages like this on the console:

---start---
casm: NMI Handler has been called on processor 0!
WARNING: casm: NMI - Automatic Server Recovery timer expiration - Hour 23 - 5/28/2003

WARNING: casm: Attempting to shutdown due to ASR timer expiration!

casm: No NMI detected by ROM. Continuing execution . . .
Uhhuh. NMI received for unknown reason 30.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
----end----

Then the server reboots.

Several minutes (maybe five to fifteen) prior to those console messages appearing, the system hangs totally. It can't be pinged, and the console accepts no input.

The single CPU boxes do not have this problem.

The following Compaq-specific RPMs are installed:

cpqhealth-3.1.0-16.Redhat-AS-2-1.i386.rpm
ucd-snmp-4.2.4-1cmaX.6.i386.rpm
cmafdtn-5.50.0-10.i386.rpm
cmasvr-5.50.0-15.i386.rpm
cmanic-5.50.0-3.i386.rpm
cmastor-5.50.0-13.i386.rpm

Has anyone experienced this or something similar, and how did you resolve it?

Thank you.
6 REPLIES
ginnie nuckles_1
New Member

Re: DL360G2 + 2 CPUs + RedHat AS2.1 + random reboots

We just had this happen to us. Can you please let us know what caused it? Thanks.
TOMAS BERNABEU
Frequent Advisor

Re: DL360G2 + 2 CPUs + RedHat AS2.1 + random reboots

Hi Aaron!

I have the same problem.
How did you solve it?


Aaron Bos
New Member

Re: DL360G2 + 2 CPUs + RedHat AS2.1 + random reboots

The specific problem I was seeing (very frequent ASRs on >1 CPU systems) appeared to have been caused by something broken in the HP health agents (I was running version 5.5 at the time). Once I upgraded to a newer version, the problem went away.
Ken Barlow
New Member

Re: DL360G2 + 2 CPUs + RedHat AS2.1 + random reboots

NMI Handler has been called on processor 0!

I'm getting the same error on a pair of DL380 G3 servers with two Xeon processors each, running 2.4.9-e.48smp and hpasm-7.0.0-21. The only information I get is the following in the console log:

casm: NMI Handler has been called on processor 0!
WARNING: casm: NMI - Automatic Server Recovery timer expiration - Hour 3 - 8/10/2004
WARNING: casm: Attempting to shutdown due to ASR timer expiration!

followed by a reboot. I have netdump enabled, and those same three lines are the only thing the netdump server logs too.

Any clues?
TOMAS BERNABEU
Frequent Advisor

Re: DL360G2 + 2 CPUs + RedHat AS2.1 + random reboots


Hi Ken!

1.- Disable Hyper-Threading support.

2.- Update the BIOS.

3.- Update the iLO and Smart Array firmware.

4.- Check for IRQ sharing with the iLO card.

I am on the fourth step, and so far it seems to be going OK.
Good luck.
Robert Mela
New Member

Re: DL360G2 + 2 CPUs + RedHat AS2.1 + random reboots

We're seeing the same thing on a DL380 G3. It happened during a web server stress test (Apache 2 with PHP and MySQL). The server also included a module that frequently opened and closed TCP connections to another machine.

The only other noteworthy event is that modprobe was run about 15 to 20 minutes before the lockup.

SAR didn't report any results for 12:00, so I'm guessing the lock-up occurred a few minutes before then.

From messages log:

Sep 28 12:01:35 c10-gs-dev1 kernel: casm: NMI Handler has been called on processor 0!
Sep 28 12:01:35 c10-gs-dev1 kernel: WARNING: casm: NMI - Automatic Server Recovery timer expiration - Hour 19 - 9/28/2004
Sep 28 12:01:35 c10-gs-dev1 kernel:
Sep 28 12:01:35 c10-gs-dev1 kernel: WARNING: casm: Attempting to shutdown due to ASR timer expiration!
Sep 28 12:01:35 c10-gs-dev1 kernel:
Sep 28 12:01:35 c10-gs-dev1 kernel: Uhhuh. NMI received for unknown reason 35 on CPU 0.
Sep 28 12:01:35 c10-gs-dev1 kernel: Dazed and confused, but trying to continue
Sep 28 12:01:35 c10-gs-dev1 kernel: Do you have a strange power saving mode enabled?
Sep 28 12:01:35 c10-gs-dev1 shutdown: shutting down for system reboot
Sep 28 12:01:36 c10-gs-dev1 cevtd[1273]: WARNING: casm: ASR Lockup Detected: (casm device driver alerted)
Sep 28 12:01:36 c10-gs-dev1 init: Switching to runlevel: 6
Sep 28 12:07:50 c10-gs-dev1 syslogd 1.4.1: restart.
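
For anyone tracking how often this happens across a fleet, here is a rough Python sketch that scans messages-log lines in the format shown above, picks out the casm ASR timer expiration lines, and pairs each with the next syslogd restart to estimate how long the box was down. The function names (parse_stamp, asr_reboots) are made up for this example, and the year has to be passed in because syslog timestamps do not include one. It only assumes the line layout seen in this excerpt; other syslog configurations may need a different pattern.

```python
import re
from datetime import datetime

# Common prefix of a syslog line as seen in the excerpt: "Sep 28 12:01:35 host ".
STAMP = r"(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
ASR_RE = re.compile(
    STAMP + r"kernel:\s+WARNING: casm: NMI - Automatic Server Recovery timer expiration"
)
RESTART_RE = re.compile(STAMP + r"syslogd\b.*\brestart")


def parse_stamp(stamp, year):
    # syslog timestamps carry no year, so the caller supplies one (assumption).
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def asr_reboots(lines, year):
    """Return (host, asr_time, seconds_until_syslogd_restart) per ASR event."""
    results = []
    pending = None  # (host, time) of the latest ASR line not yet paired
    for line in lines:
        m = ASR_RE.match(line)
        if m:
            pending = (m.group("host"), parse_stamp(m.group("stamp"), year))
            continue
        m = RESTART_RE.match(line)
        if m and pending:
            restart = parse_stamp(m.group("stamp"), year)
            gap = (restart - pending[1]).total_seconds()
            results.append((pending[0], pending[1], gap))
            pending = None
    return results


if __name__ == "__main__":
    # Two lines copied from the log excerpt above.
    sample = [
        "Sep 28 12:01:35 c10-gs-dev1 kernel: WARNING: casm: NMI - Automatic "
        "Server Recovery timer expiration - Hour 19 - 9/28/2004",
        "Sep 28 12:07:50 c10-gs-dev1 syslogd 1.4.1: restart.",
    ]
    for host, when, gap in asr_reboots(sample, 2004):
        print(host, when.isoformat(), f"{gap:.0f}s until syslogd restart")
```

On the two lines taken from the excerpt, the gap comes out at 375 seconds. Note that this measures only the time from the ASR message to syslogd coming back, not the length of the hang that preceded it.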


uname output:

2.4.20-28.7bigmem #1 SMP Thu Dec 18 11:04:21 EST 2003 i686 unknown

CMA rpms:

ucd-snmp-utils-4.2.4-3cmaX.9
ucd-snmp-devel-4.2.4-3cmaX.9
cmastor-6.20.0-8
ucd-snmp-4.2.4-3cmaX.9
cmanic-6.20.0-5

Last processor in /proc/cpuinfo (I believe it's two physical processors, four logical with Hyper-Threading):

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 7
cpu MHz : 2384.361
cache size : 512 KB
physical id : 3
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4771.02
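
Since the thread keeps coming back to Hyper-Threading, here is a rough Python sketch for checking the "two physical, four logical" guess from the full /proc/cpuinfo rather than from the last entry alone. It counts logical processors and distinct "physical id" values, and reports whether the "ht" flag is present. The function names and the sample text are made up for illustration. Two caveats: the "ht" flag only says the CPU supports Hyper-Threading, not that it is enabled, and on older kernels the "physical id" values may not be numbered contiguously, so the sketch counts distinct values instead of trusting the highest one.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor entry.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def summarize(cpus):
    """Return (logical_count, physical_package_count, ht_flag_present)."""
    packages = {c["physical id"] for c in cpus if "physical id" in c}
    ht_flag = any("ht" in c.get("flags", "").split() for c in cpus)
    return len(cpus), len(packages), ht_flag


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Hypothetical two-entry sample so the sketch also runs off Linux.
        text = (
            "processor : 0\nphysical id : 0\nsiblings : 2\nflags : fpu ht\n\n"
            "processor : 1\nphysical id : 0\nsiblings : 2\nflags : fpu ht\n"
        )
    logical, physical, ht = summarize(parse_cpuinfo(text))
    print(f"logical={logical} physical={physical} ht_flag={ht}")
```

If the logical count is twice the package count and "siblings" is 2, as in the entry pasted above, that is consistent with Hyper-Threading being active.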