'proliant dl360p gen8' Unrecoverable System Error - An Unrecoverable System Error (NMI) has occurred

ambermehra
Occasional Contributor

Hi all,

 

We have two DL360p Gen8 servers running 64-bit RHEL 5 in an Oracle RAC setup. Tonight one of the nodes suddenly crashed, and the IML showed the following errors after the server rebooted. The systems are new, and all firmware and drivers are up to date.

 

 

Event: 20 Added: 12/25/2013 19:13
CRITICAL: Unrecoverable System Error - An Unrecoverable System Error (NMI) has occurred (System error code 0x0000002B, 0x00000000).

 

Event: 21 Added: 12/25/2013 19:14
CRITICAL: ASR - ASR Detected by System ROM.

 

Please let me know if you have faced this error and how you fixed it.

 

Thanks

Amber 

3 REPLIES
Johan Guldmyr
Honored Contributor

Re: 'proliant dl360p gen8' Unrecoverable System Error - An Unrecoverable System Error (NMI) has occu

Hi,

Check syslog/messages for the OS logs.

I've seen ASR happen when the server runs out of memory.
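
As a rough way to follow up on this, the sketch below scans an OS log for lines that mention the OOM killer or an NMI. The search patterns and the default path `/var/log/messages` are assumptions; the exact kernel message wording differs between releases, so adjust them to what your system actually logs.

```python
import re

# Assumed patterns: kernel wording for OOM and NMI messages varies by release.
SUSPECT = re.compile(r"oom-killer|out of memory|\bNMI\b", re.IGNORECASE)

def find_suspect_lines(lines):
    """Return the lines that mention an OOM kill or an NMI."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]

def scan_file(path="/var/log/messages"):
    """Scan a syslog file; the default path is an assumption for RHEL-style systems."""
    with open(path, errors="replace") as fh:
        return find_suspect_lines(fh)

if __name__ == "__main__":
    for hit in scan_file():
        print(hit)
```

Comparing the timestamps of any hits with the IML event times (19:13 and 19:14 here) would show whether memory pressure preceded the ASR reset.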
lmijmcy
Occasional Visitor

Re: 'proliant dl360p gen8' Unrecoverable System Error - An Unrecoverable System Error (NMI) has occu

Hi Amber

 

Is it running SUSE Linux?

 

BR

James

 

 

 

https://www.suse.com/company/press/2012/2/suse-linux-enterprise-11-service-pack-2-released.html

 

Old HP case:

  

We saw the same issue, which seemed to be linked to the iLO and its drivers with respect to NMI handling on SLES 11.

 

Linux 2.6.32.45-0.3-default

SUSE Linux Enterprise Server 11 (x86_64)

VERSION = 11

PATCHLEVEL = 1

 

An Unrecoverable System Error (NMI) has occurred (System error code 0x0000002B, 0x00000000)

The first code (0x0000002B) refers to the iLO Watchdog NMI.
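
A minimal sketch for pulling the two codes out of an IML entry so they can be compared across crashes. The lookup table contains only the one meaning stated in this thread; it is not an official HP code list, and the function names are made up for illustration.

```python
import re

# Matches the two hexadecimal values in an IML "System error code" entry.
CODE_RE = re.compile(r"System error code (0x[0-9A-Fa-f]+),\s*(0x[0-9A-Fa-f]+)")

# Only the meaning given in this thread; NOT an official or complete list.
KNOWN_CODES = {0x2B: "iLO Watchdog NMI"}

def extract_error_codes(iml_text):
    """Return a list of (first_code, second_code) integer pairs."""
    return [(int(a, 16), int(b, 16)) for a, b in CODE_RE.findall(iml_text)]

def describe(code):
    """Look up a code; anything not mentioned in the thread is reported as unknown."""
    return KNOWN_CODES.get(code, "unknown")

sample = (
    "CRITICAL: Unrecoverable System Error - An Unrecoverable System Error (NMI) "
    "has occurred (System error code 0x0000002B, 0x00000000)."
)
codes = extract_error_codes(sample)
```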

 

The following interrupt counts in the logs can be correlated:

System peripheral: Hewlett-Packard Company iLO3 Management Processor Support and Messaging (rev 05)
        Subsystem: Hewlett-Packard Company Device 330e
        Flags: bus master, fast devsel, latency 0, IRQ 17
        I/O ports at 2800 [size=256]
        Memory at fa7f0000 (32-bit, non-prefetchable) [size=256]
        Memory at fa600000 (32-bit, non-prefetchable) [size=1M]
        Memory at fa580000 (32-bit, non-prefetchable) [size=512K]
        Memory at fa570000 (32-bit, non-prefetchable) [size=32K]
        Memory at fa560000 (32-bit, non-prefetchable) [size=32K]
        [virtual] Expansion ROM at fa500000 [disabled] [size=64K]
        Capabilities: [78] Power Management version 3
        Capabilities: [b0] Message Signalled Interrupts: Mask- 64bit+ Count=1/1 Enable-
        Capabilities: [c0] Express Legacy Endpoint, MSI 00
        Kernel driver in use: hpilo
        Kernel modules: hpilo

 

<6>[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])

kernel.nmi_watchdog = 0

kernel.panic = 0

kernel.panic_on_io_nmi = 0
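
To collect the same NMI-related settings on another node, a sketch like the one below reads them from `/proc/sys`. The helper name `read_sysctl` and the `root` parameter are illustrative assumptions; a key that does not exist on a given kernel is returned as `None` rather than raising.

```python
import os

# The three keys shown in the listing above.
NMI_KEYS = ["kernel.nmi_watchdog", "kernel.panic", "kernel.panic_on_io_nmi"]

def read_sysctl(name, root="/proc/sys"):
    """Read one sysctl value by mapping dots to path separators; None if absent."""
    path = os.path.join(root, *name.split("."))
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None

if __name__ == "__main__":
    for key in NMI_KEYS:
        print(f"{key} = {read_sysctl(key)}")
```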

 

IRQ:   17:         30          0          0          0          0          0          2          0          0          0          0          0          0          0          0          0  IR-IO-APIC-fasteoi   uhci_hcd:usb6, hpilo
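
The per-CPU counts on that line can be totalled with a small parser. This is a sketch that assumes the standard `/proc/interrupts` layout (IRQ number, one count per CPU, then the controller and device names); the sample string reuses the values quoted above without the leading "IRQ:" label.

```python
def parse_interrupt_line(line):
    """Split a /proc/interrupts row into (irq, per_cpu_counts, description)."""
    irq, _, rest = line.partition(":")
    tokens = rest.split()
    counts = []
    for tok in tokens:
        if not tok.isdigit():
            break  # first non-numeric token starts the description
        counts.append(int(tok))
    description = " ".join(tokens[len(counts):])
    return irq.strip(), counts, description

sample = (
    " 17: 30 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0  "
    "IR-IO-APIC-fasteoi   uhci_hcd:usb6, hpilo"
)
irq, counts, description = parse_interrupt_line(sample)
total = sum(counts)
```

Note that IRQ 17 is shared between `uhci_hcd:usb6` and `hpilo`, so the total cannot be attributed to the iLO driver alone.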

 

<6>[   68.090998] power_meter ACPI000D:00: Found ACPI power meter.
<6>[   68.091660] hpilo 0000:01:00.2: PCI INT B -> GSI 17 (level, low) -> IRQ 17
<7>[   68.091667] hpilo 0000:01:00.2: setting latency timer to 64
<6>[   68.092366] input: PC Speaker as /devices/platform/pcspkr/input/input3
<3>[   68.148052] power_meter ACPI000D:00: Ignoring unsafe software power cap!

 

  SubDevice: pci 0x3245 "Smart Array P410i"
  Revision: 0x01
  Driver: "cciss"
  Driver Modules: "cciss"
  Driver Info #0:
    Driver Status: cciss is active
    Driver Activation Cmd: "modprobe cciss"
  Driver Info #1:
    Driver Status: hpsa is active
    Driver Activation Cmd: "modprobe hpsa"

           ahci: module = ahci
           cciss: /devices/pci0000:00/0000:00:1c.0/0000:0c:00.0
           cciss: module = cciss
            hpsa: module = hpsa

 

Based on the log entries above, the following recommendation was received:

SLES 11 kernels prior to 3.0.13-0.27 (the kernel that ships with SLES 11 SP2) are not able to reliably respond to an iLO-triggered NMI due to known issues with the kernel and the native Smart Array controller driver. It is recommended that servers that require the ability to respond to an iLO-triggered NMI be updated to SLES 11 SP2 or later.
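
To check a node against that threshold, the sketch below compares a kernel release string with 3.0.13-0.27. It is a naive numeric comparison that assumes SLES-style release strings such as the one quoted earlier; release strings with digits in the suffix (for example `el5` on RHEL) would need different handling. The function names are illustrative.

```python
import re

def kernel_key(release):
    """Turn '3.0.13-0.27-default' into a tuple of ints: (3, 0, 13, 0, 27)."""
    return tuple(int(n) for n in re.findall(r"\d+", release))

# Threshold taken from the recommendation above.
SP2_KERNEL = kernel_key("3.0.13-0.27")

def is_older_than_sp2_kernel(release):
    """True if the release sorts before the SLES 11 SP2 kernel."""
    return kernel_key(release) < SP2_KERNEL

if __name__ == "__main__":
    import platform
    print(platform.release(), is_older_than_sp2_kernel(platform.release()))
```

With the kernel from the old case, `is_older_than_sp2_kernel("2.6.32.45-0.3-default")` evaluates to `True`.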

 

https://www.suse.com/company/press/2012/2/suse-linux-enterprise-11-service-pack-2-released.html

or

http://www.novell.com/support/kb/doc.php?id=7012368

 

Additional related information is available in:

Release Notes for SUSE Linux Enterprise Server 11

http://www.novell.com/linux/releasenotes/i386/SUSE-SLES/11/

amber-mehra
Occasional Visitor

Re: 'proliant dl360p gen8' Unrecoverable System Error - An Unrecoverable System Error (NMI) has occu

Sorry for the late response. It's Red Hat Enterprise Linux Server release 5.6.