ProLiant Servers (ML,DL,SL)

Unrecoverable System Error (NMI) has occurred

Arne Defurne

Hello,

On a ProLiant DL580 G3 running VMware ESX 3.5, we received an Unrecoverable System Error (NMI).

Some background first:

Last week, connecting a network cable to the server caused it to reboot. After searching through various logs, I found hplog entries reporting a PCI bus error:

Type hplog -v to get a listing of ASRs:

0016 Critical 14:29 03/11/2009 14:29 03/11/2009 0001
LOG: PCI Bus Error (Slot 0, Bus 22, Device 2, Function 0)

0017 Caution 13:30 03/11/2009 13:30 03/11/2009 0001
LOG: POST Error: A Critical Error occurred prior to this power-up

By checking the PCI device list, I found that the network interface caused the problem:

Keep in mind that hplog reports the bus number in decimal while lspci shows it in hexadecimal: bus 22 (decimal) = bus 16 (hex).

Type lspci -t for a tree of all PCI devices (add -v to include the device names):

[root@vmw8 root]# lspci -tv

+-02.0-[12-19]--+-00.0-[13-15]--+-01.0 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
|               |               \-01.1 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
|               \-00.2-[16-19]--+-02.0 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)   <<<
|                               \-02.1 Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
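As a sketch of the decimal-to-hex mapping, the Python snippet below converts the location in an hplog "PCI Bus Error" entry into the bus:device.function form that lspci prints. The regular expression assumes the exact message wording shown in the log above, and the helper name hplog_to_lspci is hypothetical. The hplog "Slot" field is not part of the PCI address, so it is ignored.

```python
import re
from typing import Optional

# Location part of an hplog "PCI Bus Error" entry, in the wording seen above:
# "LOG: PCI Bus Error (Slot 0, Bus 22, Device 2, Function 0)"
PCI_BUS_ERROR = re.compile(
    r"PCI Bus Error \(Slot (\d+), Bus (\d+), Device (\d+), Function (\d+)\)"
)


def hplog_to_lspci(line: str) -> Optional[str]:
    """Convert hplog's decimal bus/device/function to lspci's hex 'bb:dd.f'."""
    m = PCI_BUS_ERROR.search(line)
    if m is None:
        return None
    _slot, bus, dev, func = (int(g) for g in m.groups())
    # lspci prints bus and device as two-digit hex, function as one hex digit
    return f"{bus:02x}:{dev:02x}.{func:x}"


print(hplog_to_lspci("LOG: PCI Bus Error (Slot 0, Bus 22, Device 2, Function 0)"))
# prints 16:02.0
```

The result matches the marked 02.0 entry under bus range [16-19], and it can be passed to lspci -s 16:02.0 -v to show only that device.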

I tried to replicate this today, but this time I got an Unrecoverable System Error (NMI).

On screen I received:
11:17:46:53:624 CPU0:1025)APIC:1382:LINT1 interrupt on PCPU0 (port x61 contains 0x81)
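The port 0x61 value in that console message can be decoded with a small sketch. It assumes port 0x61 follows the legacy PC layout, in which bit 7 latches a SERR# (system error/parity) NMI and bit 6 an IOCHK# (I/O channel check) NMI; the mask names mirror the Linux kernel constants NMI_REASON_SERR and NMI_REASON_IOCHK, while the helper functions are hypothetical. Under that assumption, 0x81 has bit 7 set; bit 0 is the timer 2 gate and unrelated to NMIs.

```python
import re
from typing import List, Optional

# Masks for the legacy PC NMI status/control port 0x61 (read side), named
# after the Linux kernel constants NMI_REASON_SERR and NMI_REASON_IOCHK.
NMI_REASON_SERR = 0x80   # bit 7: SERR# (system error / parity) NMI latched
NMI_REASON_IOCHK = 0x40  # bit 6: IOCHK# (I/O channel check) NMI latched


def port61_from_message(msg: str) -> Optional[int]:
    """Extract the value from a '(port x61 contains 0x..)' console message."""
    # Accept both "port x61" (as printed above) and "port 0x61"
    m = re.search(r"port 0?x61 contains (0x[0-9a-fA-F]+)", msg)
    return int(m.group(1), 16) if m else None


def nmi_sources(value: int) -> List[str]:
    """List which NMI source bits are set in a port 0x61 value."""
    sources = []
    if value & NMI_REASON_SERR:
        sources.append("SERR")
    if value & NMI_REASON_IOCHK:
        sources.append("IOCHK")
    return sources


value = port61_from_message(
    "APIC:1382:LINT1 interrupt on PCPU0 (port x61 contains 0x81)"
)
print(hex(value), nmi_sources(value))
# prints 0x81 ['SERR']
```

If this reading is right, the NMI was flagged as a SERR#-type error. Linux kernels of that era appear to print the generic "hardware problem with your RAM chips" text for this bit, so that message alone may not prove a memory fault; a PCI SERR# would also fit the earlier PCI Bus Error entry.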

hplog -v:

0018 Critical 08:19 03/23/2009 08:19 03/23/2009 0001
LOG: An Unrecoverable System Error (NMI) has occurred

0019 Caution 07:45 03/23/2009 07:45 03/23/2009 0001
LOG: POST Error: A Critical Error occurred prior to this power-up

/var/log/messages:

Mar 23 08:19:31 vmw8 kernel: NMI received. Trying to continue.
Mar 23 08:19:31 vmw8 kernel: You probably have a hardware problem with your RAM chips.
Mar 23 08:19:31 vmw8 kernel: Please consult hardware error logs.
Mar 23 08:19:31 vmw8 hpasmd[2778]: CRITICAL: hpasmd: An Unrecoverable System Error (NMI) has occurred
Mar 23 08:19:31 vmw8 shutdown: shutting down for system reboot
Mar 23 08:19:32 vmw8 init: Switching to runlevel: 6

However, the server hung while rebooting.

Does anyone have an idea?

Thanks a lot,
Arne
Paul Kavarana

Re: Unrecoverable System Error (NMI) has occurred

I have a customer with the same problem. Have you had any results? If I find a fix, I'll let you know.

Regards,

Paul
Arne Defurne

Re: Unrecoverable System Error (NMI) has occurred

I have tried to replicate this, but I have not encountered the error again.
Since then, everything has run without any problems.