ProLiant Servers (ML,DL,SL)

ProLiant DL360p Gen8: An Unrecoverable System Error (NMI) has occurred (iLO application watchdog timeout

 
SOLVED
Florant
Occasional Advisor

An Unrecoverable System Error (NMI) has occurred (iLO application watchdog timeout NMI, Service Information: 0x0000002B, 0x00000000)

Hi all,

We have one DL380p Gen8 server running Red Hat Enterprise Linux Server release 6.2. This is the second time the server has crashed and rebooted, with the following IML entries:

ASR    04/24/2019 22:00    04/24/2019 22:00    1    ASR Detected by System ROM

System Error    04/24/2019 21:58    04/24/2019 21:58    1    An Unrecoverable System Error (NMI) has occurred (iLO application watchdog timeout NMI, Service Information: 0x0000002B, 0x00000000)

Syslog:
========================================

Apr 24 22:01:54 prs3-tir hp-ams[2834]: hpHelper Started . .
Apr 24 22:02:08 prs3-tir hpasmlited[2895]: hpDeferSPDThread: Starting thread to collect DIMM SPD Data.
Apr 24 22:02:08 prs3-tir hpasmlited[2895]: Initialize data structures succesful
Apr 24 22:02:13 prs3-tir hp-ams[2834]: CRITICAL: An Unrecoverable System Error (NMI) has occurred (iLO application watchdog timeout NMI, Service Information: 0x0000002B, 0x00000000)
Apr 24 22:02:14 prs3-tir hp-ams[2834]: CRITICAL: ASR Detected by System ROM
Apr 24 22:02:15 prs3-tir hpasrd[2922]: Starting with poll 1 and timeout 600
Apr 24 22:02:15 prs3-tir hpasrd[2922]: Setting the watchdog timer.
Apr 24 22:02:15 prs3-tir hpasrd[2922]: Found iLO memory at 0xf7df0000.
Apr 24 22:02:15 prs3-tir hpasrd[2922]: Successfully mapped device.
Apr 24 22:02:15 prs3-tir cmanicd: Entering iml_log_link_up(slot: 0, port: 1)
Apr 24 22:02:15 prs3-tir cmanicd: Entering get_event_id(slot: 0, port: 1
Apr 24 22:02:47 prs3-tir hpasmlited[2895]: hpDeferSPDThread: End of Collecting DIMM SPD data.
Apr 24 22:02:48 prs3-tir cmanicd: Existing event id(4) found for the slot and port.
Apr 24 22:02:48 prs3-tir cmanicd: Entering repair_iml_event(slot: 0, port: 1, event: 4)
Apr 24 22:02:48 prs3-tir cmanicd: Entering read_iml_event(slot: 0, port: 1, eventid: 4)
Apr 24 22:02:48 prs3-tir cmanicd: Calling ioctl() to read event id: 4)
Apr 24 22:02:48 prs3-tir cmanicd: Successfully read the event id: 4)
Apr 24 22:02:48 prs3-tir cmanicd: Trying to repair the existing IML Event.
Apr 24 22:02:48 prs3-tir cmanicd: Successfully repaired the IML Event.
Apr 24 22:02:48 prs3-tir cmanicd: Returning from repair_iml_event().

=======================================================

 

Has anyone faced this issue, and does anyone know how to resolve it?

 

Thanks in advance,

Florant

1 REPLY
sangam_s
HPE Pro
Solution

Re: ProLiant DL360p Gen8: An Unrecoverable System Error (NMI) has occurred (iLO application watchdog tim

Hi Florant,

 

This NMI appears to be a known issue with RHEL:

NMI An Unrecoverable System Error (NMI) has occurred (iLO application watchdog timeout NMI, Service Information: 0x0000002B, 0x00000000)

Please see the advisory below from Red Hat:

https://access.redhat.com/solutions/1309033

 

IML log has the following entry:


An Unrecoverable System Error (NMI) has occurred (System error code 0x0000002B, 0x00000000)
Resolution

By default, systemd starts a watchdog timer on shutdown. Setting ShutdownWatchdogSec to 0 disables that timer and resolves this issue. To do so, open the /etc/systemd/system.conf file and find the following line:


#ShutdownWatchdogSec=10min
Change it to:


ShutdownWatchdogSec=0
Save the file and after that run:


# systemctl daemon-reexec
so that systemd picks up the updated configuration, or reboot the system.

NOTE: You may also wish to look at RuntimeWatchdogSec in the same file. It is disabled by default; please do not enable it without a specific reason for doing so.
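
The edit described above can also be scripted. What follows is only a rough sketch, not something taken from the Red Hat advisory: the two helper function names are made up for illustration, and it assumes GNU sed and a systemd-based release (RHEL 6 uses Upstart rather than systemd, so /etc/systemd/system.conf may not exist there). The first helper prints the current watchdog lines, the second sets ShutdownWatchdogSec=0 and leaves RuntimeWatchdogSec alone.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of systemd or any HPE tool.

# Print the ShutdownWatchdogSec / RuntimeWatchdogSec lines (commented out or not)
# from the config file given as the first argument.
show_watchdog_settings() {
    grep -E '^#?(Shutdown|Runtime)WatchdogSec=' "$1"
}

# Rewrite the ShutdownWatchdogSec line (commented out or active) to an explicit 0.
# A backup of the original file is kept next to it as <file>.bak (GNU sed assumed).
# RuntimeWatchdogSec is deliberately left untouched.
disable_shutdown_watchdog() {
    sed -i.bak -E 's/^#?ShutdownWatchdogSec=.*/ShutdownWatchdogSec=0/' "$1"
}
```

To use it, run `disable_shutdown_watchdog /etc/systemd/system.conf` as root, check the result with `show_watchdog_settings /etc/systemd/system.conf`, and then run the systemctl daemon-reexec command given above. Trying it on a copy of the file first is a sensible precaution.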

--------------------------------------------------------------------------------------------------------------------------------------
If the issue still persists, we recommend logging a case with Red Hat.

If you need further troubleshooting from the hardware side, kindly log a case with HPE and share all the logs (AHS and sosreport).

Regards,

Sangam.

I am an HPE employee
