ProLiant Servers (ML,DL,SL)

ny942631
Occasional Collector

DL360p gen8 STOP: 0x00000080 Uncorrectable PCI Express Error, NMI Hardware Failure

Hi All,

We have two DL360p Gen8 servers. Both have 2x E5-2630L CPUs, 32 GB RAM (8x 4 GB genuine HP memory) and an SA420i with 1 GB cache.

We installed Windows Server 2016, updated the firmware to the latest versions, and installed the drivers from the latest SPP.

Both servers crash with:

Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 28, Function 7, Error status 0x00100000)

STOP: 0x00000080 (0x00000000004F4454, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000)
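The IML entry packs two useful values: a PCI location and a status word. The sketch below decodes both, assuming (this is not stated in the IML text) that "Error status" uses the bit layout of the PCI Express AER Uncorrectable Error Status register; under that assumption 0x00100000 is bit 20, Unsupported Request. Device 28 is 0x1c in hex, so the reporting function is 00:1c.7 in the usual bus:device.function notation. The helper names are illustrative only.

```python
"""Decode the location and status fields of the IML entry quoted above.

Assumption: "Error status" follows the bit layout of the PCIe AER
Uncorrectable Error Status register. The IML text does not say so,
so treat the decoded names as a hint, not a diagnosis.
"""

# Bit positions defined for the AER Uncorrectable Error Status register.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
    23: "MC Blocked TLP",
    24: "AtomicOp Egress Blocked",
    25: "TLP Prefix Blocked",
}


def format_bdf(bus: int, device: int, function: int) -> str:
    """Return bus/device/function in the usual hex BB:DD.F form."""
    return f"{bus:02x}:{device:02x}.{function:x}"


def decode_status(status: int) -> list[str]:
    """List the AER error names whose bits are set in `status`."""
    names = [name for bit, name in sorted(AER_UNCORRECTABLE_BITS.items())
             if status & (1 << bit)]
    # Report set bits that are not in the table instead of dropping them.
    known_mask = sum(1 << bit for bit in AER_UNCORRECTABLE_BITS)
    unknown = status & ~known_mask
    if unknown:
        names.append(f"unrecognised bits 0x{unknown:08x}")
    return names


if __name__ == "__main__":
    # Values copied from the IML entry: Bus 0, Device 28, Function 7.
    print(format_bdf(0, 28, 7))        # 00:1c.7
    print(decode_status(0x00100000))   # ['Unsupported Request Error']
```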

I have already read some troubleshooting guides and tried the following:

Set the memory refresh to Auto
Set Power Mode to Static High Performance and disabled C6
Removed the PCI riser board
Removed the SA cache module

NOTHING helps.

Server 1 was configured as an Active Directory server and everything was going fine, until suddenly, after another reboot, it refused to come up. It simply will not boot into the OS, period. It crashes every time Windows starts loading.

Server 2 also crashed on restart, but it booted on the second try. I'm afraid it will start crashing all the time as well.

Attached is a screenshot of the IML.

Please help, I'm going insane...

Erdogan Temur
HPE Pro


Hi,

Everything seems fine, but if the problem persists, I suspect a non-HPE part is attached to the server, such as a non-HPE SSD or SD card.

Kind Regards,
Erdogan.
No support by private messages. Please ask the forum!


ny942631
Occasional Collector


Hi. No, only two HP-branded 300 GB SAS HDDs are attached. They are mechanical drives, not SSDs, and no SD cards are attached either. The server simply refuses to boot into the OS, period. It was working fine before.

What a mystery...
Erdogan Temur
HPE Pro


Hi,

Please contact the HPE Support Center.

The AHS (Active Health System) log needs to be checked.
The following advisory may apply to the DL360p Gen8:

https://support.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0134984&sp4ts.oid=254623&lang=de-ch&cc=ch

Kind Regards,
Erdogan.


EDVDVD
Visitor


Hello,

It is the Matrox driver from the latest ProLiant Support Packs that is causing this BSOD. If you install a driver released before July 2017, the BSODs go away.

You can test this by installing the newest driver while the server is running: it causes an instant BSOD.

We have been running the driver dated 31-1-2017 for 2 months now and it is stable; the newer driver causes random crashes.

GreeTz
Visitor


I have the same problem here and discovered that the BSOD occurs the moment I connect to the machine via the Integrated Remote Console. While the Integrated Remote Console stays connected, the machine stays in a reboot loop.

Jay Eagle
Visitor


@EDVDVD I read your post, rolled back to a 2016 release of the Matrox driver, and that worked. Here is my history with this problem:

My server was a DL360p Gen8 with a fresh install of Windows Server 2016 and all of the updates. I was able to recreate the NMI Hardware Failure error and BSOD on demand (including the OP's details of "Uncorrectable PCI Express Error, Embedded device, Bus 0, Device 28, Function 7, Error status 0x00100000") simply by starting Insight Diagnostics Online Edition: about 1 time in 3, it would BSOD with the NMI Hardware Failure during the survey (i.e., practically right away).
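The reported reproduction rate of roughly 1 in 3 runs also says something about how long a fix needs to be tested. Assuming runs are independent (an assumption, not something the post establishes) and each crashes with probability p, the chance that n runs all pass while the faulty driver is still installed is (1 - p)^n. The helper names below are hypothetical.

```python
import math


def miss_probability(p: float, n: int) -> float:
    """Chance that n independent runs all pass when each crashes with probability p."""
    return (1.0 - p) ** n


def runs_needed(p: float, alpha: float) -> int:
    """Smallest n for which miss_probability(p, n) <= alpha."""
    if not 0.0 < p < 1.0 or not 0.0 < alpha < 1.0:
        raise ValueError("p and alpha must lie strictly between 0 and 1")
    n = math.ceil(math.log(alpha) / math.log(1.0 - p))
    # Nudge n to absorb floating-point rounding right at the boundary.
    while miss_probability(p, n) > alpha:
        n += 1
    while n > 0 and miss_probability(p, n - 1) <= alpha:
        n -= 1
    return n


if __name__ == "__main__":
    p = 1 / 3  # reported rate: roughly 1 crash in 3 runs
    print(f"{miss_probability(p, 5):.3f}")  # 0.132
    print(runs_needed(p, 0.01))             # 12
```

With p = 1/3, five clean runs still leave about a 13% chance of a false all-clear, while 12 clean runs bring it under 1%.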

I then began my Internet search and came across your suggestion.

I found (at this time) 3 versions of the Matrox G200eh display adapter drivers for G8's + Windows Server 2016 in the HPE Support Center:

#1.  9.15.1.184 (12 Jul 2017) - The one I had that came down with the PSP for Gen 8 servers + Windows Server 2016.  This is the one that was causing my NMI Hardware Failure and BSOD.

#2.  9.15.1.174 (12 Jul 2017) - Not sure how this one hit HP's repository, but I never installed it.  I just saw it while searching for older drivers.

#3.  9.15.1.143 (24 Oct 2016) - This is the one I rolled back to (installed), because it predates July 2017 as you recommended.

Since installing 9.15.1.143 (24 Oct 2016), I have not been able to recreate the BSOD. I've only tested for an hour so far, but before the rollback I could recreate it within 0 to 120 seconds.
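The three versions listed in this post can be turned into a small lookup for checking an installed HPE component version against what the thread reports. The labels are forum observations rather than an HPE advisory, and how unlisted versions are described (relative to 9.15.1.143, the last version reported stable) is an assumption of this sketch.

```python
"""Classify Matrox G200eH component versions by what this thread reports."""

# Versions and notes as reported in the thread (forum observations only).
REPORTED = {
    "9.15.1.184": "reported to cause NMI / STOP 0x80",
    "9.15.1.174": "listed in the thread but not tested",
    "9.15.1.143": "reported stable after rollback",
}
LAST_REPORTED_STABLE = "9.15.1.143"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '9.15.1.184' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(installed: str) -> str:
    """Return the thread's report for a version, or a hedged fallback."""
    key = installed.strip()
    if key in REPORTED:
        return REPORTED[key]
    # Tuple comparison orders 9.15.1.200 after 9.15.1.143 numerically.
    if parse_version(key) > parse_version(LAST_REPORTED_STABLE):
        return "not covered by this thread; newer than the last version reported stable"
    return "not covered by this thread; older than the last version reported stable"


if __name__ == "__main__":
    for version in ("9.15.1.184", "9.15.1.143", "9.15.1.200"):
        print(version, "->", classify(version))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "9.15.1.99" sorting after "9.15.1.143".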

My search string in the HPE Support Center was "Matrox G200eh Windows Server 2016 Gen 8", and that resulted in me finding the following page with the above drivers:

Matrox G200eH Video Controller Driver for Windows Server 2016

Thank you for the info!

Gopinath_R
New Member


Hi,

Do you have a resolution?

JS7
New Member


Solution:
- Boot into Safe Mode
- Open Device Manager > Display Adapters > Matrox G200eh (HP) WDDM 2.0 > Properties > Driver > Roll Back Driver
- It should revert to previous version: 4.3.1.5 (Date: 7/12/2016)
- Reboot normally (out of Safe Mode)
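Before and after the rollback it can help to confirm which Matrox driver package Windows actually has installed. `pnputil /enum-drivers` lists third-party driver packages on Windows Server 2016, and the sketch below parses that listing. It assumes English field labels (`Published Name`, `Driver Version`) and a `MM/DD/YYYY version` format for the Driver Version field; the `keyword` filter is hypothetical because the exact provider string is not given in this thread, and the July 2017 cutoff comes from the earlier reply, not from HPE. The version Windows reports (for example 4.3.1.5 in the steps above) may not use the same numbering as the HPE component versions quoted elsewhere in the thread.

```python
import subprocess
import sys
from datetime import date

# Cutoff taken from the thread ("drivers before July 2017"); a forum report only.
CUTOFF = date(2017, 7, 1)


def parse_enum_drivers(text: str) -> list[dict[str, str]]:
    """Split `pnputil /enum-drivers` text into one dict per driver package.

    Assumes English "Label: value" lines with a blank line between packages.
    Lines without a colon (such as the banner) are ignored.
    """
    packages: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                packages.append(current)
                current = {}
            continue
        label, sep, value = line.partition(":")
        if sep:
            current[label.strip()] = value.strip()
    if current:
        packages.append(current)
    return packages


def split_driver_version(field: str) -> tuple[date, str]:
    """Split an assumed 'MM/DD/YYYY a.b.c.d' Driver Version field."""
    date_text, _, version = field.strip().partition(" ")
    month, day, year = (int(x) for x in date_text.split("/"))
    return date(year, month, day), version.strip()


def report(text: str, keyword: str = "matrox") -> list[str]:
    """Describe every package whose fields mention `keyword` (hypothetical filter)."""
    kw = keyword.lower()
    lines = []
    for pkg in parse_enum_drivers(text):
        if "Driver Version" not in pkg:
            continue
        if not any(kw in value.lower() for value in pkg.values()):
            continue
        when, version = split_driver_version(pkg["Driver Version"])
        side = "before" if when < CUTOFF else "on or after"
        lines.append(
            f"{pkg.get('Published Name', '?')}: {version} dated "
            f"{when.isoformat()}, {side} {CUTOFF.isoformat()}"
        )
    return lines


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(
        ["pnputil", "/enum-drivers"], capture_output=True, text=True, check=True
    )
    for entry in report(result.stdout):
        print(entry)
```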
 
IMPORTANT
HPSUM (7.6.0 + SPP_2017.04.0) is the last production SPP to contain components for the G7 and Gen8 server platforms.
HPSUM (8.0.0 + SPP_2017.07.0) is the new production SPP to contain components for the Gen9 and Gen10 server platforms.
Letze01
Occasional Visitor


Hi,

I had the same problem multiple times, but only on one system. What helped me was locating and renaming the file cp034074.exe, because the NMI error seemed to occur during the self-inventory of this update.
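The post does not say where cp034074.exe was found, so the sketch below takes the search root as a parameter and renames every copy of that file by appending a suffix, which keeps the change easy to undo. `disable_component` and the `.disabled` suffix are hypothetical choices, and renaming only mirrors what the poster describes; it is not an HPE-documented procedure.

```python
from pathlib import Path


def disable_component(root: str, filename: str = "cp034074.exe",
                      suffix: str = ".disabled") -> list[Path]:
    """Rename every copy of `filename` under `root` by appending `suffix`.

    Matching is case-insensitive, as Windows file names are. Returns the
    new paths so the rename can be reversed later.
    """
    target = filename.lower()
    # Collect matches first so renaming does not disturb the directory walk.
    matches = [p for p in Path(root).rglob("*")
               if p.is_file() and p.name.lower() == target]
    renamed = []
    for path in matches:
        new_path = path.with_name(path.name + suffix)
        path.rename(new_path)
        renamed.append(new_path)
    return renamed
```

To undo the change, rename each returned path back by stripping the suffix.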

Regards