
Re: Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0xA4, ...)

 
kzhang_pamp
Established Member

Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0xA4, ...)

Hey,
We have a ProLiant Apollo XL675D Gen10 server with BIOS A47. We recently ran into the following PCI bus errors:

-Unrecoverable I/O Error has occurred. System Firmware will log additional details in a separate IML message entry if possible.

-Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0x84, Device 0x1, Function 0x0). Uncorrectable Error Status: 0x100000

 

How can we solve this issue? Thanks!
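For anyone reading the status value in the message above: a minimal sketch of decoding it, assuming the "Uncorrectable Error Status" reported in the IML is the raw PCIe Advanced Error Reporting (AER) Uncorrectable Error Status register. That mapping is an assumption, not something the log itself confirms. The bit names follow the PCI Express Base Specification; the function name decode_uncorrectable_status is just for illustration.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status register.
# Assumption: the IML value is this register, reported unmodified.
UNCORRECTABLE_ERROR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}


def decode_uncorrectable_status(status: int) -> list[str]:
    """Return the names of all error bits set in a 32-bit status value."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            # Bits not in the table are reported by position only.
            names.append(UNCORRECTABLE_ERROR_BITS.get(bit, f"Unknown/reserved bit {bit}"))
    return names


# Status value copied from the IML entry in this thread.
print(decode_uncorrectable_status(0x100000))
```

Under that assumption, 0x100000 has only bit 20 set, which corresponds to an Unsupported Request error. That narrows the error type but does not by itself say which component is at fault.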


Update: We updated the BIOS from v2.56 (02/10/2022) to v2.68 (02/06/2023) (found here https://support.hpe.com/connect/s/softwaredetails?softwareId=MTX_d43155b2420946b58ca01ffedd).

The problem persists.

kzhang_pamp
Established Member

Re: Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0xA4, ...)

We updated the system BIOS from v2.56 (02/10/2022) to v2.68 (02/06/2023) (found here https://support.hpe.com/connect/s/softwaredetails?softwareId=MTX_d43155b2420946b58ca01ffedd).

The problem persists.

TVVJ
HPE Pro

Re: Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0xA4, ...)

Hello,

You may start by updating the server's BIOS. Use either of the links below, depending on the operating system:

Online ROM Flash Component:

Windows x64
Linux

Regards,



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[All opinions expressed here are mine, and not official statements on behalf of Hewlett Packard Enterprise]
kzhang_pamp
Established Member

Re: Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0xA4, ...)

UPDATES

We installed the Linux version of the system BIOS (Online ROM Flash Component) on our server, which runs Ubuntu 22.04. The situation is the same.

BPSingh
HPE Pro

Re: Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0xA4, ...)

 

Greetings!

We first need to determine which device Bus 0x84, Device 0x1, Function 0x0 points to.
It could be a peripheral card such as a GPU or a network card.

Please log a support ticket and provide the AHS logs to get this issue investigated. 
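If the operating system still boots, one way to look up the device is to convert the values from the error message into the address format that lspci accepts. The sketch below is only an illustration: format_bdf is a hypothetical helper, and it assumes the Segment/Bus/Device/Function values in the IML match what Linux enumerates.

```python
def format_bdf(segment: int, bus: int, device: int, function: int) -> str:
    """Format a PCI address as domain:bus:device.function (lspci style)."""
    # Field widths per the PCI addressing scheme: 16-bit segment, 8-bit bus,
    # 5-bit device, 3-bit function.
    if not (0 <= segment <= 0xFFFF and 0 <= bus <= 0xFF
            and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("PCI address component out of range")
    return f"{segment:04x}:{bus:02x}:{device:02x}.{function:x}"


# Values taken from the error message in this thread.
address = format_bdf(0x0, 0x84, 0x1, 0x0)

# Print the command to run on the server; nothing is executed here.
print(f"lspci -s {address} -vv")
```

The reported address may turn out to be a bridge or root port rather than the card itself, in which case the tree view from lspci -t can help find what sits behind it.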



I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
kzhang_pamp
Established Member

Re: Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0xA4, ...)

Hey @BPSingh 

Thanks for the suggestions. I think you are correct: that location does point to the Nvidia A100 GPUs.

I opened a support ticket. However, I cannot retrieve the AHS logs, because when I hit F10 I get the following "screen of death"!

[Attached screenshots: photo1.PNG, photo2.PNG, photo3.PNG]

 

BPSingh
HPE Pro

Re: Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0xA4, ...)

If you have iLO configured on the server, you should be able to download the Active Health System log from there. 

https://techlibrary.hpe.com/docs/iss/DL360pGen8/tsg/advanced/Content/151471.htm#:~:text=To%20download%20the%20Active%20Health%20System%20log%20using%20HP%20iLO,for%20the%20last%20seven%20days.



kzhang_pamp
Established Member

Re: Uncorrectable PCI Express Error Detected. Slot 255 (Segment 0x0, Bus 0xA4, ...)

Hey @BPSingh ,

Thanks for the tips. I was able to download the AHS log and upload it to the support ticket.