DL380p Gen8 with uncorrectable PCI express error

Hank9999
Occasional Visitor

Hi everyone,

There is a problem with my DL380p Gen8. The server keeps crashing, always with the same error messages:

Severity | Class | Last Update | Initial Update | Count | Description
Critical | PCI Bus | 03/13/2013 17:12 | 03/13/2013 17:12 | 1 | Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 2, Function 2, Error status 0x00000000)
Critical | System Error | 03/13/2013 17:12 | 03/13/2013 17:12 | 1 | Unrecoverable System Error (NMI) has occurred. System Firmware will log additional details in a separate IML entry if possible
Caution | POST Message | 03/13/2013 16:43 | 03/13/2013 16:43 | 1 | POST Error: 1792-Slot X Drive Array - Valid Data Found in Cache Module. Data will automatically be written to drive array.
Caution | POST Message | 03/13/2013 16:43 | 03/13/2013 16:43 | 1 | POST Error: 1719 - A controller failure event occurred prior to this power-up
Critical | PCI Bus | 03/13/2013 16:41 | 03/13/2013 16:41 | 1 | Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 2, Function 2, Error status 0x00000000)
Critical | System Error | 03/13/2013 16:41 | 03/13/2013 16:41 | 1 | Unrecoverable System Error (NMI) has occurred. System Firmware will log additional details in a separate IML entry if possible

 

 

No PCI card is installed, the external USB drives are already detached, and the latest Service Pack and Intelligent Provisioning have been installed.

Could this be caused by the PCI riser cage, even though nothing is installed in it?

12 REPLIES
Bjoern13
Advisor

Re: DL380p Gen8 with uncorrectable PCI express error

Hello Hank,

If you have updated the complete server and everything is at the latest version, you can try running the server without the riser cage, since nothing is installed in it and it is not needed right now anyway.

Concerning the external USB drives: it is good that you tested without them, since they are often the cause of unexpected behaviour.
If you can, test without the PCI riser. In either case (whether or not that solves it), I would advise you to contact support so that the affected part(s) can be replaced.

If you do so, make sure you provide an AHS log (downloadable from iLO). This will help the support colleagues figure out what went wrong. They can check the PCI bus and which device or system component is using Bus 0, Device 2.
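
To cross-check which component sits at Bus 0, Device 2, Function 2, the location in the IML entry can be turned into the bus:device.function form that tools such as lspci use. The following is a minimal Python sketch, not an HP tool: the function name pci_address is made up for illustration, and it assumes the IML prints the numbers in decimal.

```python
import re

# Pattern for the location part of the IML description quoted in this thread.
# Assumption: other firmware versions may word the entry differently.
_LOCATION = re.compile(
    r"Bus\s+(\d+),\s*Device\s+(\d+),\s*Function\s+(\d+)", re.IGNORECASE
)


def pci_address(description):
    """Return the PCI location as 'bb:dd.f', or None if no location is found."""
    match = _LOCATION.search(description)
    if match is None:
        return None
    bus, device, function = (int(group) for group in match.groups())
    # Assumption: the IML shows decimal numbers; lspci-style addresses use hex.
    return f"{bus:02x}:{device:02x}.{function:x}"


entry = ("Uncorrectable PCI Express Error (Embedded device, Bus 0, "
         "Device 2, Function 2, Error status 0x00000000)")
print(pci_address(entry))  # 00:02.2
```

On a host booted into Linux, `lspci -s 00:02.2` would then list whatever is at that address. Whether that is really the component at fault is something support should confirm from the AHS log.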

-----------------
I am an HP employee.

Med-H
Frequent Advisor

Re: DL380p Gen8 with uncorrectable PCI express error

Hi,

 

What is the frequency of the issue?

 

Please try to eliminate the NIC installed in the server ("HP Ethernet Adapter") as a possible cause.

 

Services Media Library:

 

http://h20464.www2.hp.com/results.htm?SID=5177957&MEID=CF42A2E7-1AF3-49B8-8CB9-76533C7441F6

 

I am an HP employee
krishblr
Occasional Advisor

Re: DL380p Gen8 with uncorrectable PCI express error

Update the System ROM. The System ROM dated 12/21/2011 has the fixes for the unexpected Uncorrectable PCI Express Error.

 

See the advisory:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c03294444&lang=en&cc=us&taskId=101&prodSeriesId=5177957

 

If you still see the following errors, please raise a support case with HP and include the AHS logs for further technical assistance.

 

Caution POST Message 03/13/2013 16:43 03/13/2013 16:43 1 POST Error: 1792-Slot X Drive Array - Valid Data Found in Cache Module. Data will automatically be written to drive array.
Caution POST Message 03/13/2013 16:43 03/13/2013 16:43 1 POST Error: 1719 - A controller failure event occurred prior to this power-up

The cache module could have an issue.

---------------
I am an HP employee.


Hank9999
Occasional Visitor

Re: DL380p Gen8 with uncorrectable PCI express error

Hello again,
thanks for the fast reply.

 

At first the issue occurred every few days, mostly during the weekend. This is why I suspected the USB drives.
We already had the system board replaced, and after that the frequency went up: the server then crashed almost every 15 minutes.

This is why I asked about the PCI riser cage. I have taken it out now, and so far there has been no further issue.
I have to monitor it for a while to see whether that actually stopped it. There is no additional network card installed, just the built-in one.
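
To put a number on the crash frequency, the gaps between the IML timestamps can be computed. This is only a rough Python sketch using the two "Uncorrectable PCI Express Error" timestamps quoted in the first post; the helper name crash_intervals_minutes is hypothetical.

```python
from datetime import datetime

# Timestamps of the two "Uncorrectable PCI Express Error" entries quoted
# in the original post (MM/DD/YYYY HH:MM, as shown in the IML).
crash_times = ["03/13/2013 16:41", "03/13/2013 17:12"]


def crash_intervals_minutes(timestamps, fmt="%m/%d/%Y %H:%M"):
    """Return the gaps between consecutive crashes, in minutes."""
    parsed = sorted(datetime.strptime(stamp, fmt) for stamp in timestamps)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(parsed, parsed[1:])]


print(crash_intervals_minutes(crash_times))  # [31.0]
```

Feeding in a longer list of timestamps from the IML would show whether removing a part really changed the interval or whether the server simply has not crashed yet.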

madhuiss
HPE Pro

Re: DL380p Gen8 with uncorrectable PCI express error

The Offline Advanced Survey will have more details about the device and the bus. The AHS log would also be enough to identify the device.

 

Embedded device, Bus 0, Device 2, Function 2, Error status 0x00000000

 

There are two things that could be tried as workarounds:

 

- Disable Intel QPI in the RBSU

- Disable all C-states (No C-states) for verification.

 

Thank You

 

I am an HP Employee

Hank9999
Occasional Visitor

Re: DL380p Gen8 with uncorrectable PCI express error

Now there was another unexpected restart, although the PCI riser is out.

So the frequency went down, but the issue is not solved.
Since the system board was already replaced and the Smart Array controller is embedded, could it be the cache module that causes the issue?
Med-H
Frequent Advisor

Re: DL380p Gen8 with uncorrectable PCI express error

Could you please provide us with the AHS logs?

 

How to generate an Active Health System log via the iLO GUI:

  1. Log on to the HP iLO 4 GUI (the IP address is visible at POST, or check the server's iLO Default Network Settings tag for the DNS name).

  2. Go to the menu on the left called Active Health System Log.

  3. Verify the date range (default is 7 days) and make sure that all relevant failures fall within the selected range.

  4. Enter contact information so that the HP Support agent can easily get back to you with follow-up questions.

    The contact information is the only readable text in the binary AHS log file.

  5. Press the Download button to start downloading the AHS log file.
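
For step 3, a quick way to check whether the default 7-day window covers every failure is sketched below in Python. This is only an illustration and does not talk to iLO: the helper name uncovered_events and the range end of 03/14/2013 09:00 are assumptions, while the event timestamps are the ones from the first post.

```python
from datetime import datetime, timedelta


def uncovered_events(event_times, range_end, days=7, fmt="%m/%d/%Y %H:%M"):
    """Return the events that fall outside the window ending at range_end."""
    end = datetime.strptime(range_end, fmt)
    start = end - timedelta(days=days)
    return [stamp for stamp in event_times
            if not (start <= datetime.strptime(stamp, fmt) <= end)]


# Crash timestamps from the IML entries in the first post.
events = ["03/13/2013 16:41", "03/13/2013 17:12"]

# Hypothetical end of the selected range; an empty list means all covered.
print(uncovered_events(events, range_end="03/14/2013 09:00"))  # []
```

If the list is not empty, widen the date range in iLO before downloading so that the log contains those failures.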

     

     

    thanks

I am an HP employee
Allgäu
Occasional Visitor

Re: DL380p Gen8 with uncorrectable PCI express error

Hello,

 

We have the same issue with 14 Gen8 servers. Is there a solution for this problem?

 

Thanks

Chris

Sbrown
Valued Contributor

Re: DL380p Gen8 with uncorrectable PCI express error

1. Firmware is outdated.

2. P420 is overheating.

3. Bad motherboard. We found the problem would only show up under ESXi, which logged tons of correctable ECC errors. The HP memory test found nothing.

We got a new motherboard and all was good! I think the first batch was buggy, lol!

To check: install ESXi using the HP OEM installer, warm it up with some benchmarks, then SSH into the host and run

dmesg

or look at the logs. It was a mess!
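
Picking the memory-error lines out of a long dmesg dump by eye is tedious, so a small filter can help. The Python sketch below is a guess at a starting point, not something from the post above: the keyword list is an assumption about what such lines contain, and the function names are hypothetical.

```python
import re

# Assumption: these keywords are a guess at what memory-error lines contain;
# adjust them to whatever the host's kernel log actually prints.
KEYWORDS = re.compile(r"\b(?:ecc|(?:un)?correctable)\b", re.IGNORECASE)


def matching_lines(lines):
    """Return the log lines that mention any of the keywords."""
    return [line.rstrip("\n") for line in lines if KEYWORDS.search(line)]


def scan_file(path):
    """Scan a saved copy of the dmesg output (path chosen by the caller)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return matching_lines(handle)
```

Save the dmesg output to a file on a workstation and call scan_file on it. A steadily growing count of matches after a benchmark run would point the same way as the experience described above.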
aperson
Occasional Visitor

Re: DL380p Gen8 with uncorrectable PCI express error

We have the same issue with ESXi 5.0 U3. We still cannot resolve it, and it occurs on both the 8-core and 12-core DL380p Gen8 models.


The HBA on the riser card fails. Only this HBA, the 81Q:

QLogic PCI to Fibre Channel Host Adapter for HP AK344A:

 

Host Device Name vmhba3

 

BIOS version 3.13

FCODE version N/A

EFI version 6.23

Flash FW version 5.09.00

 

Is there any resolution? We have updated the drivers and firmware of the system board, and replaced the system board and riser board, yet we still get the same failures after a few days.

PMI_WINCHAM
Occasional Visitor

Re: DL380p Gen8 with uncorrectable PCI express error

Same issue here on a DL360p Gen8 running XenServer 6.1.

 

Happens on average once a month.

 

System ROM - P71 02/25/2012

 

It is very frustrating not to see much effort on HP's part to sort out this issue.

DaneTruscott
Occasional Visitor

Re: DL380p Gen8 with uncorrectable PCI express error

We have four DL380p servers and are getting the same issue on only one of them. As the last post was about a year ago, has anyone found a fix for this?