BL460cG6 Uncorrectable PCI Express Error

 
Keivan
New Member

Hi

I'm getting:
Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 9, Function 0, Error status 0x00000000)

on multiple BL460cG6 blades, followed by an ASR. This has occurred on several blades, several times, during normal operation.

Running a Windows 2008 R2 cluster with Hyper-V.

The only hardware change was adding an NC325m NIC card to each of the blades.

I'm running the latest iLO, BIOS, driver, and blade enclosure firmware.

What can be the problem?

/Keivan
VaughanP
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

We have been getting a similar error:
Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 7, Function 0, Error status 0x00000000)
And then 11 minutes later:
ASR Detected by System ROM

This started occurring on BL460cG6 blades after the update to 2010.01 level firmware.
They are running PSP 8.30 and are all Windows 2008 x64 (not R2).
We have NC382m adapters in Mezz slot 1.
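
The pattern described above (a PCI Express error, then an ASR some minutes later) can be picked out of an exported log with a short script. This is a minimal sketch, not an HP tool: the `(timestamp, message)` tuple format, the 15-minute window, and the timestamps in the sample are assumptions made for illustration. Only the message text is taken from this thread, and the bus/device/function numbers are assumed to be decimal.

```python
import re
from datetime import datetime, timedelta

# Pattern for the IML message text quoted in this thread.
PCIE_ERROR = re.compile(
    r"Uncorrectable PCI Express Error \((?P<kind>[^,]+), "
    r"Bus (?P<bus>\d+), Device (?P<device>\d+), Function (?P<function>\d+), "
    r"Error status (?P<status>0x[0-9A-Fa-f]+)\)"
)


def parse_pcie_error(message):
    """Return the fields of a PCIe error message as a dict, or None if no match."""
    m = PCIE_ERROR.search(message)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "bus": int(m.group("bus")),          # assumed decimal
        "device": int(m.group("device")),    # assumed decimal
        "function": int(m.group("function")),
        "status": int(m.group("status"), 16),
    }


def errors_followed_by_asr(entries, window=timedelta(minutes=15)):
    """Pair each PCIe error with the first ASR entry logged within `window` after it.

    `entries` is a list of (timestamp, message) tuples sorted by time.
    """
    pairs = []
    for i, (when, message) in enumerate(entries):
        fields = parse_pcie_error(message)
        if fields is None:
            continue
        for later_when, later_message in entries[i + 1:]:
            if later_when - when > window:
                break
            if "ASR Detected" in later_message:
                pairs.append((fields, later_when - when))
                break
    return pairs


# Hypothetical timestamps; only the message text comes from the thread.
log = [
    (datetime(2010, 3, 1, 2, 0),
     "Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 7, "
     "Function 0, Error status 0x00000000)"),
    (datetime(2010, 3, 1, 2, 11), "ASR Detected by System ROM"),
]

for fields, delay in errors_followed_by_asr(log):
    print(f"Device {fields['device']}: ASR followed after {delay}")
```

Run against logs from several blades, this makes it easier to compare which device number is reported and how long the ASR takes to follow.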
Keivan
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

VaughanP, have you been in contact with HP?

What are they saying?
Erdogan Temur
HPE Pro

Re: BL460cG6 Uncorrectable PCI Express Error

I think the fix is to replace the motherboard.
Kind Regards,
Erdogan.
No support by private messages. Please ask the forum!


Keivan
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

I'm waiting for HP to investigate the problem. It has been escalated to HP Engineering, so I'll just have to wait and see.
VaughanP
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

We have it logged with HP.
They have changed the system board on one of the affected systems and we are waiting to test this.
They have also asked us to change a BIOS power management setting to force PCI-E Gen 1 support.
Will see how it goes.
Keivan
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

Hi
I've updated all firmware and drivers and I'm still having the exact same problem. VaughanP, have you figured out when the problem occurs? For us it seems to happen during intense network activity such as backups.

Is anyone from HP looking into this problem?

/Keivan
VaughanP
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

We aren't seeing the issue often enough to say this has fixed it, but you can try it:

1- Reboot Server.
2- Launch RBSU by Pressing F9 when prompted during POST.
3- Go to Power Management Options -> Advanced Power Management -> PCI Express Generation 2.0 Support.
4- Press Enter and change it from "Auto" to "Force PCI-E Generation 1".
5- Save and Exit.
6- Monitor the Server.
coblain
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

I'm having the same problem too. After installing Windows 2003, then installing and updating Kaspersky antivirus, the server hangs and the IML shows this error:
"Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 0, Function 0, Error status 0x00000000)"

I called the HP engineer, who said it has to do with the mezzanine card. They told me to shut down the blade, remove it from the enclosure, unplug the mezzanine card, and plug it in again.

Hope this works.
Riesj
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

We have the exact same problem.

I found this Microsoft article: http://support.microsoft.com/kb/975530

What's the possibility of a relation between this fix and our error?
VaughanP
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

Since setting the PCIe version to Generation 1, as in my previous post, we have had no further issues.
samuelfkh
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

Keivan and VaughanP, I tried forcing PCI-E Gen 1 support in the BIOS, but then the NC325m disappears and can't be detected on Win2k8 x64 SP2.

Have you found any other workaround or solution?
BobbyCabCom
Occasional Contributor

Re: BL460cG6 Uncorrectable PCI Express Error

BL460 G6, installed Win2k8 R2
installed boards:
Emulex LPe1105-HP 4Gb FC HBA for HP c-Class BladeSystem
NC373m Dual Port Multifunction 1Gb NIC for c-Class BladeSystem

IML log:
Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 9, Function 0, Error status 0x00000000)


I have another server setup exactly the same, yet only one server has this problem. PSP 8.6 has been installed.
UPS_Tech
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

I see the same thing on a DL580 G7. HP came in to replace the SPI board and the I/O expansion board, and neither resolved it. It is going to be either the system board or, in this case, possibly the processor board. Bus 0, Device 9 is identified as one of the PCI-to-PCI bridges.
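
To look up which device sits at a reported location, it can help to turn the IML fields into the bus:device.function notation that PCI tools use. The helper below is a small sketch; the name `bdf` is made up for this example, and it assumes the IML numbers are decimal and that the PCI domain is 0, neither of which the thread confirms.

```python
def bdf(bus, device, function, domain=0):
    """Format a PCI address as domain:bus:device.function in hexadecimal."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus, device, or function out of range")
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


print(bdf(0, 9, 0))  # 0000:00:09.0
print(bdf(0, 7, 0))  # 0000:00:07.0
```

On Windows, the same three numbers appear in the Location field of a device's properties in Device Manager, so no conversion is needed there.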
Sykes
Occasional Visitor

Re: BL460cG6 Uncorrectable PCI Express Error

Hi,

I find this issue is common with the combination of HP Smart Array, Windows, and Kaspersky Antivirus. Once you install the firmware update released on 15 Dec 2010, the issue will be solved.

Find all the details here:
http://www.tricksguide.com/blue-screen-error-hardware-malfunction-pci-express-error-hp-proliant-server.html

HP has released a firmware update (version 3.66) for the Smart Array P212, P410, P410i, P411, P712m, and P812 controllers to solve this issue. This version was released on 15 Dec 2010.

If you check the Fixes section of the firmware download page, you can see that this update resolved an incompatibility between Smart Array and the Kaspersky Anti-Virus tool that resulted in Smart Array lockup code 0x12. So the final solution is to update the firmware of the Smart Array controller.

It will solve the error Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 7, Function 0, Error status 0x00004000), which comes with this blue screen:
*** Hardware Malfunction
Call your hardware vendor for support
*** The system has halted ***

Cheers mate

:D
UPS_Tech
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

We actually got HP to come in and replace the entire system, excluding the hard drives and QLogic cards. Guess what? It's still happening. Our Support case has been escalated and the only thing they can come up with is modifying the power regulator, and replacing the QLogic cards.
They cannot provide ANY explanation as to why this is occurring.
Peter Kaufmann
Occasional Advisor

Re: BL460cG6 Uncorrectable PCI Express Error

If I read the error correctly, the device reporting the issue is an embedded PCI-E device and not something plugged into any of the mezzanine slots. We have a BL460c G6 with the same problem. Fortunately it is only one of hundreds, and this blade is part of a cluster. The blade is used in a Microsoft Exchange environment, so it is getting hammered pretty hard. My assumption is that there is a system board problem. We have a case open with HP and are awaiting their solution.
Peter Kaufmann
Occasional Advisor

Re: BL460cG6 Uncorrectable PCI Express Error

After looking at the detail in HP Insight Diagnostics, the device having or reporting the issue is one of the standard PCI-to-PCI bridges. I saw that someone had updated the firmware on their P410i embedded array controller. Did this by chance resolve the issue? While HP highly recommends this firmware update, its list of fixes does not mention this error.
pauld482
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

Hi,

 

We are having similar issues with 3 of our HP BL460c G6 blades. They are running Windows Server 2008 R2 Enterprise x64 and are configured as a 3-node SQL 2008 cluster. All blade firmware, drivers, and BIOS are up to date, as is the c7000 chassis firmware.

 

We've had the following error on all 3 blades at some point in time in the IML:

 

Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 7, Function 0, Error status 0x00000000)

 

An ASR is then triggered afterwards.

 

We did manage to consistently recreate the problem back in April by using SQLIO to create a 10GB dat file on our remote iSCSI NetApp storage through the iSCSI NC325m NIC.
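
For anyone without SQLIO to hand, a sustained sequential write can be approximated with a short script. This is only a rough stand-in and not equivalent to SQLIO: the function name, the 1 MiB chunk size, and the zero-filled data are all choices made for this sketch, and storage that compresses or deduplicates may handle zeros very differently from real data.

```python
import os
import tempfile
import time


def write_test_file(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write `total_bytes` of zero-filled data to `path` in chunks.

    Returns (bytes_written, elapsed_seconds).
    """
    chunk = bytes(chunk_bytes)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually leaves the OS cache
    return written, time.perf_counter() - start


# Small demonstration against a temporary file (8 MiB).
fd, demo_path = tempfile.mkstemp(suffix=".dat")
os.close(fd)
try:
    size, seconds = write_test_file(demo_path, 8 * 1024 * 1024)
    print(f"wrote {size} bytes in {seconds:.3f} s")
finally:
    os.remove(demo_path)
```

To mimic the test described above, point `path` at a file on the iSCSI volume and pass `total_bytes=10 * 1024**3`. Only do this on a blade you can afford to have reboot.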

 

We then implemented the following fix, which seemed to resolve the issue at the time:

 

Launch RBSU by pressing F9 during boot.

Go to Power Management Options

Go to Advanced Power Management and PCI Express Generation 2.0 Support

Change Value from Auto to Force PCI-E Generation 

Save and Exit.

 

However, 3 weeks ago one of the blades suffered the following error and then performed an ASR:

 

Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 9, Function 0, Error status 0x00000000)

 

So this time the error is on Device 9, not Device 7.

 

We've yet to put full production loads on the cluster and are reluctant to do so until this is actually resolved, since the full production loads are going to be quite high.

 

Has anyone got to the bottom of this issue yet?

CLEB
Valued Contributor

Re: BL460cG6 Uncorrectable PCI Express Error

I've been experiencing Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 9, Function 0, Error status 0x00000020) on a BL460cG6

 

This server has a Qlogic QMH2462 HBA in Mezz1 and an IO Accelerator 320GB in Mezz2.

 

The server has been performing an ASR quite frequently.  This is Windows 2008 R2 with SQL Server 2008.

 

The ASR has been happening when SQL server is doing the database dumps, so the data is streaming off the IO Accelerator through the PCIE bus to the Qlogic QMH2462 and onto our EVA.

 

I've changed the PCIE setting this morning to force Gen1 mode like others have.  I'll report back if I have any findings.
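
Most reports in this thread show an error status of 0x00000000, but two non-zero values appear: 0x00000020 here and 0x00004000 in an earlier post. If (and this is an assumption the thread does not confirm) the IML prints a raw copy of the PCI Express Advanced Error Reporting Uncorrectable Error Status register, the bits can be decoded as in the sketch below. The bit positions follow the PCI Express specification; the function name is made up for this example.

```python
# Bit positions in the PCI Express AER Uncorrectable Error Status register.
# ASSUMPTION: the IML "Error status" value uses this layout.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}


def decode_status(status):
    """Return the names of the known error bits set in `status`."""
    return [
        name
        for bit, name in sorted(UNCORRECTABLE_BITS.items())
        if status & (1 << bit)
    ]


for value in (0x00000000, 0x00000020, 0x00004000):
    names = decode_status(value) or ["no bits set"]
    print(f"0x{value:08X}: {', '.join(names)}")
```

Under that assumption, 0x00000020 would read as a Surprise Down Error and 0x00004000 as a Completion Timeout, while 0x00000000 carries no detail at all. Treat this as a hint for a support case, not a diagnosis.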

CLEB
Valued Contributor

Re: BL460cG6 Uncorrectable PCI Express Error

Since setting PCIe to Gen1 mode I've had two more reboots. I'm going to try putting the components in another blade.

CLEB
Valued Contributor

Re: BL460cG6 Uncorrectable PCI Express Error

After swapping the blade for another of the same model, I have managed to recreate the issue.

 

It always blue screens when I push a lot of data through the IO Accelerator. I observed it doing lots of SQL dumps at high transfer rates (>= 550MB/s) with low IOPS. Then, as soon as it hit high IOPS (>= 18K), I lost the server and it rebooted.

gmcisco
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

We had the same issue with a 3-node Windows 2008 R2 EE x64 failover cluster running SQL Server 2008 EE x64 with NetApp storage (using iSCSI).

 

The problem seemed to trigger when there was high I/O over the NICs assigned to iSCSI traffic (using MPIO), combined with another unknown factor, causing the blue screen and ASR.

 

We tried the various BIOS fixes from HP to no avail. We even changed system boards etc.

 

The only way we fixed this issue was by removing all Broadcom NICs and replacing them with Intel NICs.

 

We have not had a problem since. The PCI Express bus is Intel, and I believe fitting the Intel NICs provided better interoperability.