BladeSystem - General

Re: BL460cG6 Uncorrectable PCI Express Error

 
VaughanP
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

Since setting the PCIE version to Generation 1 as in my previous post we have had no further issues.
samuelfkh
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

Keivan and VaughanP, I have tried forcing PCI-E Gen 1 support in the BIOS, but the NC325m then disappears and can't be detected on Win2k8 x64 SP2.

Any other workaround or solution that you have found?
BobbyCabCom
Occasional Contributor

Re: BL460cG6 Uncorrectable PCI Express Error

BL460 G6, installed Win2k8 R2
installed boards:
Emulex LPe1105-HP 4Gb FC HBA for HP c-Class BladeSystem
NC373m Dual Port Multifunction 1Gb NIC for c-Class BladeSystem

IML log:
Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 9, Function 0, Error status 0x00000000)


I have another server setup exactly the same, yet only one server has this problem. PSP 8.6 has been installed.
UPS_Tech
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

I see the same thing on a DL580 G7. HP came in to replace the SPI board and the I/O expansion board, and neither resolved it. It is going to be either the system board or, in this case, possibly the processor board. Bus 0, Device 9 is identified as one of the PCI-to-PCI bridges.
Sykes
Occasional Visitor

Re: BL460cG6 Uncorrectable PCI Express Error

Hi,

I find this issue is common with the combination of HP Smart Array, Windows, and Kaspersky Anti-Virus. Once you install the firmware update released on 15 Dec 2010, the issue should be solved.

Find all the details here:
http://www.tricksguide.com/blue-screen-error-hardware-malfunction-pci-express-error-hp-proliant-server.html

HP has released a firmware update (Version 3.66) for the Smart Array P212, P410, P410i, P411, P712m, and P812 controllers to solve this issue. This version was released on 15 Dec 2010.

If you check the Fixes section of the firmware download page, you will see that this update resolved an incompatibility between Smart Array and the Kaspersky Anti-Virus tool that resulted in a Smart Array lockup code 0x12. So the final solution is to update the firmware of the Smart Array controller.

It will solve the error Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 7, Function 0, Error status 0x00004000), which appears with:
*** Hardware Malfunction
Call your hardware vendor for support
*** The system has halted ***

Cheers mate

:D
UPS_Tech
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

We actually got HP to come in and replace the entire system, excluding the hard drives and QLogic cards. Guess what? It's still happening. Our Support case has been escalated and the only thing they can come up with is modifying the power regulator, and replacing the QLogic cards.
They cannot provide ANY explanation as to why this is occurring.
Peter Kaufmann
Occasional Advisor

Re: BL460cG6 Uncorrectable PCI Express Error

If I read the error correctly, the device reporting the issue is an embedded PCI-E device and not something plugged into one of the mezzanine slots. We have a BL460c G6 with the same problem. Fortunately it is only one of hundreds, and this blade is part of a cluster. The blade is used in a Microsoft Exchange environment, so it is getting hammered pretty hard. My assumption is that there is a system board problem. We have a case open with HP and are awaiting their solution.
Peter Kaufmann
Occasional Advisor

Re: BL460cG6 Uncorrectable PCI Express Error

After looking at the details in HP Insight Diagnostics, the device having or reporting the issue is one of the standard PCI-to-PCI bridges. I saw that someone had updated the firmware on their P410i embedded array controller. Did this by any chance resolve the issue? While HP highly recommends this firmware update, its list of fixes does not mention this error.
pauld482
New Member

Re: BL460cG6 Uncorrectable PCI Express Error

Hi,

 

We are having similar issues with 3 of our HP BL460c G6 blades. They are running Windows Server 2008 R2 Enterprise x64 and are configured as a 3-node SQL 2008 cluster. All blade firmware, drivers and BIOS are up to date, as is the c7000 chassis firmware.

 

We've had the following error on all 3 blades at some point in time in the IML:

 

Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 7, Function 0, Error status 0x00000000)

 

An ASR is then triggered afterwards.

 

We did manage to consistently recreate the problem back in April by using SQLIO to create a 10GB dat file on our remote iSCSI NetApp storage environment through the iSCSI NC325m NIC.

 

We then implemented the following fix, which seemed to resolve the issue at the time:

 

Launch RBSU by pressing F9 during boot.

Go to Power Management Options

Go to advanced Power Management and PCI Express Generation 2.0 Support

Change Value from Auto to Force PCI-E Generation 

Save and Exit.

 

However, 3 weeks ago one of the blades suffered the following error and then performed an ASR:

 

Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 9, Function 0, Error status 0x00000000)

 

So this time the error is on Device 9, not Device 7.
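For anyone comparing IML entries across several blades, a small script can pull the bus, device, function and status fields out of each line so the Device 7 and Device 9 events can be counted separately. This is only a rough sketch: it assumes the entries have been exported as plain text in exactly the format quoted in this thread, and the function names (`parse_iml_line`, `count_by_device`) are made up for illustration, not part of any HP tool.

```python
import re
from collections import Counter

# Pattern matching the IML text exactly as quoted in this thread.
# Assumption: exported log lines use this same wording and field order.
IML_PATTERN = re.compile(
    r"Uncorrectable PCI Express Error \((?P<kind>[^,]+), "
    r"Bus (?P<bus>\d+), Device (?P<device>\d+), Function (?P<function>\d+), "
    r"Error status (?P<status>0x[0-9A-Fa-f]+)\)"
)


def parse_iml_line(line):
    """Return the fields of one IML entry as a dict, or None if it does not match."""
    match = IML_PATTERN.search(line)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "bus": int(match.group("bus")),
        "device": int(match.group("device")),
        "function": int(match.group("function")),
        "status": int(match.group("status"), 16),
    }


def count_by_device(lines):
    """Count matching entries per (bus, device, function) tuple."""
    counts = Counter()
    for line in lines:
        entry = parse_iml_line(line)
        if entry is not None:
            counts[(entry["bus"], entry["device"], entry["function"])] += 1
    return counts


# The two entries from our blades, copied from the IML.
sample = [
    "Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 7, Function 0, Error status 0x00000000)",
    "Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 9, Function 0, Error status 0x00000000)",
]

for location, hits in sorted(count_by_device(sample).items()):
    print(location, hits)
```

Keeping the count keyed on the full (bus, device, function) tuple makes it easy to see whether the errors move between bridges after a BIOS change, as they did here.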

 

We've yet to put full production loads on the cluster and are reluctant to do so until this is actually resolved, since the full production loads are going to be quite high.

 

Has anyone got to the bottom of this issue yet?

CLEB
Valued Contributor

Re: BL460cG6 Uncorrectable PCI Express Error

I've been experiencing Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 9, Function 0, Error status 0x00000020) on a BL460cG6.

 

This server has a Qlogic QMH2462 HBA in Mezz1 and an IO Accelerator 320GB in Mezz2.

 

The server has been performing an ASR quite frequently.  This is Windows 2008 R2 with SQL Server 2008.

 

The ASR has been happening when SQL server is doing the database dumps, so the data is streaming off the IO Accelerator through the PCIE bus to the Qlogic QMH2462 and onto our EVA.

 

I've changed the PCIE setting this morning to force Gen1 mode like others have.  I'll report back if I have any findings.
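In case it helps anyone reading the non-zero status values quoted in this thread (0x00000020 and 0x00004000), here is a rough sketch that decodes a status word against the bit layout of the standard PCI Express AER Uncorrectable Error Status register. Big assumption: nothing in this thread confirms that the "Error status" field in the IML actually mirrors that register, so treat the output as a hint to raise with HP support, not a diagnosis. The function name `decode_status` is made up for illustration.

```python
# Bit positions from the PCI Express AER Uncorrectable Error Status register.
# ASSUMPTION: the IML "Error status" value uses this same layout; HP has not
# confirmed that anywhere in this thread.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}


def decode_status(status):
    """Return the names of all known bits set in the status word, lowest bit first."""
    return [
        name
        for bit, name in sorted(AER_UNCORRECTABLE_BITS.items())
        if status & (1 << bit)
    ]


# Status values quoted in this thread.
for value in (0x00000000, 0x00000020, 0x00004000):
    names = decode_status(value)
    print(f"0x{value:08X}:", ", ".join(names) if names else "no known bits set")
```

Under that assumption, 0x00000020 would read as a Surprise Down Error and 0x00004000 as a Completion Timeout, while 0x00000000 carries no bit information at all.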