G8 DL360E NMI issue

 
GioP
Occasional Advisor

Hi,

I am having issues with a Gen8 DL360e that throws errors and will not boot.

The error code is :

Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 1, Function 1, Error status 0x00000020)
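
For reference, the status value can be decoded by hand. The sketch below assumes (this is not confirmed anywhere in this thread or by HPE) that the logged "Error status" is the raw value of the PCI Express AER Uncorrectable Error Status register. The bit names follow the PCI Express Base Specification; the table is deliberately partial, and the function name `decode_uncorrectable_status` is made up for illustration.

```python
from typing import List

# Bit positions in the PCIe AER Uncorrectable Error Status register
# (names per the PCI Express Base Specification; partial table).
# ASSUMPTION: the "Error status" in the HP log is this register's raw value.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_uncorrectable_status(status: int) -> List[str]:
    """Return the names of all bits set in a 32-bit status value."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            # Bits missing from the table are reported by number, not guessed.
            names.append(UNCORRECTABLE_BITS.get(bit, f"unlisted bit {bit}"))
    return names


if __name__ == "__main__":
    print(decode_uncorrectable_status(0x00000020))
```

Under that assumption, 0x00000020 has only bit 5 set, which the specification calls Surprise Down: the link to the device went down unexpectedly. That would point at the link itself (slot, riser connection, or system board) rather than at a transaction-level fault, but it is only a hint, not a diagnosis.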

If I run the machine without the PCI riser it will work fine. I have an identical machine so I swapped the PCI riser and the same error occurs (and both risers work fine in the machine which works).

At one stage after pulling the riser in and out heaps of times, I was able to get the server to start with it but the B320i on board the riser was not being detected.

I believe this all happened after I updated the system: I first updated iLO, then the BIOS, and then used the service pack to update the rest.

I am not sure if there is a connection, but the last event before this error started occurring was the Power Management Controller being updated to version 3.3.

I have tried just about everything now, including flashing everything all over again.

Can someone point me in the right direction?

Thanks in advance

Manojvk
Occasional Visitor

Re: G8 DL360E NMI issue

Hi,

With the information provided, it is difficult to decode the errors. I would suggest you raise a support ticket with the HPE technical team and provide all the necessary logs.

FROM HPE TECHNICAL TEAM

Suman_1978
HPE Pro

Re: G8 DL360E NMI issue

Hi,

See if this article helps you https://support.hpe.com/hpsc/doc/public/display?docId=c03525470

You may try restoring the server to its default configuration, either from the BIOS or with the System Maintenance Switch (S6).

Before doing so, make sure you have a valid backup of your data on the server.


Thank You!
I am an HPE Employee


GioP
Occasional Advisor

Re: G8 DL360E NMI issue

Thanks for the advice. The unit is no longer under warranty or service agreement. Will they still help? Thank you

GioP
Occasional Advisor

Re: G8 DL360E NMI issue

I looked at that article, but unfortunately I cannot generate an advanced report because I cannot get to Intelligent Provisioning with the riser board in (I get a red screen).

Also, I have reset the BIOS many times. Is it any different if I use the S6 switch?

Thanks so much

Suman_1978
HPE Pro

Re: G8 DL360E NMI issue

Hi,

Using the S6 switch is a hardware-based reset.

If the server is out of warranty, you may be charged for service.

You may refer to the Gen8 Troubleshooting guides below for more information.
https://support.hpe.com/hpsc/doc/public/display?docId=c03257154
https://support.hpe.com/hpsc/doc/public/display?docId=c03230516

As per your troubleshooting, if the Riser is not working on this server, I suspect a system board fault.



GioP
Occasional Advisor

Re: G8 DL360E NMI issue

Thank you, I just needed someone to confirm that I was not missing something simple or a well-known bug. I suspected the system board too, so I have now ordered a "new" one on eBay. I will report back once it arrives and I change it over.

GioP
Occasional Advisor

Re: G8 DL360E NMI issue

So I have put in a new system board, and no more NMI! What do you know. It was all going well and then I started doing updates. No issue with the iLO update or the ROM update from USB, however that is it!

I cannot launch Intelligent Provisioning, as the bar gets stuck loading at what looks like around 10%, and the SPP throws a kernel panic and stops. I have just now tried to restore Intelligent Provisioning with the update DVD and this also gets stuck without loading. It seems that none of the Linux-based stuff will progress.

The machine did throw some 207 RAM errors beforehand, but after some googling I reseated the CPU and they seem to have gone away.

At this stage, I am beginning to think that something is wrong with the CPUs, but I have tried each of them on its own. Is it possible that they are both buggered and giving me the same error?

If not, it looks like I have another dud board....

UPDATE: after one last bout of desperation (I have easily spent over 50 hours on this) I reset the RBSU and all the above-mentioned issues went away. I am certain I reset the RBSU as soon as I plugged the new board in; all I did afterwards was enable both the B120i and the B320i in the BIOS. It may not have liked that!