DL380 G7 Uncorrectable Machine Check Exception

 
Nicolai Rasmussen
Regular Advisor

DL380 G7 Uncorrectable Machine Check Exception

We have a bunch of new DL380 G7, and so far we've seen this happen on 3 of them:

Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000001, Bank 0x00000005, Status 0xF2000000?, Address 0x00000000?, Misc 0x00000000?)

Running Windows Server 2008 R2. This error causes the machine to reboot. The server is idling, as it has not yet been put into production. We have the Hyper-V role installed, but no VMs yet.
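For anyone collecting these entries from several servers, here is a minimal Python sketch that splits an IML line like the one above into named fields. The field layout is assumed from the entries quoted in this thread only; `parse_iml_mce` is a hypothetical helper name, not an HP tool. A trailing "?" on a value (as in the entry above) is tolerated and the value is parsed exactly as written, so a truncated field stays truncated.

```python
import re

# Layout assumed from the IML entries quoted in this thread (not an HP spec).
IML_PATTERN = re.compile(
    r"Uncorrectable Machine Check Exception \("
    r"Board (?P<board>\d+), "
    r"Processor (?P<processor>\d+), "
    r"APIC ID (?P<apic_id>0x[0-9A-Fa-f]+), "
    r"Bank (?P<bank>0x[0-9A-Fa-f]+), "
    r"Status (?P<status>0x[0-9A-Fa-f']+)\??, "
    r"Address (?P<address>0x[0-9A-Fa-f']+)\??, "
    r"Misc (?P<misc>0x[0-9A-Fa-f']+)\??\)"
)


def parse_iml_mce(line):
    """Return the fields of one IML MCE entry as integers, or None if no match."""
    match = IML_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    result = {
        "board": int(fields["board"]),
        "processor": int(fields["processor"]),
    }
    # Hex fields may contain a ' separator between the upper and lower halves.
    for name in ("apic_id", "bank", "status", "address", "misc"):
        result[name] = int(fields[name].replace("'", ""), 16)
    return result


if __name__ == "__main__":
    entry = (
        "Uncorrectable Machine Check Exception (Board 0, Processor 1, "
        "APIC ID 0x00000001, Bank 0x00000005, Status 0xF2000000?, "
        "Address 0x00000000?, Misc 0x00000000?)"
    )
    print(parse_iml_mce(entry))
```

Grouping the parsed entries by processor and bank makes it easier to see whether the affected servers fail in the same place.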
19 REPLIES

Re: DL380 G7 Uncorrectable Machine Check Exception

Hi,


RESOLUTION
Ensure that only one NIC port is enabled for PXE boot.

Since the default System ROM setting has only one port enabled for PXE boot, perform the following if additional ports were enabled:

Reboot the server. During POST, press F9 to enter RBSU.
Select "System Options."
Select "Embedded NICs" and select only one NIC for "Network Boot." Ensure that the other NICs are set to "disabled" (note that this does not disable the device; it only makes it unavailable for PXE boot).


That will also fix the problem :) Remember to assign points to answers that helped with your problem, so the forum stays alive...
Kind Regards,
Erdogan.
I am HPE Employee

The comments in this post are my own and do not represent an official reply from the company. No warranty or guarantees of any kind are expressed in my reply.
Nicolai Rasmussen
Regular Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

I appreciate your suggestion, but I have already read the advisory that you are referring to. Perhaps I was a bit vague in my description of the problem, but the PXE boot issue only occurs if/when you try to PXE boot with multiple NICs enabled for PXE boot. I'm not trying to PXE boot, nor have I changed the default setting (only 1 NIC enabled for PXE boot). I'm simply trying to keep my servers from spontaneously rebooting :)

If others find this (un)helpful, here's the advisory that I'm referring to:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=466340&prodTypeId=12169&objectID=c02251106
Nicolai Rasmussen
Regular Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

Sorry, wrong link :P

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02039369&lang=en&cc=us&taskId=135&prodSeriesId=3794183&prodTypeId=329290
Jan Soska
Honored Contributor

Re: DL380 G7 Uncorrectable Machine Check Exception

Hello Nicolai - is there any difference between the setup of the 3 "bad" servers and the other DL380 G7s?

jan
Nicolai Rasmussen
Regular Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

Hi Jan,

No, they are all set up using the same procedures. Same OS and same BIOS settings.
We don't change the default BIOS settings, since all virtualization features are enabled by default. All servers have been upgraded to the latest BIOS version and none of them have any PCI cards installed.
Jan Soska
Honored Contributor

Re: DL380 G7 Uncorrectable Machine Check Exception

Hmm, could you, for testing purposes, move the system drives from one bad server to one good server? If the good one becomes bad, there is definitely a problem in your OS configuration, as only the drives were changed. If the issue stays on the bad one, it is a hardware problem and you should contact HP.

Jan
Nicolai Rasmussen
Regular Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

We've replaced the motherboards on the faulty servers and we have not seen the errors since.
Nicolai Rasmussen
Regular Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

Firmware 2010.08.16 fixes this issue. It was, as we at some point expected, CPU related. The latest microcode from Intel fixed it.

- It appears that this is NOT the final fix for this issue. We still have servers with UMCE errors after the BIOS upgrade. Two servers had the motherboard replaced, and it hasn't happened on them since, so I would go with that solution for now...
Rico,Shen
Occasional Visitor

Re: DL380 G7 Uncorrectable Machine Check Exception

Hi Nicolai,

I have encountered the same issue: a DL380 G7 running 2008 R2 with Hyper-V installed. The server resets unexpectedly.

Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000001, Bank 0x00000002, Status 0xB2000000'00030005, Address 0x00000000'00000000, Misc 0x00000000'00000000)
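The Status value in an entry like this can be read bit by bit. Below is a minimal Python sketch, assuming the IML Status field is the raw 64-bit IA32_MCi_STATUS register of the reported bank (the thread does not confirm this). The flag positions are the architectural ones from the Intel SDM; `decode_mci_status` is a hypothetical helper name.

```python
# Architectural flag bits of IA32_MCi_STATUS as documented in the Intel SDM.
# Assumption: the IML "Status" field is this register, copied verbatim.
STATUS_FLAGS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was uncorrected
    "EN": 60,     # error reporting was enabled for this bank
    "MISCV": 59,  # the Misc register holds valid data
    "ADDRV": 58,  # the Address register holds valid data
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(status):
    """Split a 64-bit machine check status value into flags and error codes."""
    decoded = {
        name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()
    }
    decoded["mca_error_code"] = status & 0xFFFF
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF
    return decoded


if __name__ == "__main__":
    # Status value from the IML entry above; the ' only separates the halves.
    raw = "0xB2000000'00030005"
    status = int(raw.replace("'", ""), 16)
    for name, value in decode_mci_status(status).items():
        print(f"{name}: {value}")
```

Under that assumption, this value decodes to a valid, uncorrected error with PCC set and with MISCV and ADDRV clear, which would be consistent with the all-zero Address and Misc fields in the log. The meaning of the two error codes is processor specific and not decoded here.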

The Serial Number is:sgh046----
P/N:583914-B21

There is a quad-port NC364T network adapter installed in PCI slot 2.

Since you mentioned BIOS version 2010.08.16 (10 Sep 2010), I checked and found that the latest BIOS version is 2010.12.01 (18 Dec 2010).

Link:
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=4091412&swItem=MTX-eef151e7426f4a2590f4cb8995&prodNameId=4091432&swEnvOID=4064&swLang=8&taskId=135&mode=5

I will upgrade the BIOS to 2010.12.01 (18 Dec 2010) and will post an update on the problem.
Rico,Shen
Occasional Visitor

Re: DL380 G7 Uncorrectable Machine Check Exception

Hi Nicolai,
After updating to the latest BIOS from the HP website, the problem still exists.

Finally we replaced the motherboard, and that fixed the problem.
Nicolai Rasmussen
Regular Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

The workaround from HP to "fix" this is to change the Power Regulator mode to Static High Performance:

Go into the BIOS and change:

Power Management Options -> HP Power Regulator -> set to "HP Static High Performance Mode" (the HP Power Profile will automatically follow)

We have an open case with HP regarding this issue. They've not yet found the root cause, but so far the workaround is keeping our platform stable.
fmags24
Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

What Intel processor is everyone seeing this issue with? We have the Intel® Xeon® Processor L5640 (2.26 GHz, 12MB L3 Cache, 60W, DDR3-1333, HT, Turbo). We have received the same workaround from HP, and I was just wondering if the same issue is happening on the E and X series of this processor. Thanks.
Nicolai Rasmussen
Regular Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

We use the L5640 as well. We've sent two "faulty" CPUs to HP for closer examination.
James Kennedy_4
Trusted Contributor

Re: DL380 G7 Uncorrectable Machine Check Exception

We are still having this issue on two of our DL380 G7 servers.

We've upgraded the firmware to the latest. Also changed Power Regulator settings in the BIOS as recommended by Nicolai.

Is anyone else still having the problem?
M. Meckel
Occasional Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

Hi there,

I encountered the same problem twice, with slightly different error messages, even with the newest BIOS (01/30/2011).

Please refer to my thread at

http://h30499.www3.hp.com/t5/ProLiant-Servers-ML-DL-SL/DL380G7-Uncorrectable-Machine-Check-Exception/m-p/4766324#M110517


which has ... surprise surprise ... the same title as yours :)

Nicolai Rasmussen
Regular Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

Just to give you guys an update on this LONG running case.

We still have an open case with HP about this, and supposedly it's been elevated to the highest level.

Just to clarify again, the workaround is to change the power profile. IT IS NOT ENOUGH TO CHANGE THE POWER REGULATOR MODE via the iLO interface. You HAVE to change it in the BIOS.

We've sent a complete server to HP for "testing" now. I doubt anything is going to come out of that though. Meanwhile we now have 40+ servers running on Static High, which means they consume about a third more power, and they have been doing so for almost half a year.

I wonder if my HP account manager is going to accept the invoice that I will be sending him, to pay for the extra power consumption :P

HP's new servers are very green!.......
Nicolai Rasmussen
Regular Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

Quick update:

After 10 months of waiting, HP has finally released an advisory on this issue:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02829424&jumpid=em_alerts_us-us_May11_xbu_all_all_1266624_80534_proliantservers_critical_013_0

I'm being told that they are expecting a BIOS fix for this issue. The release date is estimated to be the end of June.
Server-Support
Super Advisor

Re: DL380 G7 Uncorrectable Machine Check Exception

@Nicolai Rasmussen 

Yes, same here. My HP BL 465c G7 blade server, which had been running for more than 1.5 years, just rebooted today during business hours.

Here are the IML logs:

Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000010, Bank 0x00000004, Status 0xF2000000'00070F0F, Address 0x00000000'00000000, Misc 0x00000000'00000000)
Uncorrectable Chipset Error (Error status 1 0x0018C154, Error status 2 0x00244000)
Uncorrectable Chipset Error (Error status 1 0x0018C160, Error status 2 0x00002040)
Uncorrectable Chipset Error (Error status 1 0x0018C16C, Error status 2 0x20000080)
Uncorrectable Chipset Error (Error status 1 0x0018C170, Error status 2 0x040406FF)
Uncorrectable Chipset Error (Error status 1 0x0018C174, Error status 2 0x00000003)
Uncorrectable Chipset Error (Error status 1 0x0018C178, Error status 2 0x9452EA00)

My server ROM is A19 12/08/2012, but according to http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c03250482 the system ROM dated 12.31.2011 corrects this issue, and that one is older?
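To check a fleet of servers against the ROM date named in an advisory, the comparison can be scripted. This is a minimal Python sketch; it assumes both dates are written month first (MM/DD/YYYY or MM.DD.YYYY), and `parse_rom_date` is a hypothetical helper name.

```python
import re
from datetime import date


def parse_rom_date(text):
    """Parse a ROM date written as MM/DD/YYYY or MM.DD.YYYY (assumed order)."""
    month, day, year = (int(part) for part in re.split(r"[./]", text.strip()))
    return date(year, month, day)


def includes_fix(installed_rom, fix_rom):
    """True if the installed ROM is dated on or after the ROM carrying the fix."""
    return parse_rom_date(installed_rom) >= parse_rom_date(fix_rom)


if __name__ == "__main__":
    # Dates taken from the post above.
    print(includes_fix("12/08/2012", "12.31.2011"))  # prints True
```

A result of True only says the installed ROM is newer than the one in the advisory. It does not show that the reboot has the cause the advisory describes.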