
Uncorrectable MCE on HP ProLiant DL380 Gen9

 
igork
Occasional Advisor


Hi,

Our company has 4 HP ProLiant DL380 Gen9 servers.

About once a week, each server reboots at a random time after logging an uncorrectable machine check exception (MCE).

Integrated Management Log:

Uncorrectable machine check exception error (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000003, Status 0xF200000000300101, Address 0x0000000000000000, Misc 0x0000000000000000).

Uncorrectable machine check exception error (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000003, Status 0xF200000000300101, Address 0x0000000092C0002C, Misc 0x0000000000000C85).

Uncorrectable machine check exception error (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000003, Status 0xF200000000300189, Address 0x0000000000000000, Misc 0x0000000000000000).

Uncorrectable machine check exception error (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000003, Status 0xF200000000300179, Address 0x0000000000000000, Misc 0x0000000000000000).
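For readers who want to interpret the Status values above, here is a minimal sketch of a decoder. It assumes the Status field is the raw IA32_MCi_STATUS register and that the bit layout and the cache-hierarchy error encoding (000F 0001 RRRR TTLL) match the architectural description in the Intel SDM. The function name `decode_mci_status` and the lookup tables are illustrative, not part of any HPE tool. Under this reading, Bank would be a machine-check bank index; which hardware unit a bank maps to is model-specific, and the model-specific error code is left undecoded here.

```python
# Sketch: decode an IA32_MCi_STATUS value as printed in the IML entries.
# Assumption: architectural layout per the Intel SDM (flags in bits 63..57,
# model-specific code in bits 31..16, MCA error code in bits 15..0).

FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Sub-fields of the cache-hierarchy compound error code (000F 0001 RRRR TTLL).
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit status value into flags and error-code fields."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    mcacod = status & 0xFFFF           # architectural MCA error code
    mscod = (status >> 16) & 0xFFFF    # model-specific error code
    result = {"flags": flags, "mscod": mscod, "mcacod": mcacod,
              "cache_error": None}
    # Cache-hierarchy errors: bits 15..13 are 0, bit 12 is a filter flag,
    # bits 11..8 are 0001.
    if (mcacod & 0xEF00) == 0x0100:
        result["cache_error"] = {
            "request": REQUEST.get((mcacod >> 4) & 0xF, "reserved"),
            "transaction": TRANSACTION.get((mcacod >> 2) & 0x3, "reserved"),
            "level": LEVEL[mcacod & 0x3],
        }
    return result


if __name__ == "__main__":
    for value in (0xF200000000300101, 0xF200000000300189, 0xF200000000300179):
        print(hex(value), decode_mci_status(value))
```

If the assumptions hold, all three distinct Status values in this log decode as uncorrected cache-hierarchy errors with processor context corrupt (PCC) set. That is only a starting point for a support case, not a diagnosis.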

We have tried many recommendations, but none of them resolved the problem.

 

Configuration:

CPU: 2 x Intel Xeon E5 v3

RAM: 64 GB

BIOS version 2.40

(legacy boot mode, Maximum Performance profile, C-states and P-states disabled).

 

Best regards,

Igor

11 REPLIES

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

Hi,

I think the problem is an uncorrectable memory error on DIMM 3 of processor 1.

Check DIMM 3 and update to the latest SPP.

Kind Regards,
Erdogan.
I am an HPE employee

If this helps you with your issue, please click the thumb to register a Kudo.
If it resolves the issue, please consider marking it as an Accepted Solution.
The comments in this post are my own and do not represent an official reply from the company. No warranty or guarantees of any kind are expressed in my reply.
igork
Occasional Advisor

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

Hi,

Thanks for the reply.

I'll check the RAM with memtest.

But it's quite strange that these errors occur on all 4 servers.

igork
Occasional Advisor

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

MCEs from another server:

Uncorrectable machine check exception error (Board 0, Processor 2, APIC ID 0x00000010, Bank 0x00000003, Status 0xF200000000300101, Address 0x0000000000000000, Misc 0x0000000000000000).

Uncorrectable machine check exception error (Board 0, Processor 2 APIC ID 0x00000010, Bank 0x00000003, Status 0xF200000000300101, Address 0x0000000092C0002C, Misc 0x0000000000000C85).

Here the DIMM is also 3, but on processor 2.
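To compare entries across servers, the IML lines can be parsed into fields and grouped. The sketch below assumes the entries keep the format shown in this thread. The names `parse_iml_entry` and `summarize` are hypothetical helpers, and the regular expression tolerates the missing comma after the processor number seen in one of the pasted lines.

```python
# Sketch: parse HPE IML "Uncorrectable machine check exception" lines and
# count how often each (bank, status) pair appears.
import re
from collections import Counter

IML_RE = re.compile(
    r"Processor\s+(?P<processor>\d+),?\s+"
    r"APIC ID\s+(?P<apic_id>0x[0-9A-Fa-f]+),\s+"
    r"Bank\s+(?P<bank>0x[0-9A-Fa-f]+),\s+"
    r"Status\s+(?P<status>0x[0-9A-Fa-f]+),\s+"
    r"Address\s+(?P<address>0x[0-9A-Fa-f]+),\s+"
    r"Misc\s+(?P<misc>0x[0-9A-Fa-f]+)"
)


def parse_iml_entry(line: str):
    """Return the numeric fields of one IML entry, or None if it does not match."""
    match = IML_RE.search(line)
    if match is None:
        return None
    fields = {key: int(value, 16)
              for key, value in match.groupdict().items()
              if key != "processor"}
    fields["processor"] = int(match["processor"])  # decimal in the log
    return fields


def summarize(lines):
    """Count entries per (bank, status) pair, ignoring unparseable lines."""
    counts = Counter()
    for line in lines:
        entry = parse_iml_entry(line)
        if entry is not None:
            counts[(entry["bank"], entry["status"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Uncorrectable machine check exception error (Board 0, Processor 2, "
        "APIC ID 0x00000010, Bank 0x00000003, Status 0xF200000000300101, "
        "Address 0x0000000000000000, Misc 0x0000000000000000).",
        "Uncorrectable machine check exception error (Board 0, Processor 2 "
        "APIC ID 0x00000010, Bank 0x00000003, Status 0xF200000000300101, "
        "Address 0x0000000092C0002C, Misc 0x0000000000000C85).",
    ]
    for (bank, status), n in summarize(sample).items():
        print(f"bank={bank:#x} status={status:#018x} count={n}")
```

Grouping like this makes it easy to see that both servers report the same bank and status values, which points away from a single failed part.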

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

Hi,

MemTest will not give reliable results. Can you open a case with the HPE call centre and have the AHS log analyzed?

Kind Regards,
Erdogan.
I am an HPE employee
igork
Occasional Advisor

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

Today I'll get the AHS log and send it to HPE support.

Thank you, Erdogan Temur.

PiterParker
Valued Contributor

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

Good morning sir,

In this case I would follow the procedure below.

- Run the memory test in HPE Diagnostics

- If it does not show any errors, update the SPP

- Create the health log (.ahs) and open a case with HPE.

Please let us know about the progress. 

igork
Occasional Advisor

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

Hi,

Thanks for the advice. I'll let you know.

igork
Occasional Advisor

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

Hi,

After analyzing the AHS log, HP customer service concluded that the problem is in the Smart Storage Battery.

Quite strange... This problem occurs randomly on four servers. Are all 4 batteries broken?

They provided us with four Smart Storage Batteries for replacement.

igork
Occasional Advisor

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

It seems the battery replacement didn't resolve the problem.

The server rebooted again.

igork
Occasional Advisor

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

Hi,

It seems that Hyper-Threading or x2APIC is causing the random reboots.

After disabling them, 2 days under a heavy load test passed without any reboots.

With them enabled, there were nearly 11 reboots with uncorrectable MCEs per day.

Buddika2017
Occasional Contributor

Re: Uncorrectable MCE on HP ProLiant DL380 Gen9

Action Plan:

* Update the SPP, which will bring all drivers and firmware to the latest versions.

SPP: http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=5177954&swItemId=MTX_3f6b4074ed734dc3baf007612d&swEnvOid=4103

* Configure the memory according to the configuration rules.

* Swap the DIMMs with known-good parts (same HP spare number) and check.

* Replace processor 1 per HPE's recommendation and check (it looks like a processor 1 failure).

Thank you,
Best Regards,
Buddika Sandaruwan