
Re: DL360 Gen10 – CPU 2 “Uncorrectable Machine Check Exception / UPI Error” – Need Advice

 
Genc
Occasional Contributor

DL360 Gen10 – CPU 2 “Uncorrectable Machine Check Exception / UPI Error” – Need Advice


Hello everyone,
I am experiencing a rare but recurring hardware error on my HPE ProLiant DL360 Gen10, and since the warranty has expired, I would appreciate community insight before considering paid repair.


Server Details

  • Model: HPE ProLiant DL360 Gen10 (8SFF CTO)
  • iLO Version: iLO 5 v2.81 (Mar 07 2023)
  • Host OS: VMware ESXi
  • CPU: Intel Xeon (2-CPU configuration)

    Issue Summary

    The server experienced several unexpected shutdowns / reboots.
    In the Integrated Management Log (IML) I see repeated CPU-related hardware faults:

    Key errors from IML:

     

     

    Uncorrectable UPI Error was detected on Processor 2
    Uncorrectable Machine Check Exception (Processor 2)
    Unexpected Shutdown and Restart – An undetermined error type resulted in a reboot of the server
     

    Event Examples:

    • 06/06/2024 16:24:20 – Uncorrectable Machine Check Exception (Processor 2)

    • 06/06/2024 16:24:21 – Uncorrectable UPI Error detected on Processor 2

    • 11/19/2025 13:45:49 – Another Machine Check Exception on Processor 2

    • Several entries of “Unexpected Shutdown and Restart” between 11/16/2025–11/27/2025

    These faults occurred only two times in two years, but each one forced a reboot.
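To put the two dated Machine Check events on a timeline, a small sketch like the one below can compute the gap between them. The timestamps are copied from the IML entries listed above; the MM/DD/YYYY reading of the dates and the variable names are assumptions made for illustration.

```python
from datetime import datetime

# Timestamps copied from the IML entries above; MM/DD/YYYY is assumed.
IML_FORMAT = "%m/%d/%Y %H:%M:%S"

first_mce = datetime.strptime("06/06/2024 16:24:20", IML_FORMAT)
second_mce = datetime.strptime("11/19/2025 13:45:49", IML_FORMAT)

# Subtracting two datetimes yields a timedelta; .days is the whole-day part.
gap = second_mce - first_mce
print(f"Whole days between the two MCE events: {gap.days}")
```

A gap this long between uncorrectable events is part of why it is hard to tell an intermittent contact problem from a slowly failing part.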




    Additional observations

    From iLO → System Health

    All subsystems show OK, including:

    • Power

    • Fans

    • Storage

    • Network

    • Processors

    • Temperatures

    • Smart Storage Energy Pack

    From ESXi vmkernel.log

    The log shows no obvious ESXi-side cause at the time of the reboot.

    Memory

    The IML contains messages like:

    Processor 1/2 DIMM X could not be authenticated as genuine HPE SmartMemory.
    Enhanced + extended memory features will not be active.

     

    (Not sure if relevant to UPI or MCE errors.)

    Since the HPE contract has expired, I would like your expert opinion:

    Is this a known issue on DL360 Gen10?

    Common causes I found online include:

    • CPU not fully seated

    • Weak contact in the LGA socket

    • Dried thermal paste

    • Firmware mismatch / old microcode

    • PSU voltage instability

    • DIMM channel instability on CPU 2

    • Failing CPU (rare)

     What would you try first?

    Firmware update?
    CPU reseating?
    Thermal paste replacement?
    DIMM reseating on CPU2?
    Disable C-States?

    Since this has happened only twice in 2 years, does it point to a failing CPU or to a minor stability issue?

    Is the SmartMemory authentication failure related to the UPI/MCE errors, or is it completely independent?

    Integrated Management Log – Critical Entries

    These are the repeated fault messages:

    CPU / UPI Errors

    Event 1057 – Uncorrectable UPI Error was detected on Processor 2
    Event 1056 – Uncorrectable Machine Check Exception (Processor 2, APIC ID 0x00000020, Bank 0x0000000C)
    Event 821 – Uncorrectable UPI Error detected on Processor 2
    Event 820 – Uncorrectable Machine Check Exception on Processor 2
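For anyone comparing these entries against HPE advisories that name specific banks, it can help to pull the processor, APIC ID, and bank out of the message and convert the hex values to decimal. The sketch below is only a text-parsing aid: the regular expression is a hypothetical pattern written for the one message layout shown in Event 1056, and it makes no claim about what a given bank number means on this CPU.

```python
import re

# Message text copied from Event 1056 in the list above.
line = ("Uncorrectable Machine Check Exception "
        "(Processor 2, APIC ID 0x00000020, Bank 0x0000000C)")

# Hypothetical pattern for this one message layout; other firmware
# versions may format the entry differently.
MCE_PATTERN = re.compile(
    r"Processor (?P<cpu>\d+), APIC ID (?P<apic>0x[0-9A-Fa-f]+), "
    r"Bank (?P<bank>0x[0-9A-Fa-f]+)"
)


def decode_mce(text):
    """Return processor, APIC ID, and bank as integers, or None if no match."""
    match = MCE_PATTERN.search(text)
    if match is None:
        return None
    return {
        "processor": int(match["cpu"]),
        "apic_id": int(match["apic"], 16),  # hex string to integer
        "bank": int(match["bank"], 16),
    }


decoded = decode_mce(line)
print(decoded)
```

Here the bank field decodes to 12 and the APIC ID to 32, which are the values to quote when searching advisories or opening a case.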


    Unexpected Shutdowns / Restarts


    Event 1109 – Unexpected Shutdown and Restart – An undetermined error type resulted in a reboot
    Event 1083 – Unexpected Shutdown and Restart
    Event 1030 – Unexpected Shutdown and Restart
    Event 1004 – Unexpected Shutdown and Restart
    Event 978 – Unexpected Shutdown and Restart
    Event 949 – Unexpected Shutdown and Restart


    Memory Authentication Warnings (may be unrelated)

    Multiple entries like:

    Processor X DIMM Y could not be authenticated as genuine HPE SmartMemory.
    Enhanced and extended SmartMemory features will not be active.

  • Storage Controller Logs
    Slot 0 – Controller Write Cache status changed to 

Mamatha_J
HPE Pro

Re: DL360 Gen10 – CPU 2 “Uncorrectable Machine Check Exception / UPI Error” – Need Advice

Hi @Genc 

1. Update System ROM / BIOS / microcode / iLO / firmware to the latest versions
According to the release notes for the Service Pack for ProLiant (SPP) for your server, the latest System ROM for the DL360 Gen10 (U32) includes updated Intel microcode that fixes a potential machine check exception under heavy stress with short loops of instructions.
https://downloads.hpe.com/pub/softlib2/software1/publishable-catalog/p202943209/v138521/SPP2021.05.0ComponentNotes.pdf

HPE explicitly lists “Update the system firmware” as the first action for certain “Bank 3/4 Uncorrectable Machine Check Exception” issues on Gen10.
https://support.hpe.com/hpesc/public/docDisplay?docId=a00126841en_us&docLocale=en_US

So, get the latest firmware via the SPP; update the BIOS, iLO firmware, and microcode; then re-test under load.
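Before and after updating, it is useful to record which firmware versions are actually installed. One way to do that is through the iLO Redfish interface; the following is a minimal sketch, not an HPE-provided tool. It assumes the standard Redfish firmware inventory collection is reachable at the path shown and that the service honors the `$expand` query parameter; `fetch_inventory` and `summarize` are hypothetical helper names, and the sample payload is made up purely to show the output shape.

```python
import base64
import json
import ssl
import urllib.request

# Standard Redfish collection for installed firmware; "$expand" is an
# optional query parameter that not every service or version honors.
INVENTORY_PATH = "/redfish/v1/UpdateService/FirmwareInventory/?$expand=."


def fetch_inventory(host, user, password):
    """Hypothetical helper: GET the firmware inventory from an iLO host."""
    request = urllib.request.Request(f"https://{host}{INVENTORY_PATH}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    # Certificate checks are disabled here because iLO commonly ships with a
    # self-signed certificate; only do this on a trusted management network.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


def summarize(inventory):
    """Map each component name to its version string.

    Members that are bare links (no "Name" key) are skipped.
    """
    return {
        member.get("Name"): member.get("Version")
        for member in inventory.get("Members", [])
        if "Name" in member
    }


# Made-up payload in the expanded shape, used only to show the output format.
sample = {
    "Members": [
        {"Name": "System ROM", "Version": "example-rom-version"},
        {"Name": "iLO 5", "Version": "example-ilo-version"},
    ]
}
for name, version in summarize(sample).items():
    print(f"{name}: {version}")
```

Against a real server you would call `summarize(fetch_inventory(host, user, password))` and keep the output alongside the IML export, so any later crash can be tied to a known firmware level.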

2. Check memory (DIMM) configuration and authenticity: reseat or replace DIMMs

HPE’s Gen10 guidance states that “Mixed DIMM configurations are not supported”: mixing different DIMM types (e.g., RDIMM + UDIMM) or populating slots incorrectly could cause instability.
https://support.hpe.com/hpesc/public/docDisplay?docId=ilogen12-msg-en_us&docLocale=en_US&page=class0x0032code0x0239-gen11.html

Given your “DIMM could not be authenticated as genuine HPE SmartMemory” warnings, this is especially suspect. Try removing and reinstalling the DIMMs on the CPU 2 side; if possible, test with known-good HPE-certified DIMMs.

3. Reseat CPU 2 and check the physical socket and contacts

Before removing or replacing any processor, follow the “Processor troubleshooting guidelines” carefully; improper handling can damage the system board.
So: power off, unplug, ground yourself, remove the CPU 2 heatsink, remove the CPU, inspect the socket and contacts, reseat the CPU cleanly (with proper orientation), reapply the correct thermal paste, then reassemble and test.

4. Test with only CPU 1 (remove CPU 2) under a similar workload for an extended time

If the server becomes perfectly stable with only CPU 1, that strongly suggests CPU 2 or its associated memory/UPI link is at fault. This is a common “isolate the faulty socket/CPU” diagnostic technique.

This also reduces variables: you eliminate the DIMMs attached to CPU 2, the UPI links, etc., so you can see whether the CPU 2 socket or CPU 2 itself is truly the issue.

5. Check for problematic PCIe / add-on cards / NVMe / storage components

6. If the issue persists after all of the above, plan for CPU (or CPU + system board) replacement

HPE’s “Uncorrectable Machine Check Exception” documentation states that if a firmware update and a correct hardware configuration do not stop the UMCE events, replacement of the processor may be required.
In some cases, especially with a persistent UPI error on CPU 2, this indicates a CPU or socket-link failure, and only replacement will guarantee stability.
https://support.hpe.com/hpesc/public/docDisplay?docId=a00029456en_us&docLocale=en_US

Thanks & Regards,

Mamatha Ajaraddi



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Mamatha_J
HPE Pro

Re: DL360 Gen10 – CPU 2 “Uncorrectable Machine Check Exception / UPI Error” – Need Advice

Hi @Genc 

If you find the reply helpful, please consider clicking on the "Thumbs Up/Kudo" icon.

Thanks & Regards,

Mamatha Ajaraddi


