Tuesday - last edited Tuesday by support_s
DL360 G10 - (CRITICAL) Uncorrectable Machine Check Exception Crash
Yesterday my 'new' DL360 G10 crashed with the following error:
HPE iLO AlertMail-006: (CRITICAL) Uncorrectable Machine Check Exception (Processor 1, APIC ID 0x00000000, Bank 0x00000004, Status 0xBA000000'58000402, Address 0x00000000'00000000, Misc 0x00000000'00000000).
The system rebooted, failed, seemed to reset some BIOS values, rebooted again, and is now running. But my confidence has obviously been severely shaken, and I am waiting for it to happen again.
I did a lot of Googling, and the 'obvious' answer seems to be a bad CPU. But hardware tests all pass, so others claim it is a firmware issue. And when was the last time you actually had a CPU fail? Seems unlikely to me.
Coincidentally, I found this advisory: Advisory: (Revision) HPE NVMe Solid State Drives - SSDs NVMe Models with Firmware Version MPK77H5Q, MPK7725Q, or HPK5 May Cause UMCEs on AMD and Intel-Based Gen9, Gen10, Gen10 Plus or Gen11 Servers
Certain NVMe drive models with firmware version MPK77H5Q, MPK7725Q or HPK5 may cause an Uncorrectable Machine Check Exception (UMCE) to occur. An uncorrectable PCIe bus error may also be logged in the case of Intel-based servers. These errors will be logged to the HPE Integrated Management Log (IML) and will cause the operating system to crash. The error may also cause an unexpected server reboot event.
Critical,1240,172348,0x0005,CPU,0x0003,Hardware,06/12/2024 12:05:43,40118: Uncorrectable Machine Check Exception (Processor 2, APIC ID 0x00000020, Bank 0x00000006, Status 0xBB800000'00000E0B, Address 0x00000000'00000000, Misc 0x00000000'AE100000). ACTION: Update the system firmware. If the issue persists, contact support.
I happen to have (4) 800GB SAS MO000800KXAVN drives in my system showing HPK5 firmware. But I'll be darned if I can find this drive on the HPE website, let alone firmware for it. I did apply SPP 2025/09 in early October, which appears to be the latest SPP available. And I do not see any other firmware warnings in my HPE support account.
Any suggestions? Or where to locate FW for my MO000800KXAVN drives?
THANKS
Tuesday - last edited Tuesday
Re: DL360 G10 - (CRITICAL) Uncorrectable Machine Check Exception Crash
Google reveals that I am not alone on this issue - Solved: DL380 Gen10 Uncorrectable PCI Express Error Detec... - Hewlett Packard Enterprise Community
While that thread is marked 'resolved', I'm not certain that is true - I am running more recent FW than the version that apparently 'resolved' it 6 years ago...
Does anyone know how to identify the components associated with Bank 0x04?
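To compare the bank and status values across events, the fields can be pulled out of the alert text. The following is only a rough sketch in Python: the regular expression is inferred from the two messages quoted in this thread, and `parse_mce` is a hypothetical helper name, not part of any HPE tool. Other iLO firmware versions may word the message differently.

```python
import re

# Field layout inferred from the two messages quoted in this thread
# (an assumption, not an HPE-documented format).
MCE_PATTERN = re.compile(
    r"Processor (?P<processor>\d+), "
    r"APIC ID (?P<apic_id>0x[0-9A-Fa-f]+), "
    r"Bank (?P<bank>0x[0-9A-Fa-f]+), "
    r"Status (?P<status>0x[0-9A-Fa-f']+), "
    r"Address (?P<address>0x[0-9A-Fa-f']+), "
    r"Misc (?P<misc>0x[0-9A-Fa-f']+)"
)


def parse_mce(text):
    """Return the MCE fields as integers, or None if the text does not match."""
    match = MCE_PATTERN.search(text)
    if match is None:
        return None
    fields = {"processor": int(match.group("processor"))}
    for name in ("apic_id", "bank", "status", "address", "misc"):
        # 64-bit values are printed as two halves joined by an apostrophe.
        fields[name] = int(match.group(name).replace("'", ""), 16)
    return fields


my_alert = (
    "Uncorrectable Machine Check Exception (Processor 1, APIC ID 0x00000000, "
    "Bank 0x00000004, Status 0xBA000000'58000402, "
    "Address 0x00000000'00000000, Misc 0x00000000'00000000)."
)
advisory_example = (
    "Uncorrectable Machine Check Exception (Processor 2, APIC ID 0x00000020, "
    "Bank 0x00000006, Status 0xBB800000'00000E0B, "
    "Address 0x00000000'00000000, Misc 0x00000000'AE100000)."
)

mine = parse_mce(my_alert)
theirs = parse_mce(advisory_example)
print(f"my bank={mine['bank']}, advisory example bank={theirs['bank']}")
```

Run against the two messages above, this shows that my event was logged in bank 4 while the advisory's example was logged in bank 6, so the two are not an exact match on bank number alone.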
Tuesday
Re: DL360 G10 - (CRITICAL) Uncorrectable Machine Check Exception Crash
Hello @TomJ802,
On HPE ProLiant servers like the DL380 Gen10, a message referring to “Bank 0x04” usually points to a specific internal error-logging bank inside the system’s CPU or memory controller, not to a physical slot that is labeled the same way.
In simple terms, it means the hardware recorded an error in a particular error bank, and you need the server’s management tools to translate that into the actual component. The best way to identify the real part behind Bank 0x04 is to check the Integrated Management Log (IML) in iLO, run an HPE Active Health System (AHS) report, or use HPE Support Tools like SSA or STP.
These tools map the bank number to a PCIe device, DIMM, or CPU lane. Without that mapping, Bank 0x04 alone isn’t enough to know the component. If the error keeps repeating, generating an AHS log and giving it to HPE support is usually the fastest way to pinpoint the exact card, slot, or controller causing the PCIe error.
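While waiting for the tools to do the mapping, the Status value in the alert can at least be split into its generic parts. The sketch below assumes the standard Intel IA32_MCi_STATUS layout described in the Intel Software Developer's Manual (flag bits 63 to 57, a model-specific error code in bits 31:16, and the MCA error code in bits 15:0). `decode_status` is a hypothetical helper name, and the sketch does not translate the bank number into a hardware unit, since that mapping is processor-model specific.

```python
# Architectural flag bits of IA32_MCi_STATUS (Intel SDM); bits below 57
# can be model-specific and are deliberately left out of this sketch.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is flagged valid
    58: "ADDRV",  # ADDR register is flagged valid
    57: "PCC",    # processor context may be corrupted
}


def decode_status(status):
    """Split a 64-bit machine check status value into flags and error codes."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


# Status values copied from the two messages quoted in this thread.
for label, value in (
    ("reported alert", 0xBA00000058000402),
    ("advisory example", 0xBB80000000000E0B),
):
    decoded = decode_status(value)
    print(
        f"{label}: flags={','.join(decoded['flags'])} "
        f"mca=0x{decoded['mca_error_code']:04X} "
        f"model=0x{decoded['model_specific_code']:04X}"
    )
```

Both values have UC and PCC set, which is consistent with an uncorrectable crash, but their MCA error codes differ (0x0402 versus 0x0E0B), so they do not look like the same signature. Neither has ADDRV set, which fits the all-zero Address fields in the messages.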
Regards,
Azr_geek
yesterday - last edited yesterday
Re: DL360 G10 - (CRITICAL) Uncorrectable Machine Check Exception Crash
Thanks for the reply and the information. While I have the iLO error email, I inadvertently deleted the IML entries. During the crash cycle it appears the BIOS reset itself, and the most recent entry was a tamper warning for the server lid. The GUI had check boxes to the left of every entry, so I checked that one entry and clicked the delete icon; none of the check boxes for the error entries were checked. Imagine my surprise (and frustration) when that action deleted the entire IML log. That may be user error, but I will say the GUI is not obvious about this feature.
Hopefully this crash does not happen again - though I have never encountered this with any of my older Lenovo servers and my confidence has been severely shaken... We recently purchased this DL360 for a business critical application and now I am losing sleep and checking the server status first thing every morning...
Tom
yesterday
Re: DL360 G10 - (CRITICAL) Uncorrectable Machine Check Exception Crash
Losing the IML entries after a critical crash would unsettle anyone, especially when the server is running a business-critical workload. Unfortunately, the iLO IML delete function does wipe the entire log, even if only one box is selected. You didn’t do anything wrong — the interface really is not very clear about that, and many admins have been caught by the same behavior.
The AHS (Active Health System) keeps deeper, time-stamped hardware telemetry that survives BIOS resets and IML deletions. If this issue happens again, the AHS file will give HPE support the best chance of pinpointing the exact root cause.
A single crash after a firmware reset isn’t always a sign of ongoing failure. Sometimes a one-off PCIe machine check comes from a momentary glitch, power fluctuation, or firmware state issue, especially around updates or resets. If the system is now stable, there’s a good chance it was an isolated event.
Your hardware is still under support, and HPE will normally treat repeat MCE/PCIe errors very seriously. If it happens again, they can analyze the AHS logs and often identify a specific card, riser, or system board. In some cases they'll replace parts proactively.
You’re not wrong to be worried — anyone would be — but one crash doesn’t automatically mean you have an unreliable system. If something is truly faulty, the server will usually show repeatable symptoms, and HPE is very familiar with tracking these down.
If you want, I can also summarize what to collect or check now, so that if the issue returns you have everything ready for support.
Regards,
Azr_geek