
EMS Serial errors

 
Philemon_2
Frequent Advisor


Hi Experts,

We have several serious errors being reported by the operating system on one of our servers. Please look at the Event Monitoring Service (EMS) events below, reported starting Sun Sep 28 20:25:01 2008. At the end you will find the current memory status. Please help us find the issue and fix it. Thank you!


Sun Sep 28 20:25:01 2008
Summary:
Firmware detected excessive errors on the DIMM.
Description of Error:
The DIMM at the physical location given by the data field had excessive errors and has been marked as "FAILED" by firmware.
Probable Cause / Recommended Action:
Firmware detected excessive errors on the DIMM / Replace the specified DIMM

Sun Sep 28 20:25:02 2008
Summary:
A dimm or CPU has is deconfigured or failed testing
Description of Error:
A dimm or CPU has failed and is not operational for the system. This event is emmitted prior to determining if the cell should be integrated into the partition.
Probable Cause / Recommended Action:
A deconfigured dimm or cpu has been detected. Examine earlier events to isolate the problem.

Sun Sep 28 20:40:02 2008
Summary:
Uncorrectable multiple-bit ECC error in DIMM
Description of Error:
A series of MEM_MBE_IN_RANK chassis code will be sent when an error is detected by firmware. The data fields will contain information on the error based on the data type as follows: Physical Location - location of dimm(s) in error; Physical Address - address affect by error; Syndrome - Firmware generated ECC syndrome that can be used to further isolate the error.
Probable Cause / Recommended Action:
Multiple bit error in DIMM
Contact HP Support personnel to troubleshoot the problem

Sun Sep 28 20:41:44 2008
Summary:
Cannot add PDT entry-PDT full
Description of Error:
The memory page deallocation table (PDT) is full.
Probable Cause / Recommended Action:
Excessive memory errors
Contact HP Support personnel to troubleshoot the problem

Sun Sep 28 20:41:45 2008
Summary:
Firmware encountered a problem trying to initialize
Description of Error:
System firmware encountered an error while trying to perform an operation during system initialization. This event ID will always be emmitted before an event ID that describes the status of the operation that failed.
Probable Cause / Recommended Action:
Examine the related event that failed and correct that problem.

Sun Sep 28 20:41:45 2008
Summary:
The Error Response Mode has been determined
Description of Error:
Get Error Response Mode has been called. The first 8 bytes of the response mode string are displayed in the data field and must be converted to ascii from the hex values.
Probable Cause / Recommended Action:
Decode the hex vales to ascii to determine the mode. Other errors will determine action.
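
[Editor's sketch] The event above asks for the data field to be converted from hex to ASCII. The post does not include the actual data field, so the value below is hypothetical, and the byte order and NUL padding are assumptions. The function name `decode_response_mode` is made up for illustration.

```python
def decode_response_mode(data_field: str) -> str:
    """Convert a hex data field to ASCII, using only the first 8 bytes."""
    cleaned = data_field.strip()
    if cleaned.lower().startswith("0x"):
        cleaned = cleaned[2:]
    raw = bytes.fromhex(cleaned)[:8]
    # Assumption: unused trailing bytes are NUL padding.
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


# Hypothetical data field, not taken from the original log.
print(decode_response_mode("0x48414C5400000000"))  # HALT
```

If the real data field uses a different byte order or padding, the slicing and stripping would need to be adjusted.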

Sun Sep 28 20:41:45 2008
Summary:
Machine Check initiated
Description of Error:
A Machine Check has been initiated
Probable Cause / Recommended Action:
A Machine Check has occurred. Analyze cause of Machine Check using diag's and EFI tools.

Sun Sep 28 20:46:46 2008
Summary:
It indicates loss of cell connectivity in the partition.
Description of Error:
It indicates loss of cell connectivity in the partition during a global MCA processing.
Probable Cause / Recommended Action:
It will lead to cells performing RESET_FOR_RECONFIG after getting the error logs.

Sun Sep 28 20:46:46 2008
Summary:
FW will not handoff to the OS_MCA handler for this MCA event
Description of Error:
This means that the system FW MCA handler is not going to handoff to the OS_MCA handler.
Probable Cause / Recommended Action:
The error logs should be retrieved from the EFI shell prompt.

Sun Sep 28 20:46:46 2008
Summary:
Forward Progress is stopping. The Cell or System will not boot further.
Description of Error:
System Firmware has determined that cell or system progress must be halted. The data field contains the Instruction Pointer of the function that called for the halt. The second instance of this code being emitted indicates the major state in system change. This code must be emmitted in pairs.
Probable Cause / Recommended Action:
An error occurred which triggered system firmware to cease making forward progress. The CPU is put into a spin loop so that external debugging can take place. See earlier event ids to help determine the cause of the error. Also note that the Error Response Mode is likely to have directed firmware to HALT.

Sun Sep 28 20:56:47 2008
Summary:
The cell is not able to reach all requested cells through the fabric.
Description of Error:
The cell was not able to reach all the other cells in its configured set through the fabric. The data field contains the bitmask of actual cells that were reached.
Probable Cause / Recommended Action:
Fabric wasn't able to route to all cells described in the complex profile correctly due to a hardware problem. Some of the cells are unreachable. Update the complex profile or correct the hardware problem.
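
[Editor's sketch] The event above says the data field holds a bitmask of the cells that were reached. The post does not show that value, so the mask below is hypothetical, and the mapping "bit N set means cell N was reached" (least-significant bit is cell 0) is an assumption, not something confirmed by the log. The function names are illustrative.

```python
from typing import Iterable, List


def cells_from_bitmask(mask: int) -> List[int]:
    """Return the cell numbers whose bit is set in the mask (bit 0 = cell 0)."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def unreachable_cells(configured: Iterable[int], reached_mask: int) -> List[int]:
    """Return configured cells that do not appear in the reached bitmask."""
    reached = set(cells_from_bitmask(reached_mask))
    return sorted(set(configured) - reached)


# Hypothetical example: four configured cells, mask 0b1101 read from the data field.
print(unreachable_cells([0, 1, 2, 3], 0b1101))  # [1]
```

Comparing the result against the cells listed in the complex profile shows which cell to examine first.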

Sun Sep 28 20:56:47 2008
Summary:
The PD cannot boot, a majority of cells did not arrive at Rendezvous
Description of Error:
Not enough cells made the Rendezvous for boot to continue. The rules are listed in the cause action section.
Probable Cause / Recommended Action:
PD Rendezvous Boot Rules: If greater than 50% of the assigned cells are rendezvoused, we will boot. If less than 50% of the assigned cells are rendezvoused, don't boot. If exactly 50% of the assigned cells are rendezvoused, including all of the preferred core cells, we will boot. If exactly 50% have rendezvoused, and there is a specified preferred core cell not rendezvoused, don't boot. If exactly 50% have rendezvoused, and there are no preferred core cells, don't boot. If any of the above apply in preventing the boot. Reconfigure the PD and reboot.
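
[Editor's sketch] The rendezvous rules quoted above can be written as a small decision function. This only restates the rules from the event text; it is not firmware code, and the function and parameter names are made up for illustration.

```python
from typing import Iterable


def pd_can_boot(assigned: Iterable[int],
                rendezvoused: Iterable[int],
                preferred_core: Iterable[int] = ()) -> bool:
    """Apply the PD Rendezvous Boot Rules from the EMS event text."""
    assigned_set = set(assigned)
    arrived = set(rendezvoused) & assigned_set
    preferred = set(preferred_core)
    if not assigned_set:
        return False
    if 2 * len(arrived) > len(assigned_set):   # more than 50%: boot
        return True
    if 2 * len(arrived) < len(assigned_set):   # less than 50%: don't boot
        return False
    # Exactly 50%: boot only if preferred core cells exist and all arrived.
    return bool(preferred) and preferred <= arrived


# Hypothetical four-cell partition where only cells 0 and 1 arrived.
print(pd_can_boot([0, 1, 2, 3], [0, 1], preferred_core=[0]))  # True
print(pd_can_boot([0, 1, 2, 3], [0, 1], preferred_core=[2]))  # False
```

With four assigned cells, losing two of them lands in the "exactly 50%" case, so the preferred core cell setting decides whether the partition boots.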

Sun Sep 28 20:56:47 2008
Summary:
Firmware is preparing to reset for reconfiguration.
Description of Error:
System firmware has detected a condition that requires the cell to be reset for reconfiguration. The function has been called and is now executing. Data field contains the cell number being reset.
Probable Cause / Recommended Action:
This can be caused by many conditions including a bad complex profile, a bad hardware configuration, a cell arriving late to the rendezvous point. A cell not being able to rendezvous. Reconfiguration from partition manager is recommended.


Present Memory Status

-- Information Tool Log for IPF_MEMORY on path memory --

Log creation time: Tue Sep 30 14:45:54 2008

Hardware path: memory

Basic Memory Description

Module Type: MEMORY
Page Size: 4096 Bytes
Total Physical Memory: 32768 MB
Total Configured Memory: 32768 MB
Total Deconfigured Memory: 0 MB

Memory Board Inventory

DIMM Location Size(MB) State Serial Num Part Num
-------------------- -------- ------- ---------------- ------------------
Cab 0 Cell 0 DIMM 0A 1024 Config PRY06110MG A9843-60301
Cab 0 Cell 0 DIMM 0B 1024 Config PRY06110W6 A9843-60301
Cab 0 Cell 0 DIMM 1A 1024 Config PRY06110W7 A9843-60301
Cab 0 Cell 0 DIMM 1B 1024 Config PRY06110H2 A9843-60301
Cab 0 Cell 0 DIMM 2A 1024 Config PRY06110PL A9843-60301
Cab 0 Cell 0 DIMM 2B 1024 Config PRY06110H3 A9843-60301
Cab 0 Cell 0 DIMM 3A 1024 Config PRY06110MB A9843-60301
Cab 0 Cell 0 DIMM 3B 1024 Config PRY06110NT A9843-60301

Cab 0 Cell 0 Total: 8192 (MB)

===========================================================================

DIMM Location Size(MB) State Serial Num Part Num
-------------------- -------- ------- ---------------- ------------------
Cab 0 Cell 1 DIMM 0A 1024 Config PRY06110E3 A9843-60301
Cab 0 Cell 1 DIMM 0B 1024 Config PRY06110PR A9843-60301
Cab 0 Cell 1 DIMM 1A 1024 Config PRY06110MA A9843-60301
Cab 0 Cell 1 DIMM 1B 1024 Config PRY061108H A9843-60301
Cab 0 Cell 1 DIMM 2A 1024 Config PRY06110CY A9843-60301
Cab 0 Cell 1 DIMM 2B 1024 Config PRY06110EJ A9843-60301
Cab 0 Cell 1 DIMM 3A 1024 Config PRY0611028 A9843-60301
Cab 0 Cell 1 DIMM 3B 1024 Config PRY06110E2 A9843-60301

Cab 0 Cell 1 Total: 8192 (MB)

===========================================================================

DIMM Location Size(MB) State Serial Num Part Num
-------------------- -------- ------- ---------------- ------------------
Cab 0 Cell 2 DIMM 0A 1024 Config PRY061101S A9843-60301
Cab 0 Cell 2 DIMM 0B 1024 Config PRY0611104 A9843-60301
Cab 0 Cell 2 DIMM 1A 1024 Config PRY061102P A9843-60301
Cab 0 Cell 2 DIMM 1B 1024 Config PRY061026U A9843-60301
Cab 0 Cell 2 DIMM 2A 1024 Config PRY06110G5 A9843-60301
Cab 0 Cell 2 DIMM 2B 1024 Config PRY06110G6 A9843-60301
Cab 0 Cell 2 DIMM 3A 1024 Config PRY06110ZK A9843-60301
Cab 0 Cell 2 DIMM 3B 1024 Config PRY06110G3 A9843-60301

Cab 0 Cell 2 Total: 8192 (MB)

===========================================================================

DIMM Location Size(MB) State Serial Num Part Num
-------------------- -------- ------- ---------------- ------------------
Cab 0 Cell 3 DIMM 0A 1024 Config PRY06110C8 A9843-60301
Cab 0 Cell 3 DIMM 0B 1024 Config PRY06110G2 A9843-60301
Cab 0 Cell 3 DIMM 1A 1024 Config PRY06110ZJ A9843-60301
Cab 0 Cell 3 DIMM 1B 1024 Config PRY06110DX A9843-60301
Cab 0 Cell 3 DIMM 2A 1024 Config PRY0611022 A9843-60301
Cab 0 Cell 3 DIMM 2B 1024 Config PRY06110E0 A9843-60301
Cab 0 Cell 3 DIMM 3A 1024 Config PRY06110EW A9843-60301
Cab 0 Cell 3 DIMM 3B 1024 Config PRY06110E1 A9843-60301

Cab 0 Cell 3 Total: 8192 (MB)

===========================================================================

Memory Error Log Summary

The memory error log is empty.

Page Deallocation Table (PDT)

The Page Deallocation Table is empty.

PDT Entries Used: 0
PDT Entries Free: 800
PDT Total Size: 800

-- Information Tool Log for IPF_MEMORY on path memory --
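
[Editor's sketch] To check an inventory like the one above without reading every row, the DIMM lines can be parsed, summed per cell, and scanned for any state other than "Config". The row layout is assumed from the log as pasted here (single spaces between columns); the regular expression and function name are illustrative and may need adjusting for the tool's real column spacing.

```python
import re
from collections import defaultdict

# Assumed row layout: "Cab C Cell N DIMM SLOT SIZE STATE SERIAL PART"
DIMM_ROW = re.compile(
    r"^Cab (\d+) Cell (\d+) DIMM (\w+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def summarize_inventory(text):
    """Return (MB per (cab, cell), list of DIMMs whose state is not 'Config')."""
    totals = defaultdict(int)
    not_config = []
    for line in text.splitlines():
        match = DIMM_ROW.match(line.strip())
        if not match:
            continue  # skip headers, separators, and totals
        cab, cell, slot, size_mb, state, _serial, _part = match.groups()
        totals[(int(cab), int(cell))] += int(size_mb)
        if state != "Config":
            not_config.append((int(cab), int(cell), slot, state))
    return dict(totals), not_config


sample = """\
Cab 0 Cell 0 DIMM 0A 1024 Config PRY06110MG A9843-60301
Cab 0 Cell 0 DIMM 0B 1024 Config PRY06110W6 A9843-60301
"""
print(summarize_inventory(sample))
```

In the log above every DIMM is in state "Config" and nothing is deconfigured, so this kind of check would not by itself identify the DIMM named in the earlier events.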


Thanks,
Philemon
4 REPLIES
Tim Nelson
Honored Contributor
Solution

Re: EMS Serial errors

Is the question whether something is broken? It looks like a DIMM failure; call support for a replacement.

Or is the question how to figure out which DIMM is bad, so you can fix it yourself?
Sandeep_Chaudhary
Trusted Contributor

Re: EMS Serial errors

DIMM failure. It is showing multiple-bit errors. Log a call with support and get the DIMM replaced.
Torsten.
Acclaimed Contributor

Re: EMS Serial errors

You didn't send all the required information.

Where is the location of the problematic memory?

Contact HP support!

Hope this helps!
Regards
Torsten.

SKR_1
Trusted Contributor

Re: EMS Serial errors

Everything looks OK at the moment.
If you receive these alerts frequently, contact HP Support about replacing the hardware (memory/CPU).

Thanks

SKR