cpqasm2

 
Jarrett C
New Member

I am getting the following description from a warning in the event logs: "Memory module #3 has exceeded its threshold of correctable errors. Subsequent correctable memory errors will continue to be corrected, but memory error reporting will cease until the system is rebooted." My system has rebooted three nights in a row, and this is the only thing I see in the event logs. Can someone please tell me what this means?

Thanks.
kris rombauts
Honored Contributor

Re: cpqasm2

Jarrett,

Several components in a system have error thresholds. For memory modules in particular, there is a threshold hard-coded by the manufacturer; when it is crossed, that is taken as the signal to alert you that it's time to replace the component in order to avoid unplanned downtime in the (near) future.

So if module 3 has experienced too many correctable bit errors (because some chips on the SIMM/DIMM have a problem), there is a chance that you'll see crashes on this machine in the future. The number of bit errors can grow, and at some point two bits in the same word can fail together. That double-bit error is something standard ECC can detect but not correct, so the system cannot tolerate it (continuing would risk data corruption); it will generate a crash and the system will go down immediately.
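To illustrate why a single-bit error is survivable but a double-bit error is not, here is a toy sketch of a SEC-DED (single-error-correct, double-error-detect) code: an extended Hamming code over just 4 data bits. This is an illustration under stated assumptions, not how any particular server implements it. Real ECC memory uses wider codes (typically 8 check bits per 64 data bits), and the names `encode`, `decode` and `flip` are hypothetical.

```python
from functools import reduce
from operator import xor

# Toy extended Hamming (8,4) code. Index 0 holds the overall parity bit;
# indices 1..7 are classic Hamming positions (parity bits at 1, 2, 4).
DATA_POSITIONS = (3, 5, 6, 7)


def encode(data):
    """Encode 4 data bits into an 8-bit SEC-DED codeword (hypothetical helper)."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = reduce(xor, code[1:])        # overall parity over the whole word
    return code


def decode(code):
    """Return ("ok" | "corrected", data) or ("uncorrectable", None)."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                # XOR of the positions holding a 1
    overall = reduce(xor, code)            # 1 means an odd number of flipped bits
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single-bit error: the syndrome points at it (0 = the parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not correctable.
        return "uncorrectable", None
    return status, [code[pos] for pos in DATA_POSITIONS]


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    damaged = list(code)
    for pos in positions:
        damaged[pos] ^= 1
    return damaged
```

For example, `decode(flip(encode([1, 0, 1, 1]), 5))` recovers the original data and reports `"corrected"`, while flipping two bits such as positions 2 and 5 reports `"uncorrectable"`. That second case is the one the system cannot work around.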

What is probably happening here is that the module generates so many errors that, after each reboot, the threshold is reached again almost immediately, so you get the same message again.
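A rough sketch of the threshold behaviour the warning describes. It assumes a simple per-module counter: ECC keeps correcting every error, a single warning is logged once the count exceeds the limit, reporting for that module then stops, and the counters reset on reboot while the event log persists. The class name and the threshold of 10 are hypothetical; the real limit is set by the vendor and is not stated in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectableErrorTracker:
    """Hypothetical model of per-module correctable-error thresholding."""
    threshold: int = 10                               # assumed value, vendor-defined in reality
    counts: dict = field(default_factory=dict)        # module -> errors since boot
    reporting_disabled: set = field(default_factory=set)
    event_log: list = field(default_factory=list)     # survives reboots, like the event log

    def record_correctable_error(self, module: int) -> None:
        # The error itself is always corrected by ECC; this only decides reporting.
        if module in self.reporting_disabled:
            return
        self.counts[module] = self.counts.get(module, 0) + 1
        if self.counts[module] > self.threshold:
            self.event_log.append(
                f"Memory module #{module} has exceeded its threshold of correctable errors."
            )
            self.reporting_disabled.add(module)

    def reboot(self) -> None:
        # Counters and the "reporting stopped" state are cleared; the log is kept.
        self.counts.clear()
        self.reporting_disabled.clear()
```

With a badly failing module, a burst of 50 errors after each boot produces exactly one new warning per boot, which matches seeing the same message after every restart.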

I'd make sure that module 3 gets replaced.


HTH

Kris


Marc Carney
Valued Contributor

Re: cpqasm2

Agreed, swap module 3, but make sure you know which slot is module 3 before switching off and opening up the server; it's not always obvious. Take a look at the 'Maintenance and Service Guide' for your specific model, which should show you how the memory boards are laid out. A little planning is often a good investment.
The sheep tell me what I need to know