Operating System - Tru64 Unix

Dcache ECC error on cpu0

mmain_1
Occasional Advisor

I administer a Compaq AlphaServer ES40 with four 666 MHz EV67 CPUs and
8 GB of RAM, with the following memory organisation:


0 2048Mb 0000000000000000 4-Way
1 2048Mb 0000000080000000 4-Way
2 2048Mb 0000000100000000 4-Way
3 2048Mb 0000000180000000 4-Way


After about a year of operation, the following messages started appearing
in the syslog a few weeks ago:


Jul 22 16:12:15 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 2. Reporting suspended.
Jul 22 16:13:27 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 1. Reporting suspended.
Jul 22 16:14:18 ragnaroek vmunix: WARNING: too many Processor corrected errors detected on cpu 3. Reporting suspended.


A few hours later, the machine dropped to the console prompt.


During the memory test, the following messages appeared:


EV6 Correctable Memory Fill ECC Error on CPU 0
C_ADDR: 00000000A8FC5BC0
C_SYNDROME_1: 0000000000000057
C_SYNDROME_0: 0000000000000000


EV6 Correctable Dcache ECC Error on CPU 0


EV6 Correctable Memory Fill ECC Error on CPU 0
C_ADDR: 00000000A8FD2BC0
C_SYNDROME_1: 0000000000000057
C_SYNDROME_0: 0000000000000000


At first I thought it was a defective DRAM module located in bank 1,
because of the C_ADDR information. But after removing that bank, the
error still occurs.
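
The bank 1 reasoning above can be checked with a small sketch. It assumes that each array in the memory listing covers one contiguous address range starting at its base address. That is an assumption, not something the console output confirms: the "4-Way" column indicates interleaving, and if interleaving spreads addresses across arrays or DIMMs, a plain range check like this does not identify the physical module. The names `ARRAYS` and `array_for_address` are made up for this illustration.

```python
MB = 1024 * 1024

# (array number, size in bytes, base address), copied from the memory listing above.
ARRAYS = [
    (0, 2048 * MB, 0x0000000000000000),
    (1, 2048 * MB, 0x0000000080000000),
    (2, 2048 * MB, 0x0000000100000000),
    (3, 2048 * MB, 0x0000000180000000),
]


def array_for_address(addr):
    """Return the array whose [base, base + size) range contains addr, or None.

    Assumption: no interleaving across arrays (hypothetical simplification).
    """
    for number, size, base in ARRAYS:
        if base <= addr < base + size:
            return number
    return None


# The two C_ADDR values reported during the memory test.
for c_addr in (0x00000000A8FC5BC0, 0x00000000A8FD2BC0):
    print(f"{c_addr:#018x} -> array {array_for_address(c_addr)}")
```

Under that assumption both reported addresses fall between 0x80000000 and 0x100000000, which is the range of array 1.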


So my question: is this a memory problem or a CPU problem? And if it is a
memory problem, how can I determine the defective DRAM chip? I haven't
found any suitable documentation.
4 REPLIES
Michael Schulte zur Sur
Honored Contributor

Re: Dcache ECC error on cpu0

Hi,

It may be a CPU fault as well. If you do not have a maintenance contract, you should swap CPU 0 with CPU 3 and restart the machine.

greetings,

Michael
amrelsayed
Frequent Advisor

Re: Dcache ECC error on cpu0

Hi mmain,

Your problem is memory, but to determine which DIMM it is, please send me the binary.errlog file from your ES40.

My Email address: aelsayed@ncs.com.kw

Best regards,
Amr
Try To Be Smart
mmain_1
Occasional Advisor

Re: Dcache ECC error on cpu0

Dear all,

The problem is a bad member!
Thanks!
Michael Schulte zur Sur
Honored Contributor

Re: Dcache ECC error on cpu0

What do you mean by member is bad?

Michael