ProLiant Servers (ML,DL,SL)

Re: DL585 G2: Correctable Error Threshold exceeded questions

 
Jake Mason
Advisor

DL585 G2: Correctable Error Threshold exceeded questions

So my work just got in some DL585 G2 systems and I came across something I've never seen before. When I boot SmartStart, a couple of the DIMMs show a "Correctable Error Threshold Exceeded" error. I then heard that this is sometimes actually a CPU issue, not a memory issue, so I installed working CPUs and the memory errors went away. My question is: how does that work?
3 REPLIES
SMR
Valued Contributor

Re: DL585 G2: Correctable Error Threshold exceeded questions

Hi,

Collect a survey report from Insight Diagnostics with the advanced view level selected (run it from SmartStart, not the online version) and look at the individual DIMMs. They should still show their respective error counts plus the message "Correctable Error Threshold exceeded". Basically, DIMMs showing green now does not mean they never had an ECC error; the survey report shows whether they have ever experienced errors.

Replace any DIMMs that have exceeded their thresholds.

We have a bunch of old 360G4p's for testing/training that have truckloads of bad DIMMs. I simply reseat the modules and they keep working, but I know the DIMMs are fubar.

Never heard of an ECC error being caused by a bad CPU.

I hope that helps!
Jake Mason
Advisor

Re: DL585 G2: Correctable Error Threshold exceeded questions

Yeah, I haven't either, which is why I was curious. I don't know if it has something to do with the AMD procs, but I've never seen this on an Intel proc, or on any other server for that matter. It has happened to me a couple of times now, and only on these servers.
SMR
Valued Contributor

Re: DL585 G2: Correctable Error Threshold exceeded questions

I would confirm that the DIMMs are the problem (and not the slots) by writing down the error counts, swapping the DIMMs around, and checking the error counts again some days later. Of course, that only works if you're lucky enough to have the error show up again soon. Other than that, I would replace the DIMMs.
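
If it helps, the swap test can be written down as a small bookkeeping script. This is only a rough sketch of the reasoning, not anything Insight Diagnostics provides: the function name `classify_swap` and the slot labels are made up for illustration, and it assumes the counts you write down are cumulative per slot, so that new errors are the difference between the two readings.

```python
def classify_swap(at_swap, later, old_slot, new_slot):
    """Interpret a DIMM swap test from hand-recorded error counts.

    at_swap:  dict of slot -> error count written down when the suspect
              DIMM was moved from old_slot to new_slot
    later:    dict of slot -> error count read some days afterwards
    Assumption: counts are cumulative per slot, so the difference is the
    number of new errors seen since the swap.
    """
    new_in_old_slot = later[old_slot] - at_swap[old_slot]
    new_in_new_slot = later[new_slot] - at_swap[new_slot]

    if new_in_new_slot > 0 and new_in_old_slot == 0:
        return "dimm"          # errors followed the module
    if new_in_old_slot > 0 and new_in_new_slot == 0:
        return "slot"          # errors stayed with the slot
    if new_in_old_slot == 0 and new_in_new_slot == 0:
        return "inconclusive"  # nothing recurred yet; wait or replace anyway
    return "both"              # new errors in both places


# Hypothetical readings: suspect DIMM moved from slot "1A" to slot "3A".
at_swap = {"1A": 12, "3A": 0}
later = {"1A": 12, "3A": 7}
print(classify_swap(at_swap, later, "1A", "3A"))
```

If nothing recurs within a reasonable time, the result stays inconclusive and replacing the DIMMs that already exceeded their threshold is still the safe call.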