ProLiant Servers (ML,DL,SL)

How to change correctable memory error threshold?

I'm not clear on what conditions trigger the little orange `bad memory' light and the entry in the management log saying that RAM has accumulated too many correctable errors. (This is on a DL3809 G3.)

I'd like to inspect these conditions and change them. It's felt that our systems are reporting bad RAM prematurely, and we would like to raise the threshold at which they decide that RAM is bad.

Does anybody know how to do this?

I've inspected my system with SNMP, and found these variables:

CPQHLTH-MIB::cpqHeCorrMemLogStatus.0 = INTEGER: notSupported(2)
CPQHLTH-MIB::cpqHeCorrMemLogCondition.0 = INTEGER: other(1)
CPQHLTH-MIB::cpqHeCorrMemErrorCntThresh.0 = INTEGER: 5

but that doesn't help me much, since cpqHeCorrMemErrorCntThresh is not writeable. Also, I don't know what it really means: 5 errors in how long? Since the system was booted?
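The "5 errors in how long?" question matters because the same threshold of 5 behaves very differently depending on the counting rule. Below is a minimal sketch of two hypothetical interpretations: a cumulative count since boot, and a count inside a sliding time window. Neither is HP's documented algorithm. The one-day window, and the assumption that the trip point is "reaches 5" rather than "exceeds 5", are illustrative guesses.

```python
from collections import deque

# Assumption: the value reported by cpqHeCorrMemErrorCntThresh.0 above.
THRESHOLD = 5
DAY = 24 * 3600  # hypothetical window length, in seconds


def trips_since_boot(error_times, threshold=THRESHOLD):
    """Interpretation A: every correctable error since boot counts."""
    return len(error_times) >= threshold


def trips_in_window(error_times, window=DAY, threshold=THRESHOLD):
    """Interpretation B: only errors inside a sliding time window count."""
    recent = deque()
    for t in sorted(error_times):
        recent.append(t)
        # Discard errors that fall outside the window ending at time t.
        while recent[0] <= t - window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


# Five errors spaced 30 hours apart (timestamps in seconds since boot).
spread_out = [h * 3600 for h in (0, 30, 60, 90, 120)]
print(trips_since_boot(spread_out))  # True
print(trips_in_window(spread_out))   # False
```

Under the cumulative rule, a DIMM that throws one error a day eventually gets flagged on any machine with long enough uptime. Under the windowed rule, only bursts get flagged.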

Re: How to change correctable memory error threshold?

Err, that should have read DL380 G3.
Oleg Koroz
Honored Contributor

Re: How to change correctable memory error threshold?

I don't think you can change the threshold. It's fixed in hardware, controlled by the chipset (a small logic chip that is part of the memory module assembly).
The CPQHLTH system management driver reports the correctable memory errors. Possible fixes are to update the System ROM, the System Management Driver and the IM agents, and to verify the memory; replacement may turn out to be the right resolution. On the DL380 G3 each DIMM slot has an LED beside it: see which one lights amber when the errors exceed the threshold, or check the IML, which should report the slot number. Swap modules around within a single bank, or swap pairs between banks, and replace the module if the problem follows it.
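The swap test described above reduces to a small decision rule. Here is a hypothetical sketch of it. The slot numbers and module labels are whatever you record yourself (or read from the IML); they are not an HP format. The "stayed with the slot" branch is simply the converse of "replace if the problem follows the module".

```python
def diagnose_swap(before, after):
    """Interpret a DIMM swap test.

    `before` and `after` are (slot, module) pairs naming where the
    correctable errors were reported before and after the modules were
    moved; `after` is None if no errors have been seen since the swap.
    Slot numbers and module labels are hypothetical, user-recorded values.
    """
    if after is None:
        return "no recurrence yet: keep monitoring"
    before_slot, before_module = before
    after_slot, after_module = after
    if after_module == before_module and after_slot != before_slot:
        return "error followed the module: replace the DIMM"
    if after_slot == before_slot and after_module != before_module:
        return "error stayed with the slot: suspect the slot or system board"
    return "inconclusive: repeat the swap"
```

For example, `diagnose_swap((1, "A"), (3, "A"))` means module A threw errors in slot 1 and, after being moved, in slot 3, so the rule points at the DIMM.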

Re: How to change correctable memory error threshold?

You say that the problem can be fixed by updating:

The system ROM
The system management driver
The IM agents

Does that mean that any of these components, if out of date, can cause spurious reports of bad RAM? Can you provide a citation for this?
Oleg Koroz
Honored Contributor

Re: How to change correctable memory error threshold?

David Claypool
Honored Contributor

Re: How to change correctable memory error threshold?

Thanks, but those advisories don't seem to apply to my case.

The first doesn't apply because in all cases we are looking at the IML. That is the only data source that we use to report bad memory.

The second doesn't apply because we're using DL380s, not DL580s.
JohnWRuffo
Honored Contributor

Re: How to change correctable memory error threshold?

Matthew:

You are not alone in this error state. We have seen this one too, and I wondered whether the threshold was accurate...

Thankfully, HP does not blink at replacing the DIMM(s), and thus far the error has not returned after the fix.
To me, that says HP knows either that some DIMMs may not be up to spec or that the error state itself is a known issue, and we "may" see a BIOS update. I do not think the G3 will be updated, but they may send out an erratum on it.
Enjoy!
David Claypool
Honored Contributor

Re: How to change correctable memory error threshold?

So you have to manually check each machine to read the IML?

Re: How to change correctable memory error threshold?

I didn't say `manually'. :-)
David Claypool
Honored Contributor

Re: How to change correctable memory error threshold?

Okay, since you're making me draw it out of you, how do you do it?

Re: How to change correctable memory error threshold?

Sorry, I did not mean to be obscure: I have a program that checks the log on all my machines.
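A checker of this kind can be sketched by polling each server for the same CPQHLTH objects quoted in the first post. This is a minimal illustration, not the poster's actual program, and it rests on several assumptions:

* the net-snmp `snmpget` tool is installed;
* the CPQHLTH-MIB file is available so the object names resolve;
* the read community is `public`;
* the condition object uses the usual Compaq labels `degraded` and `failed`.

On machines like the one in the first post, where `cpqHeCorrMemLogStatus` is `notSupported`, SNMP tells you nothing useful, and the IML still has to be read some other way.

```python
import re
import subprocess

# Object names as they appear in the SNMP output quoted in the first post.
OBJECTS = [
    "CPQHLTH-MIB::cpqHeCorrMemLogStatus.0",
    "CPQHLTH-MIB::cpqHeCorrMemLogCondition.0",
    "CPQHLTH-MIB::cpqHeCorrMemErrorCntThresh.0",
]

# Matches e.g. "CPQHLTH-MIB::cpqHeCorrMemLogStatus.0 = INTEGER: notSupported(2)"
# or "CPQHLTH-MIB::cpqHeCorrMemErrorCntThresh.0 = INTEGER: 5".
LINE_RE = re.compile(r"^\S+::(\w+)\.0 = INTEGER: (\w+)(?:\(\d+\))?$")


def parse_snmp_output(text):
    """Map each object name to its enum label or integer value."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, value = match.groups()
            values[name] = int(value) if value.isdigit() else value
    return values


def query_host(host, community="public"):
    """Fetch the objects from one host with net-snmp's snmpget (assumed installed)."""
    cmd = ["snmpget", "-v", "2c", "-c", community, "-m", "+CPQHLTH-MIB", host, *OBJECTS]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return {}
    return parse_snmp_output(result.stdout)


def classify(values):
    """Reduce one host's values to a short verdict."""
    if not values:
        return "unreachable"
    if values.get("cpqHeCorrMemLogStatus") == "notSupported":
        return "not-supported"  # agent reports the log as unsupported; read the IML instead
    if values.get("cpqHeCorrMemLogCondition") in ("degraded", "failed"):
        return "attention"
    return "ok"


def check_fleet(hosts):
    """Return {host: verdict} for every host in the list."""
    return {host: classify(query_host(host)) for host in hosts}
```

Calling `check_fleet(["dl380-01", "dl380-02"])` (hypothetical host names) returns one verdict per host. Because the parsing is kept separate from the SNMP call, the same `parse_snmp_output` can be pointed at saved `snmpget` output.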

Re: How to change correctable memory error threshold?

John,

We've never had trouble getting bad RAM replaced either. I think it's admirable that HP trusts its hardware to perform the diagnostic, rather than making the customer take additional steps before they will replace the DIMM.

The trouble is that with 600 machines, the human time cost of doing these replacements is nontrivial. If we can raise the replacement threshold without much loss of reliability (measured by, say, the number of unexplained reboots or crashes per machine per unit time), we'd like to do that.
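One way to size the replacement side of this trade-off before changing anything is to replay the correctable-error counts already in the IML against candidate thresholds. The sketch below assumes you have extracted a per-DIMM error count from each machine's log. The input format and the sample numbers are hypothetical, not measured data. It only counts how many replacements each threshold would have triggered; the reliability side still needs the reboot and crash history.

```python
def replacements_by_threshold(error_counts, thresholds):
    """For each candidate threshold, count the DIMMs whose errors reach it.

    error_counts maps a (host, slot) pair to the number of correctable
    errors logged for that DIMM over the period being studied.
    """
    return {
        t: sum(1 for count in error_counts.values() if count >= t)
        for t in thresholds
    }


# Hypothetical per-DIMM counts; substitute the ones pulled from your logs.
counts = {
    ("host001", 1): 5,
    ("host001", 2): 0,
    ("host002", 3): 12,
    ("host003", 1): 7,
    ("host004", 4): 40,
}
print(replacements_by_threshold(counts, [5, 10, 20]))  # {5: 4, 10: 2, 20: 1}
```

Cross-referencing the DIMMs that drop out at a higher threshold against those machines' unexplained reboots would then show whether the saved replacements cost any reliability.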