ProLiant Servers (ML,DL,SL)

ProLiant ML150 G6 - Memory ECC Uncorrectable ECC

 
Benjamin Houttuin
Occasional Contributor


Gurus,

I really need your help with this...
Since yesterday my system has been rebooting, crashing, or halting. When I look at the System Event Log of the ILO100, I see the following:

Generic 12/01/2010 05:58:09 Memory ECC Uncorrectable ECC Assertion
Generic 12/01/2010 05:58:09 Memory ECC Uncorrectable ECC Assertion
Generic 12/01/2010 05:58:10 CPU1 IErr State Asserted Assertion
Generic 12/01/2010 05:58:10 NMI Alert State Asserted Assertion
Generic 12/01/2010 05:58:11 CPU1 IErr State Asserted Deassertion

And later:

Generic 12/01/2010 23:17:47 Memory ECC Uncorrectable ECC Assertion
Generic 12/01/2010 23:17:47 Memory ECC Uncorrectable ECC Assertion
Generic 12/01/2010 23:19:34 POST Error System FW Error Assertion
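
With a longer log it can help to tally the entries rather than read them by eye. Below is a minimal Python sketch, assuming every line follows the layout of the entries pasted above (type, date, time, event text, then Assertion or Deassertion). The names `SEL_LINE`, `parse_sel` and `count_assertions` are made up for illustration and are not part of any HP tool. The date is kept as a plain string because the log does not say whether it is month-first or day-first.

```python
import re
from collections import Counter

# Layout assumed from the pasted entries:
# "<type> <date> <time> <event text> <Assertion|Deassertion>"
SEL_LINE = re.compile(
    r"^(?P<kind>\S+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<event>.+?)\s+(?P<state>Assertion|Deassertion)$"
)

SAMPLE = """\
Generic 12/01/2010 05:58:09 Memory ECC Uncorrectable ECC Assertion
Generic 12/01/2010 05:58:09 Memory ECC Uncorrectable ECC Assertion
Generic 12/01/2010 05:58:10 CPU1 IErr State Asserted Assertion
Generic 12/01/2010 05:58:10 NMI Alert State Asserted Assertion
Generic 12/01/2010 05:58:11 CPU1 IErr State Asserted Deassertion
Generic 12/01/2010 23:17:47 Memory ECC Uncorrectable ECC Assertion
Generic 12/01/2010 23:17:47 Memory ECC Uncorrectable ECC Assertion
Generic 12/01/2010 23:19:34 POST Error System FW Error Assertion
"""


def parse_sel(text):
    """Return (date, time, event, state) tuples for lines that match the layout."""
    entries = []
    for line in text.splitlines():
        m = SEL_LINE.match(line.strip())
        if m:  # silently skip anything that does not look like a log entry
            entries.append((m["date"], m["time"], m["event"], m["state"]))
    return entries


def count_assertions(entries):
    """Count how often each event was asserted (deassertions are ignored)."""
    return Counter(event for _, _, event, state in entries if state == "Assertion")


if __name__ == "__main__":
    for event, n in count_assertions(parse_sel(SAMPLE)).most_common():
        print(n, event)
```

Note that the log entries themselves name no DIMM slot, so a tally like this only shows how often and when the ECC events occurred, not which module is involved.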

I guess that one of my 12 RAM modules has died, but I ran an extended memory test with the Windows memory tester and it did not report any errors.

What is going on?
Does HP have a memory check tool that can also identify which DIMM is dead, so that I do not have to test all 12 DIMMs one by one?

All advice is appreciated!

Thanks,

Benjamin
3 REPLIES
Benjamin Houttuin
Occasional Contributor

Re: ProLiant ML150 G6 - Memory ECC Uncorrectable ECC

Just a small addition: I have now booted from the HP Server Install DVD that shipped with the server and run the memory diagnostic from its menu. This HP memory diagnostic does find an ECC error but cannot identify which DIMM is affected. It only says that there is a memory error related to ECC, and that's it.
Benjamin Houttuin
Occasional Contributor

Re: ProLiant ML150 G6 - Memory ECC Uncorrectable ECC

I would like to report back that I think I solved the problem...

First of all, the bug that is mentioned here:
http://bizsupport1.austin.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01197980&lang=en&cc=us&taskId=115&prodSeriesId=254931&prodTypeId=15351
... still exists in later versions of HP Insight Diagnostics. Although this HP communication dates from 2003-10-20, it still applies. If there is an ECC error in the ILO log and you do not clear the log before you start the memory test, all DIMMs will fail the ECC check.
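
The order of operations (save the log, clear it, then start the memory test) can be sketched as below. This is only a sketch under assumptions that are not from this thread: that the management controller is reachable over IPMI-over-LAN, that the open-source `ipmitool` utility is installed, and that the password is supplied through the `IPMI_PASSWORD` environment variable (which is what `-E` reads). The host, user name, and file name are placeholders, and the log can also be cleared another way, for example from the management web page.

```python
import subprocess


def sel_commands(host, user, backup_path="sel_backup.txt"):
    """Build the ipmitool calls: save the event log first, then clear it."""
    # -I lanplus assumes the controller speaks IPMI 2.0 over LAN (an assumption).
    base = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E"]
    return [
        base + ["sel", "save", backup_path],  # keep a copy before wiping
        base + ["sel", "clear"],              # empty the System Event Log
    ]


def clear_sel(host, user, dry_run=True):
    """Print the commands (dry run) or execute them; returns the command list."""
    commands = sel_commands(host, user)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop if the backup or clear fails
    return commands


if __name__ == "__main__":
    # Placeholder address and user; nothing is executed while dry_run is True.
    clear_sel("192.0.2.10", "admin", dry_run=True)
```

Saving before clearing keeps the original ECC entries available for later reference, and the memory test is started only after both steps succeed.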

Believe me, I checked all 12 DIMMs by hand and it cost me half a day.

In the end I found that there were no errors at all. I let the memory test run in a loop for a day, then ran the Windows memory testing tool in a loop for another half day, and again no errors were found.

The only explanation I can come up with is that, because it is winter and the humidity in the area is dropping, electrostatic discharge (ESD) may have caused the issue, since ESD is closely related to humidity: http://en.wikipedia.org/wiki/Electrostatic_discharge
Benjamin Houttuin
Occasional Contributor

Re: ProLiant ML150 G6 - Memory ECC Uncorrectable ECC

See my last post.