ProLiant Servers (ML,DL,SL)

DL380:DIMM's failure and server rebooting

 
Boris Demin
Advisor

Hello,
I'm facing a problem that seems to be related to memory module failure on an HP ProLiant DL380 G3 server. The problem causes the server to reboot, usually in the evening. I tested the server with the SmartStart CD and found that it can't pass the total memory test. There are 2 tests, the Noise test and the Cache test, and the status for both is Failed.

This is a quotation from the DiagTicket I got after testing:
"This test failed as a result of the ECC (error correcting code) reporting an error while the test was operating. While not a
problem with the test itself, this indicates that there was an ECC error incident while the test was running. Check the IML
for any ECC Threshold Passed events. The appropriate DIMM will be noted in the error message itself."

The test results point to correctable errors in the memory modules. But I replaced the old DIMMs with new ones and nothing changed. I did this twice, with 2 different pairs of new DIMMs, and the result was the same. Finally I replaced the server's system board with a new one and ran the test from the SmartStart CD again. This time the result was quite different: the server passed the Noise test and the Cache test perfectly. But after a few hours the server rebooted again on its own. I ran the SmartStart test again, and it again failed the Noise and Cache memory tests. The situation has been the same ever since. It seems to be a hardware problem because it doesn't depend on the software installed.

What could be the root of the problem? (I have 2 DL380 servers with the same problem.)
9 REPLIES
Sunil Jerath
Honored Contributor

Re: DL380:DIMM's failure and server rebooting

Hello Boris,
When did you last update the system BIOS and the support pack? If you haven't done so recently, please apply the updates and see whether that corrects the behavior. Here is the link in case you need it:

http://h18007.www1.hp.com/support/files/server/us/locate/20_4706.html

Regards,
Boris Demin
Advisor

Re: DL380:DIMM's failure and server rebooting

Hello,
I have reflashed the BIOS (P29 of 10/31/2003) and installed the ProLiant Support Pack for Windows 2000 (ver. 7.00 of 17 Dec. 03). Unfortunately I haven't noticed any positive change in the behaviour of my server. I ran the test program from the SmartStart CD and the server still fails all of the available memory tests. Is there any other way to fix the memory problem?
Sunil Jerath
Honored Contributor

Re: DL380:DIMM's failure and server rebooting

Hello Boris,
Which version of SmartStart are you using to run the diagnostics?

Regards,
Boris Demin
Advisor

Re: DL380:DIMM's failure and server rebooting

Hello,
I have the HP SmartStart CD, release 6.10.

Regards,
Sunil Jerath
Honored Contributor

Re: DL380:DIMM's failure and server rebooting

Hello Boris,
Please use the following link to run the diags and let me know the results:

http://h18007.www1.hp.com/support/files/server/us/download/14413.html

Regards,
Brian_Murdoch
Honored Contributor

Re: DL380:DIMM's failure and server rebooting

Hi Boris,

You must clear any existing memory errors from the IML BEFORE running tests using the online or offline diagnostics. It appears that you do have genuine memory errors; however, if a memory error is logged in the IML and it is not cleared, the diagnostics will fail when you run them. This issue is being investigated in SmartStart 6.30 and 6.40. You had no errors after swapping the system board because the IML information is held on the system board. The new board would not have had any ECC errors reported in the IML before you ran the diagnostics, hence it passed. Here is a section of the customer advisory on the ECC diagnostic errors.

SCOPE
Any ProLiant server running memory tests using HP Insight Diagnostics version 3.00 or version 4.00 (on SmartStart 6.30 or 6.40).
RESOLUTION
Clear any ECC events out of the IML before executing any memory tests.

IMPORTANT: ECC events are not monitored for any devices that have already encountered an ECC for each boot process (i.e., if a memory DIMM encounters an ECC event during boot and the user clears this ECC event, then Insight Diagnostics memory tests for this DIMM will not detect any more ECC events).
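
As a rough illustration of the behaviour the advisory describes, here is a toy Python model. It is not HP code: the `SystemBoard` class, the `memory_test_passes` function, and the log entry text are all made up for illustration. The only rule it encodes is the one stated above, namely that an uncleared ECC entry in the IML makes the memory test report a failure, and that the IML travels with the system board.

```python
from dataclasses import dataclass, field


@dataclass
class SystemBoard:
    """Hypothetical stand-in for a system board that holds its own IML."""
    iml: list = field(default_factory=list)

    def log_ecc_event(self, dimm: str) -> None:
        # Illustrative entry text, not the real IML wording.
        self.iml.append(f"ECC event: {dimm}")

    def clear_iml(self) -> None:
        self.iml.clear()


def memory_test_passes(board: SystemBoard) -> bool:
    # Assumption taken from the advisory: any ECC entry still present
    # in the IML causes the memory test to report Failed.
    return not any(entry.startswith("ECC event") for entry in board.iml)


# Old board: an ECC event was logged at some point and never cleared.
old_board = SystemBoard()
old_board.log_ecc_event("DIMM 1")
old_board_result = memory_test_passes(old_board)

# Replacement board: its IML starts out empty, so the first test passes.
new_board = SystemBoard()
new_board_first_result = memory_test_passes(new_board)

# A later ECC event lands in the new board's IML and the test fails again.
new_board.log_ecc_event("DIMM 1")
new_board_second_result = memory_test_passes(new_board)

# Clearing the IML removes the stale entry before the next test run.
new_board.clear_iml()
after_clear_result = memory_test_passes(new_board)
```

In this sketch the test fails on the old board, passes once on the new board, fails after the next logged event, and passes again once the log is cleared. Clearing the log only removes the stale entry; a DIMM that is genuinely failing would be expected to log a fresh ECC event.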

Do you have a record of these entries in the IML, rather than the diagnostic test results?

Please copy the entries from the IML if possible. Can you also physically check the DIMMs and report the assembly numbers from their sticky labels, just in case they are the wrong type for the G3?

Since the system will still be under warranty, it may be a good idea to have HP check the DIMMs for you if you would prefer to have the part numbers validated.

I hope this helps,

Brian
Boris Demin
Advisor

Re: DL380:DIMM's failure and server rebooting

Hello,
I tried to run the checkup program you recommended, but I had some unexpected problems running it. I chose the Computer Checkup (TEST) option and on the next screen I got the following message: "Warning: This computer is not supported by this version of the program."
I have one more question. How can I clear the Integrated Management Log (IML)?

Regards,
Brian_Murdoch
Honored Contributor

Re: DL380:DIMM's failure and server rebooting

To clear the IML from Windows:

Start - Programs - Compaq System Tools - Compaq Integrated Management Log Viewer
To clear the entire log, choose LOG - Clear all Entries.

Regards,

Brian
Boris Demin
Advisor

Re: DL380:DIMM's failure and server rebooting

Hi,
Thank you, Brian. Your advice seems to be what I needed to settle my memory problem. The error message (after the SmartStart test) has disappeared. The server may still need some testing to see whether the error recurs, but it is already clear that, had I cleared the IML earlier, I wouldn't have had all this trouble with the same error recurring. Thank you once again.

Regards,
Boris