BladeSystem - General

Dan González
Occasional Advisor

bl460c G6 Uncorrectable Memory Errors

On a c7000 G2 blade enclosure equipped with bl460c G6 servers, several servers have suddenly, on different days within the same week, started to log "Corrected Memory Error threshold exceeded" warnings in the IML. At some point, some of these servers start to show "Uncorrectable Memory Error" messages, with varying consequences: anything from booting normally to a warm restart while the OS is booting.

An HP technician replaced the memory modules showing warnings and errors on one server a few hours ago, and so far it seems OK, but the issue is now appearing on other servers that had been working fine until now.

Any ideas?
Cederberg
Honored Contributor

Re: bl460c G6 Uncorrectable Memory Errors

Upgrade to the latest firmware. If the errors continue, the DIMMs are faulty.
Cederberg
Honored Contributor

Re: bl460c G6 Uncorrectable Memory Errors

BIOS version 2010.01.14 (5 Feb 2010) has a fix for the problem.

"Resolved an issue which could cause correctable memory error threshold events or uncorrectable memory errors. This issue does not indicate a data issue with the DIMM and the System ROM upgrade will completely resolve this issue."
marcus1234
Honored Contributor

Re: bl460c G6 Uncorrectable Memory Errors

Update the firmware as suggested and then reseat the DIMMs.

Move the DIMMs around and note whether it is always the same 2 slots that have issues.

If so, run the offline diagnostics from the PSP software CD version 8.3, for example for 20 cycles,

and check the results.

Points appreciated.

And ensure all DIMMs are HP branded.
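
The "move the DIMMs around" check above can be tallied with a few lines of Python. This is only a minimal sketch: the observation tuples, slot labels, and serial numbers are hypothetical values typed in by hand after each test boot, not output from any HP tool, and the helper names tally and likely_culprit are made up for illustration.

```python
from collections import Counter

# Hypothetical observations noted by hand from the IML after each test boot:
# (boot number, DIMM slot label, DIMM serial number). All values are made up.
observations = [
    (1, "slot-1", "SN-001"),
    (2, "slot-1", "SN-002"),  # DIMMs were swapped between slots before this boot
    (3, "slot-1", "SN-001"),
    (3, "slot-4", "SN-003"),
]


def tally(observations):
    """Count memory-error events per slot and per DIMM serial."""
    by_slot = Counter(slot for _, slot, _ in observations)
    by_dimm = Counter(serial for _, _, serial in observations)
    return by_slot, by_dimm


def likely_culprit(observations):
    """Return ("slot", label), ("dimm", serial) or ("inconclusive", None)."""
    if not observations:
        return ("inconclusive", None)
    by_slot, by_dimm = tally(observations)
    top_slot, slot_hits = by_slot.most_common(1)[0]
    top_dimm, dimm_hits = by_dimm.most_common(1)[0]
    if slot_hits > dimm_hits:
        return ("slot", top_slot)  # errors stay with the slot, whichever DIMM is in it
    if dimm_hits > slot_hits:
        return ("dimm", top_dimm)  # errors follow one module from slot to slot
    return ("inconclusive", None)


print(likely_culprit(observations))
```

If the errors stay with the slot after the swap, the board, the CPU memory controller, or the firmware is the more likely suspect; if they follow one module, that DIMM is. A handful of boots is weak evidence either way, so treat the result as a hint, not a diagnosis.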
Dan González
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

Unfortunately the firmware update doesn't work. On the contrary, it turns out that more memory modules now show warning or error messages.

All modules are HP branded, and HP service has replaced all the failing ones on a couple of servers. Then other modules, or the same ones, fail again after some time.

The situation gets weird in that the errors depend on how the enclosure powers on. Each power-on leads to a different scenario.

So far we've opened a case with HP and they are searching for a solution.

Francesco Novelli
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

We have the same issue.
We tried applying the latest firmware update CD and then flashing the latest BIOS (2010.01.14 (A)) on two blades of the enclosure.
At first everything seemed OK, but after one day the memory errors were back.
We also have a problem where, often at power-on, ESX 3.5 hangs while initializing the scheduler.

Did you receive a solution?

I remember that before the latest BIOS update there was a note from HP saying to modify some power option to avoid the error, but they have now removed it because they say the problem is fixed...
...but it doesn't seem so...

Dan González
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

HP carried out a massive DIMM replacement. So far not all modules have been replaced, only 34 of 128 (8 modules x 16 blade servers). We are waiting for the remaining ones.
They told us that a bad batch of memory modules has been detected.

Apparently the replacement resolved the problem, but I won't be really confident about it until time proves it.
gregersenj
Honored Contributor

Re: bl460c G6 Uncorrectable Memory Errors

Do you have an entry in the IML for the memory error?

If so, did you mark it as repaired?

If you have a bad memory module, it is logged in the IML. If you replace the module and haven't marked the entry as repaired, you will get an event in the event log (Windows) stating that you have a memory failure each time you reboot the server.

BR
/jag


Dan González
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

Francesco, note that we have the same problem with ESXi: when a memory error or warning arises at server boot, ESXi hangs exactly where you say, at scheduler init.

gregersenj, the warning and error messages vary from one boot to another. OA only reports a degraded status on a blade server when a message has appeared in the IML since the last boot (this is our case, of course). Could you tell me where to mark a memory module as repaired? I really don't know where.
Raybies
New Member

Re: bl460c G6 Uncorrectable Memory Errors

Hi Dan,

Did you get any fix for this problem?
We've started to experience the same issue on our bl460c G6s.