bl460c G6 Uncorrectable Memory Errors

 
Dan González
Occasional Advisor

On a c7000 G2 blade enclosure equipped with bl460c G6 servers, some servers have suddenly, on different days within one week, started to log "Corrected Memory Error threshold exceeded" warnings in the IML. At some point, some of these servers start to show "Uncorrectable Memory Error" messages, with varying consequences: anything from booting normally to a warm restart while the OS is booting.

An HP technician replaced the memory modules flagged with warnings and errors on one server a few hours ago, and so far it seems OK, but the issue is now appearing on other servers that had been working fine until now.

Any ideas?
16 REPLIES
Cederberg
Honored Contributor

Re: bl460c G6 Uncorrectable Memory Errors

Upgrade to the latest firmware. If the errors continue, it's the DIMMs that are faulty.
Cederberg
Honored Contributor

Re: bl460c G6 Uncorrectable Memory Errors

BIOS version 2010.01.14 (5 Feb 2010) has a fix for the problem.

"Resolved an issue which could cause correctable memory error threshold events or uncorrectable memory errors. This issue does not indicate a data issue with the DIMM and the System ROM upgrade will completely resolve this issue."
marcus1234
Honored Contributor

Re: bl460c G6 Uncorrectable Memory Errors

Update the firmware as suggested and then reseat the DIMMs.

Move the DIMMs around and note whether it is always the same 2 slots that have issues.

If so, run the offline diagnostics from the PSP software CD version 8.3, for example for 20 cycles,

and check the results.

Points appreciated.

And ensure all DIMMs are HP branded.
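
The "move the DIMMs around and see whether the same slots keep failing" test can be tracked with a small tally. This is only a rough sketch: the slot labels, module IDs and the `likely_culprit` helper are made up for illustration, and the observations would have to be typed in by hand from the IML after each swap.

```python
from collections import Counter

def tally_errors(observations):
    """Count logged memory errors per slot and per module.

    observations: list of (slot, module_id) pairs, one per logged error.
    Both labels are hypothetical; use whatever identifies your slots/DIMMs.
    """
    by_slot = Counter(slot for slot, _ in observations)
    by_module = Counter(module for _, module in observations)
    return by_slot, by_module

def likely_culprit(observations):
    """Say whether errors follow the module or stay with the slot."""
    observations = list(observations)
    if not observations:
        return "no errors"
    by_slot, by_module = tally_errors(observations)
    top_slot = by_slot.most_common(1)[0][1]
    top_module = by_module.most_common(1)[0][1]
    if top_module > top_slot:
        return "module"   # errors moved with the DIMM
    if top_slot > top_module:
        return "slot"     # errors stayed in the same slot
    return "inconclusive"

# Hypothetical record: the same DIMM fails in three different slots.
errors = [("2A", "dimm-01"), ("4B", "dimm-01"),
          ("6C", "dimm-01"), ("2A", "dimm-07")]
print(likely_culprit(errors))
```

If the errors follow the module, the DIMM is the suspect; if they stay with the slot, look at the board or the slot itself. With only a handful of observations the result is a hint, not a diagnosis.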
Dan González
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

Unfortunately, the firmware update didn't work. On the contrary, it turns out that even more memory modules now show warning or error messages.

All modules are HP branded, and HP service has replaced all the failing ones on a couple of servers. Then other modules, or the same ones, fail again after some time.

The situation gets weird in that the errors depend on how the enclosure powers on. Each power-on leads to a different scenario.

So far we've opened a case with HP and they are searching for a solution.

Francesco Novelli
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

We have the same issue.
We tried applying the latest firmware update CD and then flashing the latest BIOS (2010.01.14 (A)) on two blades in the enclosure.
At first everything seemed OK, but after one day the memory errors were back.
We also have a problem where, often at power-on, ESX 3.5 hangs while initializing the scheduler.

Did you receive a solution?

I remember that before the latest BIOS update there was a note from HP saying to modify some power option to avoid the error, but they have now removed it because they are still saying the problem is fixed...
...but it doesn't seem so...

Dan González
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

HP carried out a massive DIMM replacement. So far not all modules have been replaced, only 34 of 128 (8 modules x 16 blade servers). We are waiting for the remaining ones.
They told us that a bad batch of memory modules has been detected.

Apparently the replacement resolved the problem, but I won't be really confident about it until time proves it.
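
As a quick check on the figures above (8 modules x 16 blades, 34 replaced so far), here is the arithmetic spelled out. The variable names are just illustrative.

```python
modules_per_blade = 8
blades = 16
replaced = 34

total = modules_per_blade * blades             # modules in the enclosure
remaining = total - replaced                   # modules still to be swapped
percent_replaced = round(100 * replaced / total, 1)

print(total, remaining, percent_replaced)
```

So roughly a quarter of the 128 modules had been swapped at this point, with 94 still pending.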
gregersenj
Honored Contributor

Re: bl460c G6 Uncorrectable Memory Errors

Do you have an entry in the IML for the memory error?

If so, did you mark it as repaired?

If you have a bad memory module, it is logged in the IML. If you replace the module and haven't marked the entry as repaired, you will get an event in the Windows event log stating that you have a memory failure each time you reboot the server.

BR
/jag
Dan González
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

Francesco, note that we also have the same problem with ESXi: when a memory error or warning arises at server boot, ESXi hangs exactly where you say, at scheduler init.

gregersenj, the warning and error messages vary from one boot to another. OA only reports a degraded status on a blade server when a message has appeared in the IML since the last boot (which is our case, of course). Could you tell me where to mark a memory module as repaired? I really don't know where.
Raybies
Occasional Visitor

Re: bl460c G6 Uncorrectable Memory Errors

Hi Dan,

Did you get any fix for this problem?
We've started to experience the same issue on our bl460c G6s.
Dan González
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

The solution given was to replace all memory modules on the sixteen blade servers in the enclosure, in combination with BIOS, iLO, OA and VC firmware upgrades.

Since then, no problems have been detected.

I think that, in the end, this was a problem with certain memory module batches, recognized by HP at least internally.
Blade user
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

Traced the problem on our BL460c G6s to a faulty batch of RAM as well. Ended up replacing 7 known faulty pieces.
Francesco Novelli
Occasional Advisor

Re: bl460c G6 Uncorrectable Memory Errors

Sorry for the long delay, but due to some contractual delays the HP help desk only got involved in this problem 3 weeks ago.
The HP technician asked me to update all the firmware on a blade, then launch the SmartStart diagnostics and export the survey.
On the first blade checked there were errors on two DIMMs; they sent new memory, and after that the server seems OK.
I tried another blade with the same sequence: two bad DIMMs, but when I exited the diagnostics and tried to boot ESX, the server hung at the scheduler step and I had to switch it off manually.
At the next boot the system health LED was blinking again, and iLO showed another DIMM as faulty.
I reported this to the technician and he sent me 3 DIMMs to swap in.
I made the change, and after that the server hung while booting SmartStart.
Again the system health LED was blinking, and iLO showed one of the new DIMMs as faulty, plus another one never reported as faulty before.
So I'm very doubtful about this solution; tomorrow a technician from HP should come here with 2 new DIMMs.
...I have 24 blades, I cannot imagine an end to this trouble :-I

Regards,
Francesco
DerekS_1
Frequent Advisor

Re: bl460c G6 Uncorrectable Memory Errors

Did you install the very latest BIOS, released in the last week? It has a LONG write-up with a laundry list of DIMM problems it fixes.
Richard Brodie_1
Honored Contributor

Re: bl460c G6 Uncorrectable Memory Errors

Have you got a reference, Derek? I've not found anything relevant.
Ted Steenvoorden
Occasional Visitor

Re: bl460c G6 Uncorrectable Memory Errors

I am also experiencing problems with BL460c G6 blades and 2 GB Hynix DIMMs. These DIMMs are HP branded and also carry a label reading Hynix (Korea 07) 2Gb 2Rx8 PC3 - 10600E.

The BL460c G6 blades run fine, but when I power off a blade, remove it from the C7000 enclosure and reinsert it into a C7000 enclosure, the blade won't boot properly. The iLO screen stays black and the C7000 active cooling fan keeps running at full speed. After a while the IML reports uncorrected memory errors on DIMMs.

Replacing the DIMMs corrects the problem; however, this has already happened to me with 5 blades. We are currently reseating blades across multiple C7000 enclosures, but I have put this project on hold until this matter is clear. All DIMMs with errors are of type Hynix (Korea 07) 2Gb 2Rx8 PC3 - 10600E.

Can you check whether you are having trouble with the same type of DIMMs?
James D. Young
Frequent Advisor

Re: bl460c G6 Uncorrectable Memory Errors

I have been having the same issue, but with BL490c G6 blades with HYNX memory installed. HP will have 288 modules to me tomorrow to replace all memory in all 16 of my ESX servers. I was told by an HP tech many months ago that HP knows it is related to these modules. They did not question me when I asked for all my servers' memory to be replaced with non-HYNX memory.

I hope this will solve a problem that has already been going on for most of this year. We have swapped out many modules this year for the same correctable or uncorrectable errors.