ProLiant Servers (ML,DL,SL)

Re: ProLiant DL385 G7

 
Samuel Domb1
Occasional Contributor

ProLiant DL385 G7

Hi there,

 

I need help with the following issue:

 

The server is a ProLiant DL385 G7 running VMware ESXi 5.

The machine suddenly reset.

 

The log is attached.

 

Thanks for the help. Best regards,

 

Samuel Domb

2 REPLIES
Johan Guldmyr
Honored Contributor

Re: ProLiant DL385 G7

Hi,

Is the server back online again without any visible issues?

For those who don't want to open an .xlsx file from the Internet, the first errors in the file are these two:

473 Critical CPU 03/22/2014 23:44 03/22/2014 23:44 1 Uncorrectable Machine Check Exception (Board 0, Processor 2, APIC ID 0x00000021, Bank 0x00000004, Status 0xF2000080'00020C0F, Address 0x00000000'00000000, Misc 0x00000000'00000000)

472 Critical CPU 03/22/2014 23:44 03/22/2014 23:44 1 Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000010, Bank 0x00000004, Status 0xF2000000'00070F0F, Address 0x00000000'00000000, Misc 0x00000000'00000000)

then lots of these:

893 Critical System Error 03/22/2014 23:48 03/22/2014 23:48 1 An Unrecoverable System Error (NMI) has occurred (System error code 0x00000032, 0x2A400824)

I would:
1: upgrade the firmware if you're running an old version
2: run the diagnostics that are available on the SPP (Service Pack for ProLiant) as well

Read this advisory about uncorrectable memory parity errors: http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay/?spf_p.tpst=kbDocDisplay&spf_p.prp_kbDocDisplay=wsrp-navigationalState%3DdocId%253Demr_na-c03250482
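
If you want to read the Status values yourself, here is a rough Python sketch that decodes the top flag bits. It assumes the standard x86 machine check layout for the MCi_STATUS register (bits 63 down to 57), which as far as I know applies to the AMD Opteron CPUs in the DL385 G7 as well. The function names are my own, and the meaning of the bank number and the rest of the error code is CPU-family specific, so this does not try to interpret them.

```python
# Flag bits in the upper part of an x86 MCi_STATUS register.
# Assumption: the log prints the raw 64-bit value as HIGH'LOW in hex.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def parse_status(text):
    """Turn a value such as 0xF2000080'00020C0F into an int."""
    return int(text.replace("'", ""), 16)


def decode_status(text):
    """Return the flags that are set and the raw low 16-bit error code."""
    value = parse_status(text)
    flags = [name for bit, name in STATUS_FLAGS.items() if (value >> bit) & 1]
    return {"flags": flags, "error_code": value & 0xFFFF}


for status in ("0xF2000080'00020C0F", "0xF2000000'00070F0F"):
    decoded = decode_status(status)
    print(status, decoded["flags"], hex(decoded["error_code"]))
```

Both entries decode to VAL, OVER, UC, EN and PCC with ADDRV clear, which fits the all-zero Address field in the log. To interpret the bank and the error code you would need the vendor documentation for the exact CPU family.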
Server-Support
Super Advisor

Re: ProLiant DL385 G7

Yes, I am experiencing the same issue as @Samuel Domb1.

So what was the fix for this case? The blade server has been running for more than 2 years with no reboot at all.

Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000010, Bank 0x00000004, Status 0xF2000000'00070F0F, Address 0x00000000'00000000, Misc 0x00000000'00000000)

Uncorrectable Chipset Error (Error status 1 0x0018C154, Error status 2 0x00244000)

Uncorrectable Chipset Error (Error status 1 0x0018C160, Error status 2 0x00002040)

Uncorrectable Chipset Error (Error status 1 0x0018C16C, Error status 2 0x20000080)

Uncorrectable Chipset Error (Error status 1 0x0018C170, Error status 2 0x040406FF)

Uncorrectable Chipset Error (Error status 1 0x0018C174, Error status 2 0x00000003)

Uncorrectable Chipset Error (Error status 1 0x0018C178, Error status 2 0x9452EA00)

My BIOS is A19 dated 12/08/2012, but according to http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c03250482 the System ROM dated 12.31.2011 corrects this issue. That ROM is older than the one I am already running?
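
As a quick sanity check on the dates, here is a small Python sketch. It assumes both dates are written month first (MM/DD/YYYY and MM.DD.YYYY); the helper name is just for illustration.

```python
from datetime import datetime


def parse_rom_date(text):
    """Parse MM/DD/YYYY or MM.DD.YYYY (assumed month-first order)."""
    return datetime.strptime(text.replace(".", "/"), "%m/%d/%Y").date()


installed = parse_rom_date("12/08/2012")     # A19 ROM on my server
advisory_fix = parse_rom_date("12.31.2011")  # ROM date named in the advisory

# True means the installed ROM is newer than the one the advisory asks for.
installed_is_newer = installed > advisory_fix
print(installed_is_newer)
```

If that assumption about the date format holds, my ROM is already newer than the one the advisory names.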

 

Best regards,