HPE 9000 and HPE e3000 Servers

Re: HP 9000/800/rp3440: How to identify the cpu failed through Event Log Viewer messages

 
CristianB
Occasional Visitor

HP 9000/800/rp3440: How to identify the cpu failed through Event Log Viewer messages

Hello Everyone,

I'm trying to identify which CPU may have failed, based on messages like these (ERR_CPU_CHECK_SUMMARY):

#  Location|Alert| Encoded Field    |  Data Field    |   Keyword / Timestamp
-------------------------------------------------------------------------------
178   SFW  3   2  0x43800FF203E00DC0 00000000CE71225D MC_OS_TOC_CHECKSUM_ERR
                                                      02 Aug 2018 09:28:40
177   SFW  2   2  0x43800FF202E00DA0 00000000CE71225D MC_OS_TOC_CHECKSUM_ERR
                                                      02 Aug 2018 09:28:40
176   SFW  0  *3  0x60800FF500E00D80 0000000000000000 BOOT_NO_CONS_FOUND
                                                      02 Aug 2018 09:24:00
175   SFW  0  *3  0x7680101200E00D60 FFFFFFFFFFFFFFFF BOOT_NO_GO_SS_CONS
                                                      02 Aug 2018 09:23:59
174   SFW  3   2  0x57800F7303E00D40 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      02 Aug 2018 09:21:28
173   SFW  1   2  0x57800F7301E00D20 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      02 Aug 2018 09:21:28
172   SFW  2   2  0x57800F7302E00D00 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      02 Aug 2018 09:21:28
171   OS   0  *3  0x76800C6800E00CE0 00000000000005E9 PAT_DATA_FIELD_WARNING
                                                      02 Aug 2018 09:21:27
170   OS   0  *3  0x78800C6200E00CC0 A0E008C01100B000 PAT_ENCODED_FIELD_WARNING
                                                      02 Aug 2018 09:21:27

It seems this kind of message has been seen several times in the past on this server as well, each time before a reboot, as follows:

115   SFW  0  *3  0x7680101200E008D0 FFFFFFFFFFFFFFFF BOOT_NO_GO_SS_CONS
                                                      28 Jun 2018 14:29:21
114   BMC      2  0x205B34F05A0208C0 FFFF027000120300 Type-02 127002 1208322
                                                      28 Jun 2018 14:27:38
113   SFW  0  *3  0x60800FF500E008A0 0000000000000000 BOOT_NO_CONS_FOUND
                                                      28 Jun 2018 14:22:54
112   SFW  0  *3  0x7680101200E00880 FFFFFFFFFFFFFFFF BOOT_NO_GO_SS_CONS
                                                      28 Jun 2018 14:22:53
111   SFW  3   2  0x57800F7303E00860 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      28 Jun 2018 14:20:21
110   SFW  2   2  0x57800F7302E00840 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      28 Jun 2018 14:20:21
109   SFW  1   2  0x57800F7301E00820 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      28 Jun 2018 14:20:21
108   OS   0  *3  0x76800C6800E00800 00000000000005E9 PAT_DATA_FIELD_WARNING
                                                      28 Jun 2018 14:20:21
107   OS   0  *3  0x78800C6200E007E0 A0E008C01100B000 PAT_ENCODED_FIELD_WARNING
                                                      28 Jun 2018 14:20:21
53    SFW  1   2  0x57800F7301E003A0 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      08 May 2018 15:08:30
54    SFW  2   2  0x57800F7302E003C0 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      08 May 2018 15:08:30
55    SFW  3   2  0x57800F7303E003E0 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      08 May 2018 15:08:30

So we suspect that a CPU may have failed, but how can I identify which CPU to replace?
This is the current "ioscan -fnC processor" command output:

ononmos7,sys,root # ioscan -fnC processor
Class       I  H/W Path  Driver    S/W State H/W Type  Description
===================================================================
processor   0  128       processor CLAIMED   PROCESSOR Processor
processor   1  129       processor CLAIMED   PROCESSOR Processor
processor   2  152       processor CLAIMED   PROCESSOR Processor
processor   3  153       processor CLAIMED   PROCESSOR Processor
ononmos7,sys,root #



Thanks in advance!
Regards,


Cristian

4 REPLIES
Robert_Jewell
Honored Contributor

Re: HP 9000/800/rp3440: How to identify the cpu failed through Event Log Viewer messages

From these messages:

176   SFW  0  *3  0x60800FF500E00D80 0000000000000000 BOOT_NO_CONS_FOUND
                                                      02 Aug 2018 09:24:00
175   SFW  0  *3  0x7680101200E00D60 FFFFFFFFFFFFFFFF BOOT_NO_GO_SS_CONS
                                                      02 Aug 2018 09:23:59

 

...it appears your console path is not set up or something is corrupt in the NVRAM.

 

From the BCH Main Menu, run the PA command and look at the Console Path. It should be something like 0/0/1/0.0. If it is not, try setting it with the same PA command. Better yet, try restoring the NVRAM defaults with the DEFAULT command found in the CONFIGURATION menu.

Clear the logs and then reboot to try again to see if the errors continue.

 

-Bob

CristianB
Occasional Visitor

Re: HP 9000/800/rp3440: How to identify the cpu failed through Event Log Viewer messages

Hi Bob,

First of all, thank you very much for your reply. I'll review all the advice in your previous post about the "BOOT_NO_CONS_FOUND" message, but I would also like to understand the "ERR_CPU_CHECK_SUMMARY" message. Specifically:

-- Should this kind of message (ERR_CPU_CHECK_SUMMARY) be treated as a sign of a failed CPU? Should I replace a CPU if I see this message several times?

-- How can I identify the exact CPU that this kind of message refers to?

Thanks in advance!
Regards.

Cristian

Robert_Jewell
Honored Contributor
Solution

Re: HP 9000/800/rp3440: How to identify the cpu failed through Event Log Viewer messages

No, that event code alone does not specifically indicate a CPU issue, or anything else really.  The check summary event means that the system handled a machine check.  You get the CPU Check Summary multiple times because each CPU reports its status during the processing of a machine check.

A machine check is basically a process where the system stops all processing to handle an error event that it cannot correct.  In your case, that event is occurring during system POST and looks to be due to the console issue that I mentioned.  Essentially, the system cannot determine a console path, so it is unable to continue operation and hence the machine check routine is called.
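
To see the "each CPU reports its status" pattern in the log itself, the entries can be grouped by timestamp. The following is only a minimal sketch in Python, not an HP tool. The sample text is copied from entries 171-174 in this thread. The names `parse_events` and `check_summaries_by_time` are made up for illustration, and reading the number after SFW as the reporting CPU is an assumption rather than something the log viewer states.

```python
import re
from collections import defaultdict

# Sample copied from the event log in this thread (entries 171-174).
LOG = """\
174   SFW  3   2  0x57800F7303E00D40 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      02 Aug 2018 09:21:28
173   SFW  1   2  0x57800F7301E00D20 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      02 Aug 2018 09:21:28
172   SFW  2   2  0x57800F7302E00D00 2000000000000000 ERR_CPU_CHECK_SUMMARY
                                                      02 Aug 2018 09:21:28
171   OS   0  *3  0x76800C6800E00CE0 00000000000005E9 PAT_DATA_FIELD_WARNING
                                                      02 Aug 2018 09:21:27
"""

# Entry line: log number, location, optional source number, alert level
# (optionally prefixed with '*'), encoded field, data field, keyword.
# ASSUMPTION: the source number after the location is the reporting CPU.
ENTRY = re.compile(
    r"^(?P<num>\d+)\s+(?P<loc>[A-Z]+)\s+(?:(?P<src>\d+)\s+)?\*?(?P<alert>\d+)\s+"
    r"(?P<encoded>0x[0-9A-F]{16})\s+(?P<data>[0-9A-F]{16})\s+(?P<keyword>.+)$"
)


def parse_events(text):
    """Pair each entry line with the timestamp line that follows it."""
    events = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        match = ENTRY.match(stripped)
        if match:
            events.append({**match.groupdict(), "timestamp": None})
        elif events and events[-1]["timestamp"] is None:
            # The line after an entry carries that entry's timestamp.
            events[-1]["timestamp"] = stripped
    return events


def check_summaries_by_time(events):
    """Map timestamp -> sorted source numbers that logged a check summary."""
    bursts = defaultdict(list)
    for ev in events:
        if ev["keyword"] == "ERR_CPU_CHECK_SUMMARY" and ev["src"] is not None:
            bursts[ev["timestamp"]].append(int(ev["src"]))
    return {ts: sorted(srcs) for ts, srcs in bursts.items()}


if __name__ == "__main__":
    for ts, srcs in check_summaries_by_time(parse_events(LOG)).items():
        print(ts, "-> sources", srcs)
```

For the sample above, all three check summary entries fall in the same second (02 Aug 2018 09:21:28) and come from sources 1, 2 and 3. That fits the description of one machine check being reported by several CPUs, rather than one particular CPU failing repeatedly.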

 

-Bob

 

CristianB
Occasional Visitor

Re: HP 9000/800/rp3440: How to identify the cpu failed through Event Log Viewer messages

Hi Bob,

Thank you so much for your explanations.
Regards,


Cristian