HPE 9000 and HPE e3000 Servers

8420 Cpu Problem

 
Paul Coffey_1
Advisor

8420 Cpu Problem

I'm receiving the following errors. I believe the problem is the CPU in slot 1 of cell board 0, but I'm not 100% certain. Can anyone verify this for me?

Thanks

Notification Time: Fri Nov 6 14:49:58 2009

cuxddb01 sent Event Monitor notification information:

/system/events/ipmi_fpl/ipmi_fpl is >= 1.
Its current value is SERIOUS(4).



Event data from monitor:

Event Time..........: Fri Nov 6 14:49:58 2009
Severity............: SERIOUS
Monitor.............: fpl_em
Event #.............: 708
System..............: cuxddb01

Summary:
Power fault on cell board


Description of Error:

The local Power Monitor is reporting a fault with the named Cell Power Board.
The data field of this event can be decoded as follows where a bit set in any
of the status fields indicates a fault:
data byte 0: (power_good << 1) | (power_fault)
data byte 1: Cell Power Board converters status
    bits[0:2] - memory power bricks 0-2 status
    bits[3:5] - power i/f bus bricks 0-2 status
    bits[6:7] - JAB core power bricks 0-1 status
data byte 2: Cell converters status
    bit 0 - clock power status
    bit 1 - cache power status
    bit 2 - link power status
    bit 3 - CC core power status
    bit 4 - FSB power status
    bit 5 - 48-V status
data byte 3: CPU module converters status
    bit 0 - CPU module 0 core power status
    bit 2 - CPU module 1 core power status
    bit 4 - CPU module 2 core power status
    bit 6 - CPU module 3 core power status
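As a rough reading aid for the bit layout above, here is a small Python sketch that turns the four status bytes into fault names. It assumes bit 0 is the least significant bit of each byte and that the bytes arrive in order, byte 0 first. The function name and the sample bytes are hypothetical: the event in this thread carries a timestamp in its data field ("Data field type: Timestamp"), not these status bytes.

```python
from typing import Dict, List

# Bit names transcribed from the event #708 description.
# Assumption: bit 0 is the least significant bit of each byte.
BYTE1_BITS: Dict[int, str] = {
    0: "memory power brick 0", 1: "memory power brick 1", 2: "memory power brick 2",
    3: "power i/f bus brick 0", 4: "power i/f bus brick 1", 5: "power i/f bus brick 2",
    6: "JAB core power brick 0", 7: "JAB core power brick 1",
}
BYTE2_BITS: Dict[int, str] = {
    0: "clock power", 1: "cache power", 2: "link power",
    3: "CC core power", 4: "FSB power", 5: "48-V",
}
BYTE3_BITS: Dict[int, str] = {
    0: "CPU module 0 core power", 2: "CPU module 1 core power",
    4: "CPU module 2 core power", 6: "CPU module 3 core power",
}


def decode_cell_power_status(data: bytes) -> List[str]:
    """Hypothetical helper: list byte 0's flags, then every status bit that is set."""
    if len(data) < 4:
        raise ValueError("expected at least 4 data bytes")
    # Byte 0 is (power_good << 1) | power_fault.
    findings = [f"power_good={(data[0] >> 1) & 1} power_fault={data[0] & 1}"]
    # Bytes 1-3: a set bit marks a fault in the named converter.
    for value, names in ((data[1], BYTE1_BITS), (data[2], BYTE2_BITS), (data[3], BYTE3_BITS)):
        for bit, name in names.items():
            if value & (1 << bit):
                findings.append(f"fault: {name}")
    return findings


# Hypothetical input: power fault flagged, CPU module 1 core converter bad.
print(decode_cell_power_status(bytes([0x01, 0x00, 0x00, 0x04])))
```

With that made-up input the sketch reports power_fault=1 and a CPU module 1 core power fault, which is the kind of result the later "CPU 1 VOLTAGE FAULT" line points to.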

Probable Cause / Recommended Action:


One or more of the DC to DC power converters on the Cell Power Board is
displaying a fault condition.


Contact HP Support personnel to troubleshoot the problem.


Additional Event Data:
System IP Address...: 10.73.4.101
Event Id............: 0x4af47de600000000
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_fpl_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp8420
EMS Version.....................: A.04.20
STM Version.....................: C.58.00
System Serial Number............: USE4444FVX
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#708
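The Event Id values in these notifications look like a Unix timestamp in the upper 32 bits plus a small counter in the lower 32 bits (0, 3, 6, 9 for the four events logged in the same second). That reading is inferred from this log, not from HP documentation. Under that assumption, the sketch below decodes them: 0x4af47de6 comes out as 19:49:58 UTC, which matches the 14:49:58 Notification Time if the host is at UTC-5 (also an assumption). The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def decode_event_id(event_id: int) -> Tuple[datetime, int]:
    """Hypothetical helper. Assumes upper 32 bits = Unix seconds, lower 32 bits = counter."""
    seconds = event_id >> 32
    counter = event_id & 0xFFFFFFFF
    return datetime.fromtimestamp(seconds, tz=timezone.utc), counter


if __name__ == "__main__":
    host_tz = timezone(timedelta(hours=-5))  # assumed host offset (UTC-5)
    for eid in (0x4AF47DE600000000, 0x4AF47DE600000009,
                0x4AF47DE800000000, 0x4AF47DEE00000000):
        when, counter = decode_event_id(eid)
        print(f"{eid:#018x} {when.astimezone(host_tz):%H:%M:%S} counter={counter}")
```

The later Event Ids (0x4af47de8..., 0x4af47dee...) decode to 14:50:00 and 14:50:06 at that offset, matching their Notification Times, which is what makes the layout guess plausible.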

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


IPMI event hex: 0xab000b5700e00000 0x010000004af43a0d
Time Stamp: Fri Nov 6 15:00:29 2009
Event keyword: CELL_POWER_FAULT
Alert level name: Critical
Reporting vers: 1
Data field type: Timestamp
Decoded data field: Fri Nov 6 15:00:29 2009
Reporting entity ID: 0 ( Cab 0 Cell 0 )
Reporting entity Full Name: PDH Controller
IPMI Event ID : 2903 (0xb57)
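The second word of the IPMI event hex appears to carry the Timestamp data field. Taking its low 32 bits as seconds since the Unix epoch and formatting them as UTC reproduces the "Fri Nov 6 15:00:29 2009" shown above. The sketch assumes that layout and ignores the upper 32 bits (0x01000000 here), whose meaning the log doesn't give; the function name is hypothetical. This time differs from the EMS Event Time of 14:49:58, which suggests the two come from different clocks.

```python
from datetime import datetime, timezone


def decode_timestamp_field(data_word: int) -> str:
    """Hypothetical helper: format the low 32 bits of the data word as a UTC time."""
    seconds = data_word & 0xFFFFFFFF  # upper 32 bits ignored (meaning unknown)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Match the log style, e.g. "Fri Nov 6 15:00:29 2009" (day not zero-padded).
    return f"{dt:%a %b} {dt.day} {dt:%H:%M:%S %Y}"


print(decode_timestamp_field(0x010000004AF43A0D))
```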


>---------- End Event Monitoring Service Event Notification ----------<

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Fri Nov 6 14:49:58 2009

cuxddb01 sent Event Monitor notification information:

/system/events/ipmi_fpl/ipmi_fpl is >= 1.
Its current value is MINORWARNING(2).



Event data from monitor:

Event Time..........: Fri Nov 6 14:49:58 2009
Severity............: MINORWARNING
Monitor.............: fpl_em
Event #.............: 2511
System..............: cuxddb01

Summary:
Event corresponding to PAT encoded chassis codes


Description of Error:

This event is used for translated PAT encoded chassis codes to E0 format. Data
field contains the legacy chassis code.

Probable Cause / Recommended Action:


A PAT encoded chassis code is emitted
The alert level is warning. Contact HP support since system may be degraded.


Additional Event Data:
System IP Address...: 10.73.4.101
Event Id............: 0x4af47de600000003
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_fpl_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp8420
EMS Version.....................: A.04.20
STM Version.....................: C.58.00
System Serial Number............: USE4444FVX
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#2511

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


IPMI event hex: 0x78800c6220e00000 0xa0e108c01100b000
Time Stamp: Fri Nov 6 15:00:31 2009
Event keyword: PAT_ENCODED_FIELD_WARNING
Alert level name: Warning
Reporting vers:

Data field type: 1st 8 bytes PDC PAT Chassis code
Decoded data field:
Reporting entity ID: 32 ( Cab 0 Cell 2 CPU 0 )
Reporting entity Full Name: O/S Kernel (Generic)
IPMI Event ID : 3170 (0xc62)
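Comparing the first word of each IPMI event hex in this thread with its decoded fields suggests a layout: the alert level in the top 3 bits, the IPMI Event ID in bits 32-47, and the reporting entity ID in bits 24-31. That layout is inferred from the six events posted here only, not taken from an HP specification, so treat this Python sketch as a reading aid. The names are hypothetical, and the alert-level table contains only the levels that appear in this log.

```python
from typing import Dict

# Only the alert levels seen in this thread (assumed mapping).
ALERT_LEVEL_NAMES: Dict[int, str] = {3: "Warning", 5: "Critical", 7: "Fatal"}


def decode_ipmi_header(word: int) -> Dict[str, int]:
    """Hypothetical helper: split the first IPMI event word into the fields above."""
    return {
        "alert_level": (word >> 61) & 0x7,   # top 3 bits
        "event_id": (word >> 32) & 0xFFFF,   # bits 32-47
        "entity_id": (word >> 24) & 0xFF,    # bits 24-31
    }


if __name__ == "__main__":
    for word in (0xAB000B5700E00000, 0x78800C6220E00000, 0x74800C6820E00000,
                 0xE880035C05E00000, 0x64800F4C00E00000, 0x6480087500E00000):
        f = decode_ipmi_header(word)
        level = ALERT_LEVEL_NAMES.get(f["alert_level"], "?")
        print(f"{word:#018x} {level:8} event {f['event_id']} entity {f['entity_id']}")
```

For example, 0xe880035c05e00000 decodes to Fatal, event 860, entity 5, matching the HPMC event from Cab 0 Cell 0 CPU 5.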


>---------- End Event Monitoring Service Event Notification ----------<

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Fri Nov 6 14:49:58 2009

cuxddb01 sent Event Monitor notification information:

/system/events/ipmi_fpl/ipmi_fpl is >= 1.
Its current value is MINORWARNING(2).



Event data from monitor:

Event Time..........: Fri Nov 6 14:49:58 2009
Severity............: MINORWARNING
Monitor.............: fpl_em
Event #.............: 2515
System..............: cuxddb01

Summary:
Event corresponding to PAT encoded chassis codes' data field


Description of Error:

This event is used for translated PAT encoded chassis codes' data field to E0
format. Data field contains the legacy chassis code's data.

Probable Cause / Recommended Action:


A PAT encoded chassis code is emitted
System may be degraded. Contact HP support if system is not running optimally.


Additional Event Data:
System IP Address...: 10.73.4.101
Event Id............: 0x4af47de600000006
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_fpl_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp8420
EMS Version.....................: A.04.20
STM Version.....................: C.58.00
System Serial Number............: USE4444FVX
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#2515

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


IPMI event hex: 0x74800c6820e00000 0x00000000000005e9
Time Stamp: Fri Nov 6 15:00:31 2009
Event keyword: PAT_DATA_FIELD_WARNING
Alert level name: Warning
Reporting vers:

Data field type: Major change in system state
Decoded data field: System State = 9(Panic)
State Change Event = 0(At BIB)
LED Command Valid = 1(LED command field should be used to update the system
LEDs)
LED Run = 2(steady green)
LED Attention = 3(reserved)
LED Stopped = 1(flashing red)
Reporting entity ID: 32 ( Cab 0 Cell 2 CPU 0 )
Reporting entity Full Name: O/S Kernel (Generic)
IPMI Event ID : 3176 (0xc68)


>---------- End Event Monitoring Service Event Notification ----------<

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Fri Nov 6 14:49:58 2009

cuxddb01 sent Event Monitor notification information:

/system/events/ipmi_fpl/ipmi_fpl is >= 1.
Its current value is CRITICAL(5).



Event data from monitor:

Event Time..........: Fri Nov 6 14:49:58 2009
Severity............: CRITICAL
Monitor.............: fpl_em
Event #.............: 1043
System..............: cuxddb01

Summary:
An HPMC has been encountered.


Description of Error:

Each CPU will send this code early in the PDC HPMC handler, as soon as the
cause of the machine check is determined to be HPMC.
The data field contains the interrupt instruction address offset.

Probable Cause / Recommended Action:


HPMC has occurred.


Contact HP Support to analyze the HPMC PIM and Error Logs to determine the
cause of the failure


Additional Event Data:
System IP Address...: 10.73.4.101
Event Id............: 0x4af47de600000009
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_fpl_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp8420
EMS Version.....................: A.04.20
STM Version.....................: C.58.00
System Serial Number............: USE4444FVX
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#1043

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


IPMI event hex: 0xe880035c05e00000 0x0000000000134214
Time Stamp: Fri Nov 6 15:00:32 2009
Event keyword: ERR_CHECK_HPMC
Alert level name: Fatal
Reporting vers: 1
Data field type: Code address
Decoded data field:
Reporting entity ID: 5 ( Cab 0 Cell 0 CPU 5 )
Reporting entity Full Name: System Firmware
IPMI Event ID : 860 (0x35c)


>---------- End Event Monitoring Service Event Notification ----------<

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Fri Nov 6 14:50:00 2009

cuxddb01 sent Event Monitor notification information:

/system/events/ipmi_fpl/ipmi_fpl is >= 1.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time..........: Fri Nov 6 14:50:00 2009
Severity............: MAJORWARNING
Monitor.............: fpl_em
Event #.............: 2353
System..............: cuxddb01

Summary:
The CPU failed to complete the task for which it was awoken


Description of Error:

The CPU failed to complete the task for which it was awoken.
Data field contains the physical location of the CPU that didn't complete the
task for which it was awoken.

Probable Cause / Recommended Action:


specified CPU executing slowly

Contact HP support to troubleshoot CPU/cell board

Cause2: CPU never got correctly awakened, so could never finish its task
Action2: Contact HP support to see if there is a PDC upgrade for this issue


Additional Event Data:
System IP Address...: 10.73.4.101
Event Id............: 0x4af47de800000000
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_fpl_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp8420
EMS Version.....................: A.04.20
STM Version.....................: C.58.00
System Serial Number............: USE4444FVX
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#2353

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


IPMI event hex: 0x64800f4c00e00000 0x00ffff0001ffff15
Time Stamp: Fri Nov 6 15:06:30 2009
Event keyword: BOOT_MONARCH_CHECK_SLAVE_FAILED
Alert level name: Warning
Reporting vers: 1
Data field type: Physical location
Decoded data field: FRU Physical Location: 0x00ffff0001ffff15
FRU Source = 1(processor)
Source Detail = 5(unknown)
Cabinet Location = 0
Cell Location = 0
Reporting entity ID: 0 ( Cab 0 Cell 0 CPU 0 )
Reporting entity Full Name: System Firmware
IPMI Event ID : 3916 (0xf4c)


>---------- End Event Monitoring Service Event Notification ----------<

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Fri Nov 6 14:50:06 2009

cuxddb01 sent Event Monitor notification information:

/system/events/ipmi_fpl/ipmi_fpl is >= 1.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time..........: Fri Nov 6 14:50:06 2009
Severity............: MAJORWARNING
Monitor.............: fpl_em
Event #.............: 2233
System..............: cuxddb01

Summary:
A CPU is being stopped and deconfigured.


Description of Error:

A CPU is being stopped and deconfigured. See the previous IPMI events to
determine the reason that the CPU is being deconfigured.

The data field is the physical location of the CPU being deconfigured.

Probable Cause / Recommended Action:


A CPU is being stopped and deconfigured.


See previous IPMI events to determine the reason that the CPU is being
deconfigured.
Contact HP Support personnel to confirm the CPU is functioning properly.


Additional Event Data:
System IP Address...: 10.73.4.101
Event Id............: 0x4af47dee00000000
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_fpl_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp8420
EMS Version.....................: A.04.20
STM Version.....................: C.58.00
System Serial Number............: USE4444FVX
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#2233

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


IPMI event hex: 0x6480087500e00000 0x00ffff0001ffff15
Time Stamp: Fri Nov 6 15:06:30 2009
Event keyword: BOOT_CPU_DECONFIG
Alert level name: Warning
Reporting vers: 1
Data field type: Physical location
Decoded data field: FRU Physical Location: 0x00ffff0001ffff15
FRU Source = 1(processor)
Source Detail = 5(unknown)
Cabinet Location = 0
Cell Location = 0
Reporting entity ID: 0 ( Cab 0 Cell 0 CPU 0 )
Reporting entity Full Name: System Firmware
IPMI Event ID : 2165 (0x875)
Sameer_Nirmal
Honored Contributor

Re: 8420 Cpu Problem

It may be a CPU issue, or the result of a problem with the VRM attached to that CPU. One thing is certain: one of the VRMs is dead.

I would check the VRM and CPU status on all cell boards using:

MP>CM>PS>C > ( cell board number )
Paul Coffey_1
Advisor

Re: 8420 Cpu Problem

HW status for Cell 0 : FAILURE DETECTED

Power status : on, CPU 1 VOLTAGE FAULT
Boot is not blocked
PDH memory is shared
Processor Compatibility : OK
RIO cable status : connected
RIO cable connection physical location : PCI Domain 0
Core cell is cell 0
Attention Led is off

PDHC status Leds : ****

CPU Module Slot 0 1 2 3
Populated P P P
Local 48V Good * * *
Power Enabled * *
Power Good * *

(* - True, P - Processor, T - Terminator)



DIMMs populated:
0 . . . 4 . . . 8 . . .12 . . .
* * * * * * * * * * * * * * * *

VRM's    :  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Present  :  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *
Enabled  :  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *
Pwr Good :  *  *  *  *  *  *  *  *  *  *  *  *  *  *  *


Front Side Bus Freq. : 200 MHz
CPU Core Freq. : 1000 MHz
CPU Part Number : PA8900

System Boot Rom (SFW) firmware rev 24.004
PDH controller (PDHC) firmware rev 3.031, built THU NOV 09 21:37:37 2006
MICE revision is 1.0
Sameer_Nirmal
Honored Contributor

Re: 8420 Cpu Problem

It looks like there is a core voltage fault on CPU 1, which the system firmware then stopped and deconfigured because of that fault. CPU 1 may be defective. I can't be sure of CPU 1's status from the table as posted, because the text formatting was lost. Can you confirm which "*" entries belong to CPU 1, and to the other CPUs as well? Either way, one CPU is missing its "*" in the Power Enabled and Power Good rows, and that is where the problem is.

CPU Module Slot 0 1 2 3
Populated P P P
Local 48V Good * * *
Power Enabled * *
Power Good * *
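The check above boils down to one rule: any slot that is populated but not both power-enabled and power-good is suspect. Here is a minimal Python sketch of that rule. Because the column alignment was lost in the posted table, the sample input is a hypothetical transcription (slots 0-2 populated, slot 1 unpowered, in line with the "CPU 1 VOLTAGE FAULT" line), not the actual PS output; the function name is also hypothetical.

```python
from typing import Dict, List, Set

# Row names as they appear in the CPU module table of the PS output.
REQUIRED = {"Power Enabled", "Power Good"}


def suspect_slots(rows: Dict[int, Set[str]]) -> List[int]:
    """Return populated CPU module slots missing Power Enabled or Power Good."""
    return [slot for slot in sorted(rows)
            if "Populated" in rows[slot] and not REQUIRED <= rows[slot]]


# Hypothetical transcription: which rows show "P" or "*" for each slot.
example = {
    0: {"Populated", "Local 48V Good", "Power Enabled", "Power Good"},
    1: {"Populated", "Local 48V Good"},
    2: {"Populated", "Local 48V Good", "Power Enabled", "Power Good"},
    3: set(),
}
print(suspect_slots(example))  # [1]
```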
Paul Coffey_1
Advisor

Re: 8420 Cpu Problem

It doesn't appear to be the CPU. I swapped it out, and CPU 1 is still missing when I look at it from BCH. In my next maintenance window I'm replacing the cell board/VRM.
Paul Coffey_1
Advisor

Re: 8420 Cpu Problem

Assuming I can figure out the firmware updates.