Corrected Platform Error

 
Robert_Jewell
Honored Contributor

Corrected Platform Error

The following events have been occurring on a regular basis on an rx8620 server.

----------------------------
Event data from monitor:

Event Time..........: Fri Jan 16 09:54:47 2009
Severity............: MAJORWARNING
Monitor.............: cpe_em
Event #.............: 100211

Summary:
A Corrected Platform Error was reported.


Description of Error:

A platform error was corrected by the firmware/hardware. The error
occurred in the System Bus Adapter of cell (3). More information is
available in the Event Details section of this event.

Probable Cause / Recommended Action:

Contact your HP Support Representative to have the Cell Controller
Interfaces checked.
----------------------------


The server firmware is at the latest release revision and the OS is 11.31.

Interestingly there are no associated events logged to the MP.

We have had the cell board in slot 3 replaced, but the events continue to occur.

Before we chase a possible red herring by having more hardware replaced (cell 3 is attached to an IOX chassis), is there a chance this is OS related? How severe is this issue (should I be worried)?

Anyone else experience these problems in the past that can provide any insight?

Thanks in advance.

-Bob
cnb
Honored Contributor

Re: Corrected Platform Error

Hi Bob,

Can you post the detailed event section or the /var/opt/resmon/log/event.log file?

Also does cstm logtool indicate anything awry?

# machinfo -v
# echo "sel dev all;info;wait;il" | /usr/sbin/cstm

# cstm
cstm> ru logtool
logtool> rs
logtool> fl

Also, do you have the latest EMS/STM and SFM versions installed? That should be the December 2008 release, HWE0812.

http://docs.hp.com/en/diag/stm/stm_upd.htm#supported

What's odd is that the Event listing indicates a minor warning and the logged event indicates MajorCritical?

http://docs.hp.com/en/diag/sfm/CPE_IndicationProvider.html

Event Number: 100211

WBEM Severity: Minor

Event Summary: A Corrected Platform Error was reported.

Event Description: A platform error was corrected by the firmware/hardware. The error occurred in the System Bus Adapter of cell (!). More information is available in the Event Details section of this event.

Probable Cause: Adapter/Card Error

Event Category: System Interconnect

Event Sub-Category: Unknown

Cause 1: <><> Contact your HP Support Representative to have the Cell Controller Interfaces checked.

Recommended Action 1: Null

Threshold Occurence: -1

We saw a similar issue after a firmware upgrade and discovered that the upgrade hadn't updated everything; reapplying the update resolved our issue. Verify the 5.1 revisions:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=51400&swItem=pf-54856-1&prodNameId=346370&swEnvOID=54&swLang=13&taskId=135&mode=4&idx=0

Programmable Hardware:
System Backplane GPM : 1.004
System Backplane FM : 1.002
System Backplane OSP : 1.002
PCI-X Backplane LPM : 2.000
PCI-X Backplane HS : 1.000
Core IO : 2.011
Cell LPM : 1.002
Cell PDHC : 1.010
Firmware:
Core IO : A.008.006
Event Dictionary : 1.021
Cell PDHC : A.003.031
Cell SFW : 8.022

Suggest all HBA f/w also be checked for updates.

HTH,
Shinji Teragaito_1
Respected Contributor

Re: Corrected Platform Error

Hi,

Robert's message says he is receiving EMS events from one of the
EMS Hardware Monitors:

| Severity............: MAJORWARNING
| Monitor.............: cpe_em
| Event #.............: 100211

cnb is looking at the WBEM event for CPE_IndicationProviderIA:

| Event Number: 100211
| WBEM Severity: Minor
| Event Summary: A Corrected Platform Error was reported.

> What's odd is that the Event listing indicates a minor warning
> and the logged event indicates MajorCritical?

Both are looking at the same event from different sides:
one from EMS, the other from SFM.

The root cause of this confusion may be that the current
SFM Admin Guide doesn't mention the event severity mapping
between EMS events and SFM/WBEM indications.

Here's the severity mapping the SFM Admin Guide should mention:

-------------+----------------------
EMS Severity | SFM/WBEM Severity
-------------+----------------------
CRITICAL     | Fatal_NonRecoverable
SERIOUS      | Critical
MAJORWARNING | Minor
MINORWARNING | Degraded/Warning
INFORMATION  | Information
-------------+----------------------
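
If you want to reconcile the two views in a script, the mapping above can be written as a simple lookup. This is only a sketch: the dictionary is copied from the table in this post, and the name `ems_to_wbem` is a hypothetical helper, not part of EMS or SFM.

```python
# EMS severity -> SFM/WBEM severity, copied from the table in this post.
EMS_TO_WBEM = {
    "CRITICAL": "Fatal_NonRecoverable",
    "SERIOUS": "Critical",
    "MAJORWARNING": "Minor",
    "MINORWARNING": "Degraded/Warning",
    "INFORMATION": "Information",
}


def ems_to_wbem(ems_severity):
    """Return the SFM/WBEM severity for an EMS severity name (hypothetical helper)."""
    return EMS_TO_WBEM[ems_severity.strip().upper()]


# Robert's event is MAJORWARNING on the EMS side.
print(ems_to_wbem("MAJORWARNING"))  # prints: Minor
```

So the MAJORWARNING in Robert's EMS notification and the Minor in the WBEM listing describe the same event #100211.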

If Robert received event #100211 in SFM diag mode, it would
come from CPE_IndicationProviderIA with the severity 'Minor'.

Hope this helps you,

Shinji
Robert_Jewell
Honored Contributor

Re: Corrected Platform Error

Here is the entire message of the events that are being received. These have been occurring about once a day.


>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Thu Jan 8 14:48:51 2009
sent Event Monitor notification information:

/system/events/cpe/cperrors is >= 1.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time..........: Thu Jan 8 14:48:51 2009
Severity............: MAJORWARNING
Monitor.............: cpe_em
Event #.............: 100211
System..............:

Summary:
A Corrected Platform Error was reported.


Description of Error:

A platform error was corrected by the firmware/hardware. The error
occurred in the System Bus Adapter of cell (3). More information is
available in the Event Details section of this event.

Probable Cause / Recommended Action:

Contact your HP Support Representative to have the Cell Controller
Interfaces checked.

Additional Event Data:
System IP Address...: 192.168.1.24
Event Id............: 0x496666b300000000
Monitor Version.....: B.01.00
Event Class.........: CPE
Client Configuration File...........:
/var/stm/config/tools/monitor/default_cpe_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 3
Received within...: 1 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: ia64 hp server rx8620
EMS Version.....................: A.04.20
STM Version.....................: D.03.00
OS Version......................: B.11.31
System Serial Number............: USE
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/cpe_em.htm#100211

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v



Error Details:

Error Recovery Info : 0x81

Corrected Platform Error (Concorde) Record:

Validation Bits: 0x0000000000000040 Error Status: Not valid
Requestor Id: Not valid Responder Id: Not valid
Target Id: Not valid Bus Data: Not valid
OEM Component Id:Not valid Not valid

Concorde Data:

Cell Number: 0x0000000000000003


RIN
===
Primary Mode: 0x0000000000000008 Secondary Mode: 0x0000000000000008
Err Enable Mask: 0x000000001ffeefc9 SGL ECC WireLog:0x0000000004000008
CECC data 0 MSB: 0x0000000000008000 CECC data 0 LSB:0xefffc48000300806
CECC data 1 MSB: 0x0000000000008000 CECC data 1 LSB:0xefffc48000300806


=============================================================================
Explanation(s):

Error Recovery Info : 0x81
* Error has been corrected

RIN_PRI_MODE : 0x0000000000000008



>---------- End Event Monitoring Service Event Notification ----------<
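
Since these arrive about once a day, it may help to pull the key fields out of each notification and confirm they all point at the same cell. Below is a rough Python sketch, assuming the notification text has been saved as-is; `parse_event` and the two regular expressions are my own hypothetical names, and only the dotted "Name.....: value" lines plus the "of cell (N)" phrase are parsed.

```python
import re

# Dotted header lines look like "Severity............: MAJORWARNING".
FIELD_RE = re.compile(r"^\s*([A-Za-z#][A-Za-z# ]*?)\s*\.{2,}:\s*(.*)$")
# The description names the cell as "... of cell (3)".
CELL_RE = re.compile(r"of cell \((\d+)\)")


def parse_event(text):
    """Return the dotted fields of one notification, plus 'cell' if it is named."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match and match.group(2).strip():
            # Keep the first occurrence of each field name.
            fields.setdefault(match.group(1), match.group(2).strip())
    # Collapse whitespace so a wrapped description line still matches.
    cell = CELL_RE.search(" ".join(text.split()))
    if cell:
        fields["cell"] = int(cell.group(1))
    return fields


sample = """Event Time..........: Thu Jan 8 14:48:51 2009
Severity............: MAJORWARNING
Monitor.............: cpe_em
Event #.............: 100211
occurred in the System Bus Adapter of cell (3). More information is
"""
event = parse_event(sample)
print(event["Event #"], event["Severity"], event["cell"])
```

Running this over every saved notification and tallying the `cell` values would show whether anything other than cell 3 is ever reported.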

Log tool and machinfo all look good.

The server firmware is fully updated, but the EMS/STM release is not at the latest (currently at D.03.00).

I can work on getting the update applied to see whether it has an effect. My thought, though, is that since all of the alerts reference cell 3, perhaps this is in fact a hardware issue.

-bob
