HPE 9000 and HPE e3000 Servers

Multiple crossbar controller (XBC) to cell controller link errors in long duration

 
SOLVED
moonchild
Regular Advisor

Multiple crossbar controller (XBC) to cell controller link errors in long duration

Hi, on an SD32 running HP-UX 11.11, I get the following STM notification:

Multiple crossbar controller (XBC) to cell controller link errors in long duration

Attached is the full event.

I get it almost every 12 hours. The system is up and running fine.

Any ideas?

Thanks
8 replies
Robert_Jewell
Honored Contributor

Re: Multiple crossbar controller (XBC) to cell controller link errors in long duration

I think I recall seeing some XBC link fixes in recent firmware updates. Run the SYSREV command from the GSP to check the PDC levels of the cells and see whether there is an update to apply.

-Bob
moonchild
Regular Advisor

Re: Multiple crossbar controller (XBC) to cell controller link errors in long duration

GSP:CM> sysrev
Utility Subsystem FW Revision Level: 7.34

                       |   Cabinet #0    |   Cabinet #1    |
-----------------------+-----------------+-----------------+
                       |  PDC   |  PDHC  |  PDC   |  PDHC  |
Cell (slot 0)          |  36.8  |  7.10  |        |        |
Cell (slot 1)          |  36.8  |  7.10  |        |        |
Cell (slot 2)          |  36.8  |  7.10  |        |        |
Cell (slot 3)          |  36.8  |  7.10  |        |        |
Cell (slot 4)          |  36.8  |  7.10  |  36.8  |  7.10  |
Cell (slot 5)          |  36.8  |  7.10  |        |        |
Cell (slot 6)          |  36.8  |  7.10  |  36.8  |  7.10  |
Cell (slot 7)          |  36.8  |  7.10  |        |        |
                       |                 |                 |
GSP                    |      7.34       |                 |
CLU                    |       7.8       |       7.8       |
PM                     |      7.16       |      7.16       |
CIO (bay 0, chassis 1) |       7.4       |       7.4       |
CIO (bay 0, chassis 3) |                 |                 |
CIO (bay 1, chassis 1) |                 |                 |
CIO (bay 1, chassis 3) |       7.4       |       7.4       |
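For anyone comparing firmware levels across many cells, the cell rows of a sysrev listing like the one above can be checked with a short script. This is a minimal sketch in Python, assuming the pipe-separated layout shown in the post (a PDC/PDHC pair for cabinet 0, then one for cabinet 1). The function names are hypothetical, and the 36.8 target is simply the PDC level the thread reports as the latest for these systems.

```python
import re

# Cell rows copied from the sysrev output in the post above.
# Each row holds a PDC/PDHC pair for cabinet 0, then one for cabinet 1.
SYSREV_CELLS = """\
Cell (slot 0) | 36.8 | 7.10 | | |
Cell (slot 1) | 36.8 | 7.10 | | |
Cell (slot 2) | 36.8 | 7.10 | | |
Cell (slot 3) | 36.8 | 7.10 | | |
Cell (slot 4) | 36.8 | 7.10 | 36.8 | 7.10 |
Cell (slot 5) | 36.8 | 7.10 | | |
Cell (slot 6) | 36.8 | 7.10 | 36.8 | 7.10 |
Cell (slot 7) | 36.8 | 7.10 | | |
"""

ROW = re.compile(r"Cell \(slot (\d+)\)\s*\|(.*)")


def cell_pdc_levels(text):
    """Return {(cabinet, slot): (pdc, pdhc)} for every populated cell."""
    levels = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skip headers, separators and non-cell rows
        slot = int(m.group(1))
        fields = [f.strip() for f in m.group(2).split("|")]
        # fields = [pdc0, pdhc0, pdc1, pdhc1, ...]; empty means no cell there
        for cab in range(len(fields) // 2):
            pdc, pdhc = fields[2 * cab], fields[2 * cab + 1]
            if pdc:
                levels[(cab, slot)] = (pdc, pdhc)
    return levels


def out_of_date(levels, pdc_target="36.8"):
    """List (cabinet, slot) locations whose PDC differs from the target."""
    return sorted(loc for loc, (pdc, _) in levels.items() if pdc != pdc_target)


levels = cell_pdc_levels(SYSREV_CELLS)
print(len(levels), "cells found; not at 36.8:", out_of_date(levels))
# prints: 10 cells found; not at 36.8: []
```

An exact string comparison is used rather than a version ordering, since the only question here is whether every cell matches the one known-current level.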
Andrew Rutter
Honored Contributor

Re: Multiple crossbar controller (XBC) to cell controller link errors in long duration

Hi,

Well, it seems you're up to date with the firmware from the support website.

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=322811&swItem=PF-CSFW0009&prodNameId=322813&swEnvOID=7&swLang=13&taskId=135&mode=4&idx=0

It's also worth checking which version of the diagnostics you have installed. Has this just started happening?

It sounds like you're experiencing something like event 76 at this link:

http://docs.hp.com/en/diag/ems/dm_core_hw.htm

If so, you may need to get the backplane checked out.

Andy
moonchild
Regular Advisor

Re: Multiple crossbar controller (XBC) to cell controller link errors in long duration

Now I am getting:

Event data from monitor:

Event Time..........: Tue Apr 8 06:40:27 2008
Severity............: SERIOUS
Monitor.............: dm_core_hw
Event #.............: 81


Summary:
Excessive crossbar controller (XBC) to cell controller link errors in short duration


Description of Error:

The cell controller (CC) chip has detected and corrected an excessive
number of errors in data transferred to it from the crossbar controller
(XBC) to which it is connected during a short time duration.

Probable Cause / Recommended Action:

There may be a problem with the crossbar controller (XBC).
Contact your HP support representative to check the backplane.

Additional Event Data:
System IP Address...: 156.70.108.123
Event Id............: 0x47fb4b9b00000000
Monitor Version.....: B.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_core_hw.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 100
Received within...: 1 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/SD32000
OS Version......................: B.11.11
STM Version.....................: A.57.00
EMS Version.....................: A.04.20
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_core_hw.htm#81

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


FRU Physical Location: 0x01ffff04ffffff93
FRU Source = 0x9 (cell)
Source Detail = 0x3 (coherency controller)
Cabinet Location = 0x1
Cell Location = 0x4

XIN_SEC_MODE..............: 0x00000000000000b0
Link parity error on late 72 bits of data. The link identified in this
event had detected an error, but may not be the cause of it.
Link parity error on early 72 bits of data. The link identified in this
event had detected an error, but may not be the cause of it.
Single bit ECC error.


>---------- End Event Monitoring Service Event Notification ----------<
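The FRU Physical Location value in the details appears to pack the same fields the monitor decodes beneath it: in this one event, byte 0 (0x01) matches Cabinet Location, byte 3 (0x04) matches Cell Location, and the two nibbles of the last byte (0x93) match FRU Source 0x9 and Source Detail 0x3. The Python sketch below assumes that layout, which is inferred from this single event rather than taken from HP documentation. It also lists which bits are set in XIN_SEC_MODE: 0xb0 has three bits set, consistent with the three conditions printed, but the thread does not say which bit corresponds to which message, so the sketch makes no attempt at that mapping.

```python
def decode_fru_location(value):
    """Split a 64-bit FRU Physical Location into fields.

    ASSUMPTION: byte layout inferred from one event (0x01ffff04ffffff93),
    where it matched the decoded Cabinet/Cell/Source lines; not an HP spec.
    """
    raw = value.to_bytes(8, "big")       # raw[0] is the most significant byte
    return {
        "cabinet": raw[0],               # 0x01 -> Cabinet Location = 0x1
        "cell": raw[3],                  # 0x04 -> Cell Location = 0x4
        "fru_source": raw[7] >> 4,       # high nibble of 0x93 -> 0x9 (cell)
        "source_detail": raw[7] & 0x0F,  # low nibble of 0x93 -> 0x3
    }


def set_bits(value, width=64):
    """Return the positions of all set bits, least significant first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


print(decode_fru_location(0x01FFFF04FFFFFF93))
# prints: {'cabinet': 1, 'cell': 4, 'fru_source': 9, 'source_detail': 3}
print(set_bits(0x00000000000000B0))
# prints: [4, 5, 7]
```

Note that the decoded location (cabinet 1, cell 4) is one of the two populated cells in cabinet 1 in the sysrev output posted earlier.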

How can we check the system backplane?
Andrew Rutter
Honored Contributor

Re: Multiple crossbar controller (XBC) to cell controller link errors in long duration

Hi,

It sounds like you'll need to get HP in to test it properly.

The HP CE will have access to the test software and may be able to isolate the problem.

See this recent thread and Phil's replies about the test software on the management station:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1215538

Andy

Mridul Shrivastava
Honored Contributor

Re: Multiple crossbar controller (XBC) to cell controller link errors in long duration

This could be caused by a loosely connected REO cable, a fault on that particular port, or a system backplane issue.

To identify the exact cause, it's better to log a call with HP. They will run togo on the fly, and later they may suggest bringing the system down to run togo status.

If there are excessive errors in a short period, that is something to worry about, and the system may go down. Over a long period, however, you still have time to plan.
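The short/long distinction comes from the monitor's qualification criteria; the event posted above fired once 100 events had been received within 1 day. A sliding-window counter illustrates that kind of rule. This is a minimal Python sketch, assuming a simple count-within-window check; it is not the dm_core_hw monitor's actual logic, and the long-duration thresholds below are hypothetical placeholders because the thread does not give them.

```python
from collections import deque
from datetime import datetime, timedelta


class WindowThreshold:
    """Fire when at least `count` events arrive within `window`."""

    def __init__(self, count, window):
        self.count = count
        self.window = window
        self.times = deque()  # timestamps of events still inside the window

    def record(self, t):
        """Add one event at time t; return True if the threshold is met."""
        self.times.append(t)
        # Drop events that fall outside the window ending at t.
        while self.times and t - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) >= self.count


# Short-duration rule as reported in the event: 100 events within 1 day.
short_rule = WindowThreshold(100, timedelta(days=1))
# HYPOTHETICAL long-duration rule; real values are not given in the thread.
long_rule = WindowThreshold(10, timedelta(days=30))

start = datetime(2008, 4, 8)
for i in range(100):  # 100 errors, one minute apart
    t = start + timedelta(minutes=i)
    fired_short = short_rule.record(t)
    fired_long = long_rule.record(t)
print(fired_short, fired_long)
# prints: True True
```

The same 100 errors spread 20 minutes apart (about 33 hours in total) would never have 100 inside any 1-day window, so the short rule would stay quiet while a slower, long-window rule could still trip.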
Time has a wonderful way of weeding out the trivial
Phil uk
Honored Contributor
Solution

Re: Multiple crossbar controller (XBC) to cell controller link errors in long duration

Hi,

My first instinct would be that you have a backplane (or possibly cell) problem.

PDC 36.8 is indeed the latest for Legacy Domes.

I would suggest calling HP to run the diags.

As an important note, I have noticed a few docs (there may be many more) on the website that refer to SMS JUST diags. The ones I have seen all appear to be the OFFLINE version, i.e., the nPars/vPars must be shut down. Therefore, DON'T try to use these docs, as you WILL cause yourself an unexpected outage!!
...call HP!!

Finally, if a backplane (BP) XBC is failing or has failed, the machine will re-route the data via other XBCs, as each cell is linked to more than one XBC. Nevertheless, it's better to get it looked at sooner rather than later.

Cheers,
Phil

moonchild
Regular Advisor

Re: Multiple crossbar controller (XBC) to cell controller link errors in long duration

Is it normal that I don't see any relevant logs in the GSP?

Thank you