Integrity Servers

I need your help: hp Integrity Superdome SX2000 SD64

 
SOLVED
Alberto_R
Occasional Advisor


Hi everyone.
I need help with this Superdome.
A few days ago, a client reported a problem with this machine: one of the partitions would not start.
I requested the SEL, and he sent the following:

MPFSDIA64_1 - ESBCE.CP1.BH14 - DEH49266WJ
=========================================


(c) Copyright 2008 Hewlett-Packard Development Company, L.P.

Welcome to the
Management Processor
HP 9000 and Integrity Superdome Server SD64B


Supported firmware-updateable entity combination: Recipe 8.8e

 

MP MAIN MENU:

CO: Consoles
VFP: Virtual Front Panel
CM: Command Menu
CL: Console Logs
SL: Show Logs
FW: Firmware Update
HE: Help
X: Exit Connection

[SPi3] MP> sl

EVENT LOG MENU:

FPL: Forward Progress Log
SEL: System Event Log
LIVE: Live Events
MPEL: MP Event Log

CLR: Clear FPL and SEL
Q: Quit

[SPi3] MP:VW> sel


Welcome to the System Event Log (SEL) Viewer

The following SEL navigation commands are available:
D: Dump log starting at current block for capture and analysis
F: Display first (oldest) block
L: Display last (newest) block
J: Jump to specified entry and display previous block
+: Display next (forward in time) block
-: Display previous (backward in time) block
<cr>: Repeat previous +/- command
<sp>: Repeat previous +/- command
?: Display help
<Ctrl-b>: Exit viewer

The following event format options are available:
K: Keyword
R: Raw hex
T: Text

The following event filter options are available:
A: Alert level
C: Cell
U: Unfiltered
MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,<Ctrl-b>) > t
Switching to text format.
MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,<Ctrl-b>) > a

Alert Level Filter:
0: Minor Forward Progress
1: Major Forward Progress
2: Informational
3: Warning
5: Critical
7: Fatal
Q: Quit

For example, selecting an alert level threshold of 3
selects all events with alert levels of 3 or higher.

Please select alert level threshold: 2
Switching to alert level 2 filter.
MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,<Ctrl-b>) >
Log Entry 94531: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00e00000000
0x6380150980e033b1 0x0000c00e00000000
0x6b00150980e033b2 0x010000005e0f4560

Log Entry 94530: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL
Indicates that a bad route is being routed around
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0f00f00a00000000
0x43801c1580e033af 0x0f00f00a00000000
0x4b001c1580e033b0 0x010000005e0f4560

 

 


MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,<Ctrl-b>) > d
Log Entry 94499: 01/03/2020 13:45:03
Alert level 2: Informational
Keyword: ARF_FOUND_XBC_EDGE_LST_B_VTX2
fabric found a differencing graph edges while checking for route arounds
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00d00000000
0x438017b080e03371 0x0000c00d00000000
0x4b0017b080e03372 0x010000005e0f455f

Log Entry 94500: 01/03/2020 13:45:03
Alert level 2: Informational
Keyword: ARF_P1_DO_ROUTE_AROUNDS
Forward progress indicating fabric route arounds are to be performed
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000100000000003
0x43801a7980e03373 0x0000100000000003
0x4b001a7980e03374 0x010000005e0f455f

Log Entry 94501: 01/03/2020 13:45:03
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL
Indicates that a bad route is being routed around
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0c00f00900000000
0x43801c1580e03375 0x0c00f00900000000
0x4b001c1580e03376 0x010000005e0f455f

Log Entry 94502: 01/03/2020 13:45:03
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000800d00000000
0x6380150980e03377 0x0000800d00000000
0x6b00150980e03378 0x010000005e0f455f

Log Entry 94503: 01/03/2020 13:45:03
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000800d00000000
0x6380150980e03379 0x0000800d00000000
0x6b00150980e0337a 0x010000005e0f455f

Log Entry 94504: 01/03/2020 13:45:03
Alert level 3: Warning
Keyword: ARF_CSR_ROUTE_XBC_LINK_BAD
An XBC link was found to be unexpectedly not connected.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000800900000000
0x6380150c80e0337b 0x0000800900000000
0x6b00150c80e0337c 0x010000005e0f455f

Log Entry 94505: 01/03/2020 13:45:03
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL
Indicates that a bad route is being routed around
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0d00f00900000000
0x43801c1580e0337d 0x0d00f00900000000
0x4b001c1580e0337e 0x010000005e0f455f

Log Entry 94506: 01/03/2020 13:45:03
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000800d00000000
0x6380150980e0337f 0x0000800d00000000
0x6b00150980e03380 0x010000005e0f455f

Log Entry 94507: 01/03/2020 13:45:03
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL
Indicates that a bad route is being routed around
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0e00f00900000000
0x43801c1580e03381 0x0e00f00900000000
0x4b001c1580e03382 0x010000005e0f455f

Log Entry 94508: 01/03/2020 13:45:03
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00d00000000
0x6380150980e03383 0x0000c00d00000000
0x6b00150980e03384 0x010000005e0f455f

Log Entry 94509: 01/03/2020 13:45:03
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL
Indicates that a bad route is being routed around
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0f00f00900000000
0x43801c1580e03385 0x0f00f00900000000
0x4b001c1580e03386 0x010000005e0f455f

Log Entry 94510: 01/03/2020 13:45:03
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00d00000000
0x6380150980e03387 0x0000c00d00000000
0x6b00150980e03388 0x010000005e0f455f

Log Entry 94511: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_CHECK_XBC_FE_ERR_PRESENT
Fabric errors are present on a cell's link to the fabric.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000d00e00000001
0x63801bf880e03389 0x0000d00e00000001
0x6b001bf880e0338a 0x010000005e0f4560

Log Entry 94512: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_DELETE_EDGE_PORT_NOT_FOUND
Fabric firmware experienced a problem while modifying its graph data structures.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000d00e00000000
0x438019df80e0338b 0x0000d00e00000000
0x4b0019df80e0338c 0x010000005e0f4560

Log Entry 94513: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_CHECK_XBC_FE_ERR_PRESENT
Fabric errors are present on a cell's link to the fabric.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000d00e00000001
0x63801bf880e0338d 0x0000d00e00000001
0x6b001bf880e0338e 0x010000005e0f4560

Log Entry 94514: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_CHECK_XBC_FE_ERR_PRESENT
Fabric errors are present on a cell's link to the fabric.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00e00000001
0x63801bf880e0338f 0x0000c00e00000001
0x6b001bf880e03390 0x010000005e0f4560

Log Entry 94515: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_DELETE_EDGE_PORT_NOT_FOUND
Fabric firmware experienced a problem while modifying its graph data structures.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00e00000000
0x438019df80e03391 0x0000c00e00000000
0x4b0019df80e03392 0x010000005e0f4560

Log Entry 94516: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_CHECK_XBC_FE_ERR_PRESENT
Fabric errors are present on a cell's link to the fabric.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00e00000001
0x63801bf880e03393 0x0000c00e00000001
0x6b001bf880e03394 0x010000005e0f4560

Log Entry 94517: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_FOUND_XBC_EDGE_LST_B_VTX1
fabric found a differencing graph edges while checking for route arounds
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00a00000000
0x438017af80e03395 0x0000c00a00000000
0x4b0017af80e03396 0x010000005e0f4560

Log Entry 94518: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_FOUND_XBC_EDGE_LST_B_VTX2
fabric found a differencing graph edges while checking for route arounds
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000d00e00000000
0x438017b080e03397 0x0000d00e00000000
0x4b0017b080e03398 0x010000005e0f4560

Log Entry 94519: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_FOUND_XBC_EDGE_LST_B_VTX1
fabric found a differencing graph edges while checking for route arounds
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000d00a00000000
0x438017af80e03399 0x0000d00a00000000
0x4b0017af80e0339a 0x010000005e0f4560

Log Entry 94520: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_FOUND_XBC_EDGE_LST_B_VTX2
fabric found a differencing graph edges while checking for route arounds
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00e00000000
0x438017b080e0339b 0x0000c00e00000000
0x4b0017b080e0339c 0x010000005e0f4560

Log Entry 94521: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_P1_DO_ROUTE_AROUNDS
Forward progress indicating fabric route arounds are to be performed
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000200000000003
0x43801a7980e0339d 0x0000200000000003
0x4b001a7980e0339e 0x010000005e0f4560

Log Entry 94522: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL
Indicates that a bad route is being routed around
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0c00f00a00000000
0x43801c1580e0339f 0x0c00f00a00000000
0x4b001c1580e033a0 0x010000005e0f4560

Log Entry 94523: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000d00e00000000
0x6380150980e033a1 0x0000d00e00000000
0x6b00150980e033a2 0x010000005e0f4560

Log Entry 94524: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL
Indicates that a bad route is being routed around
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0d00f00a00000000
0x43801c1580e033a3 0x0d00f00a00000000
0x4b001c1580e033a4 0x010000005e0f4560

Log Entry 94525: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000d00e00000000
0x6380150980e033a5 0x0000d00e00000000
0x6b00150980e033a6 0x010000005e0f4560

Log Entry 94526: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00e00000000
0x6380150980e033a7 0x0000c00e00000000
0x6b00150980e033a8 0x010000005e0f4560

Log Entry 94527: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_CSR_ROUTE_XBC_LINK_BAD
An XBC link was found to be unexpectedly not connected.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000d00a00000000
0x6380150c80e033a9 0x0000d00a00000000
0x6b00150c80e033aa 0x010000005e0f4560

Log Entry 94528: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL
Indicates that a bad route is being routed around
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0e00f00a00000000
0x43801c1580e033ab 0x0e00f00a00000000
0x4b001c1580e033ac 0x010000005e0f4560

Log Entry 94529: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00e00000000
0x6380150980e033ad 0x0000c00e00000000
0x6b00150980e033ae 0x010000005e0f4560

Log Entry 94530: 01/03/2020 13:45:04
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL
Indicates that a bad route is being routed around
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0f00f00a00000000
0x43801c1580e033af 0x0f00f00a00000000
0x4b001c1580e033b0 0x010000005e0f4560

Log Entry 94531: 01/03/2020 13:45:04
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
An FE was found on an XBC port during an ARF traversability test.
Reporting Entity: System Firmware located in cabinet 1, slot 0, cpu 0
Actual Data: 0x0000c00e00000000
0x6380150980e033b1 0x0000c00e00000000
0x6b00150980e033b2 0x010000005e0f4560

*********************** End of log ***********************


MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,<Ctrl-b>) >

As you can see, there are several Warning-level alerts, but I cannot tell whether this is a hardware error or something else.
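To see at a glance which event types make up those warnings, the pasted viewer output can be tallied. Below is a minimal Python sketch, assuming the text (T) format shown above; the names `parse_sel_text` and `keyword_counts` are hypothetical, not part of any HP tool. It keeps entries at or above an alert-level threshold, mirroring the viewer's A filter ("a threshold of 3 selects all events with alert levels of 3 or higher"), and de-duplicates by entry number because entries 94530 and 94531 appear twice in the paste.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Line patterns of the MP SEL viewer's text (T) format, as pasted above.
ENTRY_RE = re.compile(r"^Log Entry (\d+): (\S+ \S+)$")
LEVEL_RE = re.compile(r"^Alert level (\d+): \w+$")
KEYWORD_RE = re.compile(r"^Keyword: (\S+)$")


@dataclass
class SelEntry:
    number: int
    timestamp: str
    level: int
    keyword: str


def parse_sel_text(text: str) -> list[SelEntry]:
    """Collect number, timestamp, alert level and keyword for each entry."""
    entries: list[SelEntry] = []
    number, timestamp, level = None, None, None
    for line in (raw.strip() for raw in text.splitlines()):
        if m := ENTRY_RE.match(line):
            number, timestamp, level = int(m.group(1)), m.group(2), None
        elif number is not None and (m := LEVEL_RE.match(line)):
            level = int(m.group(1))
        elif level is not None and (m := KEYWORD_RE.match(line)):
            entries.append(SelEntry(number, timestamp, level, m.group(1)))
            number, level = None, None  # wait for the next "Log Entry" line
    return entries


def keyword_counts(entries: list[SelEntry], threshold: int) -> Counter:
    """Count keywords with alert level >= threshold, like the viewer's A filter.

    Entries are de-duplicated by entry number first, since the same entry
    can appear twice when several screens of viewer output are pasted.
    """
    unique = {e.number: e for e in entries}
    return Counter(e.keyword for e in unique.values() if e.level >= threshold)


# A short excerpt of the log above (description lines omitted).
SAMPLE = """\
Log Entry 94504: 01/03/2020 13:45:03
Alert level 3: Warning
Keyword: ARF_CSR_ROUTE_XBC_LINK_BAD

Log Entry 94505: 01/03/2020 13:45:03
Alert level 2: Informational
Keyword: ARF_REROUTING_PORT_TO_CELL

Log Entry 94506: 01/03/2020 13:45:03
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE

Log Entry 94506: 01/03/2020 13:45:03
Alert level 3: Warning
Keyword: ARF_XBC_PORT_FE
"""

if __name__ == "__main__":
    counts = keyword_counts(parse_sel_text(SAMPLE), threshold=3)
    for keyword, count in counts.most_common():
        print(f"{count:4d}  {keyword}")
```

Run over the whole paste, a tally like this only groups the events by keyword; it does not by itself say whether the cause is hardware, which is why the full MP logs are still needed.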

Any help will be appreciated.
Regards.

 

jsm6
HPE Pro

Re: I need your help: hp Integrity Superdome SX2000 SD64

Hi,

Decoding the events:

ARF_CSR_ROUTE_XBC_LINK_BAD

0x6380150c80e033a9 0x0000d00a00000000
XBC : 0xa - XBC 82
Logical Port number : 0xd - Port 5

ARF_XBC_PORT_FE
0x6380150980e0337f 0x0000800d00000000
Physical Port number : 5

This looks like a communication issue between the cells, or the cells are in a fault state.
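To repeat this kind of decoding across many entries, here is a rough Python sketch. Its field layout is an assumption reverse-engineered only from the entries in this thread, not from HPE documentation: the top three bits of the first word matched the printed alert level (0x63/0x6b for Warning, 0x43/0x4b for Informational), bits 32-47 matched the keyword (for example 0x150c for ARF_CSR_ROUTE_XBC_LINK_BAD), and the low 32 bits of the last word, read as Unix seconds, matched the printed time (0x5e0f4560 is 2020-01-03 13:45:04 UTC). The names `KEYWORDS`, `nibble` and `decode_event` are hypothetical.

```python
from datetime import datetime, timezone

# ASSUMPTION: keyword IDs below were read off the entries in this thread
# (bits 32-47 of the first word); they are not from HPE documentation.
KEYWORDS = {
    0x1509: "ARF_XBC_PORT_FE",
    0x150C: "ARF_CSR_ROUTE_XBC_LINK_BAD",
    0x17AF: "ARF_FOUND_XBC_EDGE_LST_B_VTX1",
    0x17B0: "ARF_FOUND_XBC_EDGE_LST_B_VTX2",
    0x19DF: "ARF_DELETE_EDGE_PORT_NOT_FOUND",
    0x1A79: "ARF_P1_DO_ROUTE_AROUNDS",
    0x1BF8: "ARF_CHECK_XBC_FE_ERR_PRESENT",
    0x1C15: "ARF_REROUTING_PORT_TO_CELL",
}


def nibble(word: int, index: int) -> int:
    """Return hex digit `index` of a 64-bit word, counting from 0 at the right."""
    return (word >> (4 * index)) & 0xF


def decode_event(header: int, data: int, time_word: int) -> dict:
    """Decode one entry from its raw words (hypothetical, inferred layout).

    header, data: first word pair printed under "Actual Data"
    time_word:    second word of the second pair
    """
    return {
        "alert_level": header >> 61,  # top 3 bits matched the printed level
        "keyword": KEYWORDS.get((header >> 32) & 0xFFFF, "unknown"),
        "sequence": header & 0xFFFF,  # increments from word to word in this log
        # The two hex digits jsm6 read for ARF_CSR_ROUTE_XBC_LINK_BAD
        # (0xa = XBC, 0xd = logical port); other keywords may use them differently.
        "data_digit_8": nibble(data, 8),
        "data_digit_11": nibble(data, 11),
        # Low 32 bits read as Unix seconds matched the printed timestamps.
        "time_utc": datetime.fromtimestamp(time_word & 0xFFFFFFFF, tz=timezone.utc),
    }


if __name__ == "__main__":
    # Log Entry 94527 (ARF_CSR_ROUTE_XBC_LINK_BAD) from the SEL above
    event = decode_event(0x6380150C80E033A9, 0x0000D00A00000000, 0x010000005E0F4560)
    for field, value in event.items():
        print(f"{field:14s} {value}")
```

Translating the raw digits into physical names such as "XBC 82" or "Port 5" still needs HPE's own mapping tables, which this sketch does not attempt.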

We would need to review the complete MP logs (CP, IO, PS, SEL (dump), FPL (dump)) to identify the failure.

If there is a contract with HPE, please log a support ticket with the MP logs listed above to isolate the hardware failure.

Thanks and Regards.

I am an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]


Alberto_R
Occasional Advisor

Re: I need your help: hp Integrity Superdome SX2000 SD64

Hi jsm6.
Sorry for the delay in answering; I've been running some tests on the Superdome with the client.
I have collected some logs and have a few questions that I hope you can help me with.
It is clear that there is a link failure between the cells, but I'm still not sure whether it is a hardware or a software failure.
I asked the client for the following logs:
CP, IO, PS (cabinets and cells), SEL (dump), FPL (dump), and sysrev.
The results are here:
https://drive.google.com/open?id=1uUOz93ugn2PemwOZPmXnNawtwFG0r469

I swapped the cells between slots, and the link failure between them persists.
The only causes I can think of are the cell backplane or a fault in the connections or cabling, but I don't know how to isolate the fault and fix it.
In cabinet 1, where most of the cells of partition 1 (the partition reporting the fault) are located, the cells run for a few seconds with BIB unlocked during the tests, but then they reset and are placed in the locked BIB state.

In addition, an alarm is showing on the cells in cabinet 1.
Is there any way to confirm that the backplane is causing the fault?

Thank you very much for the help.
 
jsm6
HPE Pro
Solution

Re: I need your help: hp Integrity Superdome SX2000 SD64

Hi Alberto,

Sorry, I was on leave, so I couldn't reply earlier.

The logs show excessive fabric errors reported by the cells in Cab 1.

If possible, I would suggest scheduling downtime for all nPars and performing an AC power cycle on both cabinets. Power-cycling Cab 1 alone would be sufficient, but partition 0 also needs downtime to shut down Cab 1, so it is better to do both cabinets once you have the downtime scheduled.

a. Shut down all operating systems and power off the nPars. Remove AC power from both Cab 0 and Cab 1 using the AC breakers.
b. Physically reseat Cab 1 Bay 1 IO chassis 3 and all cells in Cab 1 (remove each cell and reinsert it after 30 seconds).
c. Apply AC power to Cab 0. Wait 20 seconds, then apply AC power to Cab 1.
d. Power on the nPars and check their status.

If it still fails, you may need to log a support ticket with HPE for hardware troubleshooting (T/S) and isolate the fault with logs (please collect the SEL/FPL in dump format, not text). For example:

MP: SL > FPL >>> select "d" to dump

Thanks and Regards.

 

I am an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]


Alberto_R
Occasional Advisor

Re: I need your help: hp Integrity Superdome SX2000 SD64

Hi jsm6.
First of all, I apologize for not coming back to give an update on this post.
Second, thank you for all the help and advice you offered, which allowed me to solve this problem for the customer.
Finally, I can confirm that the fault was solved by restarting the two cabinets as you suggested.
I did the restart as follows:
Shut down the operating systems.
Power off the cabinets.
Disconnect the power from both cabinets.
Disconnect all power supplies.
Disconnect all interfaces.
Disconnect all cells.
Wait 15 minutes.
Reconnect everything in the reverse order in which it was disconnected: first the cells, then the interfaces, and so on.
Once everything is connected, power up the cabinets.
The operator first started the partition that had failed to boot, and it worked: the partition integrated its cells without problems and came up.
Then he started partitions 0 and 2, and both cabinets were running.
Thank you for your support and help; I hope this helps any colleague who runs into a similar problem.
Thank you very much for everything.