Re: controller not joining cluster after replacement

Imad_blg · ‎10-19-2023

Hello,
Has anyone faced this issue when replacing a faulty controller node on an HPE 3PAR 8400? After the replacement, the node did not boot properly on its own, so I forced a node rescue manually from the CLI with startnoderescue -node. The node now boots correctly and its status LED is solid green, not blinking. According to my research, that means the node is working but is not in the cluster.
The shownode command still shows the node as Failed, even though I have replaced it.
I tried removing and reinserting the replacement node, but the same thing happens: the node can't join the cluster.
=======================================
# showalert :
Id : 26
State : New
Message Code: 0x06200fa
Time : 2023-10-12 21:05:18 WEST
Severity : Major
Type : Component state change
Message : Node 0, SubSys Device HBA, SubSys Instance 3 Failed (Node Offline Due to Failure {0xd}, Node HBA Failure {0x28})

Id : 28
State : New
Message Code: 0x06200fa
Time : 2023-10-19 02:58:28 WEST
Severity : Major
Type : Component state change
Message : Node 0, SubSys Device Unknown, SubSys Instance 0 Failed (Node Offline Due to Failure {0xd}, Node HBA Failure {0x28}, Fatal Boot Error {0x29})

Id : 27
State : New
Message Code: 0x02d00fa
Time : 2023-10-19 04:45:33 WEST
Severity : Major
Type : Component state change
Message : Cage 0, Interface Card 0 Failed (Interface Card Firmware Unknown {0x0})

Id : 21
State : New
Message Code: 0x01a001c
Time : 2023-10-19 05:40:01 WEST
Severity : Major
Type : Link establish alert
Message : Node 1 Failed to establish link to Node 0 from Node 1 link 3

================================
# shownode
                                                                  Control    Data        Cache
Node ----Name---- -State- Master InCluster -Service_LED ---LED--- Mem(MB) Mem(MB) Available(%)
   0 CZ38*****-0  Failed  No     No        Unknown      Unknown         0       0            0
   1 CZ38*****-1  OK      Yes    Yes       Off          GreenBlnk   16384   16384          100

================================

veeyarvi · ‎10-30-2023

Hi Imad_blg,

As per the alerts, the issue seems to be with one of the HBAs or the PCI bus of node 0. Are there any add-on HBAs in the node?
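
To make the pattern in the alerts easier to see, here is a rough Python sketch that pulls the reason codes (the `{0x..}` parts) out of the alert messages quoted above and counts them for node 0. The messages are copied from the post; the function name and the regular expression are my own illustration, not part of any 3PAR tooling.

```python
import re
from collections import Counter

# Alert "Message" lines copied from the showalert output in the post.
ALERT_MESSAGES = [
    "Node 0, SubSys Device HBA, SubSys Instance 3 Failed "
    "(Node Offline Due to Failure {0xd}, Node HBA Failure {0x28})",
    "Node 0, SubSys Device Unknown, SubSys Instance 0 Failed "
    "(Node Offline Due to Failure {0xd}, Node HBA Failure {0x28}, Fatal Boot Error {0x29})",
    "Cage 0, Interface Card 0 Failed (Interface Card Firmware Unknown {0x0})",
    "Node 1 Failed to establish link to Node 0 from Node 1 link 3",
]

# A reason looks like "Some Text {0x28}"; the text may not contain ',' or '('.
REASON = re.compile(r"([^,(]+?)\s*\{(0x[0-9a-fA-F]+)\}")


def reasons(message):
    """Return (reason text, hex code) pairs found in one alert message."""
    return [(name.strip(), code) for name, code in REASON.findall(message)]


# Count how often each reason appears in the alerts raised against node 0.
counts = Counter(
    reason
    for message in ALERT_MESSAGES
    if message.startswith("Node 0,")
    for reason in reasons(message)
)

for (name, code), n in counts.most_common():
    print(f"{n}x {name} {code}")
```

Both node 0 alerts carry the Node HBA Failure code, which is why the HBA or the PCI bus is the first thing I would look at.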

Also, what was the issue with the original node? If it was reporting the same failure and there is an add-on HBA that was moved from the failed node to the new one, I would suspect that HBA is the cause here as well.

It is not clear whether the node rescue was successful. Are you able to log in to the node 0 console with the credentials? Maybe the manual rescue was successful, but an issue with the node is preventing it from joining the cluster.
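
If it helps when rechecking after each attempt, a small Python sketch like the one below flags any node that shownode reports as not OK or not in the cluster. It assumes the ten-column layout shown in the post and that every value is a single word; the function names are hypothetical, and the rows are pasted from the post rather than read from the array.

```python
# Data rows copied from the shownode output in the post (serials masked as posted).
SHOWNODE_ROWS = """\
0 CZ38*****-0 Failed No No Unknown Unknown 0 0 0
1 CZ38*****-1 OK Yes Yes Off GreenBlnk 16384 16384 100
"""

# Column names are my own labels for the ten columns in the posted header.
COLUMNS = [
    "node", "name", "state", "master", "in_cluster",
    "service_led", "led", "control_mem_mb", "data_mem_mb", "cache_avail_pct",
]


def parse_shownode_rows(text):
    """Turn whitespace-separated shownode data rows into a list of dicts."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators, and anything that is not a ten-field data row.
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        nodes.append(dict(zip(COLUMNS, fields)))
    return nodes


def nodes_outside_cluster(nodes):
    """Return the IDs of nodes that are not OK or not in the cluster."""
    return [n["node"] for n in nodes
            if n["state"] != "OK" or n["in_cluster"] != "Yes"]


nodes = parse_shownode_rows(SHOWNODE_ROWS)
print(nodes_outside_cluster(nodes))
```

With the rows from the post, only node 0 is flagged.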

Also, you can check the task details for the automatic node rescue to see the error: run 'showtask' to find the node_rescue task ID, then 'showtask -d <task id>' to get the details of the task.

PS: It is not clear whether the array is still under contract with HPE. If it is, I assume support would take care of this issue.

Regards,

Veeyaarvi

I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
