MSA 2040 Controller-B killed partner controller. (reason pcie link recovery failed)

 
Zaid_Al-Ani
Visitor

Hello,

 

We have an MSA 2040, and one of the controllers went down with this error:

killed partner controller. (reason pcie link recovery failed)

The controller does not accept any restart action.

The network of that controller is working fine, but we can only SSH to it; we can't access it from a browser.

What can we do to fix this?

Thanks in advance

Shawn_K
HPE Pro
Solution

Re: MSA 2040 Controller-B killed partner controller. (reason pcie link recovery failed)

Hello,

The PCIe link is the inter-controller communication link that carries the heartbeat between the two controllers. Without a log review it is hard to tell whether Controller A lost the heartbeat to Controller B and issued a kill to Controller B, or whether Controller B lost heartbeat communication with Controller A and took itself down.

My suggestion is to gather a set of logs first and hold onto them in case you need them later. Then, on the controller that is down, try to issue a shutdown command through SSH. The command will likely fail, but it is good to try to shut the controller down correctly first. Next, pull the controller about an inch out of the backplane and wait 5-10 minutes. This allows failover to occur and gives the surviving controller time to rescan the backend and perform other necessary steps. Check that all the LEDs on the removed controller have stopped flashing, then reinsert it. It should boot up and resume normal operation.
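
For anyone who wants to script the shutdown attempt described above, here is a minimal Python sketch that only builds the SSH command line; it does not run it. The user name and address in the usage note are placeholders, and the `shutdown <a|b>` syntax is my assumption about the array CLI, so check it against the CLI reference guide for your firmware before using it.

```python
def build_shutdown_command(user: str, host: str, controller: str) -> list[str]:
    """Return the argv for an SSH call asking the array to shut down one controller."""
    controller = controller.lower()
    if controller not in ("a", "b"):
        raise ValueError("controller must be 'a' or 'b'")
    # Assumption: the array CLI accepts 'shutdown a' / 'shutdown b'.
    # Verify against the CLI reference for your firmware version.
    return ["ssh", f"{user}@{host}", f"shutdown {controller}"]
```

The returned list can be passed to `subprocess.run(argv, timeout=60)`, for example with `build_shutdown_command("manage", "192.0.2.10", "B")`, where both the account and the address are made-up values. Expect the call to fail or be refused on a controller in this state; the point is only to attempt a clean shutdown before pulling the module.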

There have been several fixes for PCIE link errors in newer firmware. Be sure you are running the latest firmware versions. You can check the firmware on your system here: www.hpe.com/storage/msafirmware 

Also, if you wish to check other components on your array you can perform an MSA Health Check here:  

www.hpe.com/storage/MSAHealthCheck

Download your MSA Log File from your MSA array
Upload the MSA Log File into the MSA Health Check website
Review Results by clicking through the tabs and saving the PDF report
Links to array, enclosure, and drive firmware will be provided

If restarting the controller does not resolve your issue, then please open a support case with HPE Support using the following web link, if the unit is under warranty: https://support.hpe.com/hpesc/public/home

If the unit is out of warranty, you can open a chat support case with HPE using the following web link to check for options: https://pg-receiver-pro.glb.itcs.hpe.com/WCLWeb/WCLEntry.aspx

Cheers,
Shawn

I work for Hewlett Packard Enterprise. The comments in this post are my own and do not represent an official reply from HPE. No warranty or guarantees of any kind are expressed in my reply.



Zaid_Al-Ani
Visitor

Re: MSA 2040 Controller-B killed partner controller. (reason pcie link recovery failed)

Hello Shawn,
Thank you for the valuable instructions.
We still have one volume owned by Controller B, though.
Is it safe to change the owner first, or will that damage the data inside it? (This volume is an Oracle RAC datastore.)
Or can I shut down and remove the controller even while a volume is still owned by it?
BR
Shawn_K
HPE Pro

Re: MSA 2040 Controller-B killed partner controller. (reason pcie link recovery failed)

Hello,

From the array's side, shutting down a controller and failing over its volume to the other controller is not an issue. That will happen smoothly.

More concerning is how the hosts are mapped to the system. As long as the hosts are mapped so that every host has access to both controllers, you will not have a disruption. My suggestion is to verify the mapping so that the database host(s) using the volume owned by Controller B also have paths through Controller A. I would also confirm that multipathing is configured correctly on those host(s) to ensure a smooth failover.
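
The path check above can be sketched in Python as follows. This is only an illustration under stated assumptions: the input dictionary (host name mapped to the number of active paths per controller) is a hypothetical structure you would fill in by hand from your hosts' multipath output, and the host names are made up.

```python
def hosts_missing_paths(paths: dict[str, dict[str, int]]) -> dict[str, list[str]]:
    """Map each host to the controllers it has no active path to."""
    missing: dict[str, list[str]] = {}
    for host, active in paths.items():
        # A host is safe for failover only if it can reach both controllers.
        gaps = [ctrl for ctrl in ("A", "B") if active.get(ctrl, 0) < 1]
        if gaps:
            missing[host] = gaps
    return missing
```

An empty result means every listed host has at least one active path to each controller. Any host that appears in the result would lose access to the volume when its only reachable controller goes away, so fix its mapping or multipathing before removing Controller B.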

Cheers,
Shawn


Zaid_Al-Ani
Visitor

Re: MSA 2040 Controller-B killed partner controller. (reason pcie link recovery failed)

Hello Shawn,
Thank you again for your help.
I tried to shut down the controller, but this error appears:
Failed to shut down Storage Controller Controller B.
The reboot operation cannot be completed because a recovery is in progress. (2020-05-01 23:38:55)

I can't access the MSA physically until Sunday.
If it doesn't accept the shutdown (I can only restart it),
can I remove the controller without shutting it down?
Or will that cause damage?

BR,
Shawn_K
HPE Pro

Re: MSA 2040 Controller-B killed partner controller. (reason pcie link recovery failed)

Hello,

I would always suggest trying to shut down the controller before removing it. However, if the command fails as you described, I would remove the controller and follow the rest of the instructions listed previously. Be sure to allow Controller B enough time to fully discharge before you reinsert it. This will also give Controller A and the hosts enough time to fail over properly.
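
If it helps to make the waiting rule explicit, here is a small Python sketch of the reinsert check. The function name and parameters are hypothetical; the 5-minute default is simply the lower end of the 5-10 minute wait suggested earlier in this thread, and the LED condition comes from the same instructions.

```python
from datetime import datetime, timedelta


def ready_to_reinsert(removed_at: datetime, now: datetime, leds_off: bool,
                      min_wait: timedelta = timedelta(minutes=5)) -> bool:
    """True once the controller has been unseated for min_wait and its LEDs are dark."""
    # Both conditions must hold: enough elapsed time AND no LED activity.
    return leds_off and (now - removed_at) >= min_wait
```

Waiting longer than the minimum does no harm in this model; the check only guards against reinserting too early or while the LEDs are still flashing.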

Cheers,
Shawn


Zaid_Al-Ani
Visitor

Re: MSA 2040 Controller-B killed partner controller. (reason pcie link recovery failed)

You are amazing.

Thank you, you saved my day.

I took the controller out for 15 minutes and then put it back.

Everything is working fine now.

Thanks again