MSA2312sa controller fail.

 
truongson
Occasional Visitor

Hi there,

 

Our MSA2312sa storage system has a failure on controller A. The controller was replaced with a new one, but it still fails afterwards, as shown in this event log:

 

B18136 2015-04-12 16:25:11 84 W B Killed partner controller. (reason: PCIE link recovery failed [failover reason code: 29])
B18137 2015-04-12 16:25:11 194 I B Auto-write-through trigger event: partner processor down.
B18138 2015-04-12 16:25:11 71 I B Failover initiated, failover set A
B18139 2015-04-12 16:25:11 114 I B Disk link down. (Channel: 0)
B18140 2015-04-12 16:25:11 114 I B Disk link down. (Channel: 1)
B18141 2015-04-12 16:25:16 211 I B The SAS topology has changed (components were added or removed). (Channel: 1, number of elements: 5, expanders: 0, native levels: 0, partner levels: 0, device PHYs: 0)
B18142 2015-04-12 16:25:16 211 I B The SAS topology has changed (components were added or removed). (Channel: 0, number of elements: 61, expanders: 1, native levels: 1, partner levels: 0, device PHYs: 9)
B18143 2015-04-12 16:25:19 19 I B A rescan-bus operation was done. (number of disks that were found: 8, number of enclosures that were found: 1) (rescan reason code: 2)
B18144 2015-04-12 16:25:19 77 I B Cache was initialized for controller A. Write-back data was found.
B18145 2015-04-12 16:25:19 71 I B Failover completed, failover set A
B18146 2015-04-12 16:25:31 310 I B Discovery and initialization of enclosure data was completed following a rescan.
B18147 2015-04-12 16:25:31 19 I B A rescan-bus operation was done. (number of disks that were found: 8, number of enclosures that were found: 1) (rescan reason code: 24)
B18148 2015-04-12 16:26:19 313 E B RAID controller A failed, reason PCIE link recovery failed. Product ID , S/N
B18149 2015-04-12 16:26:19 314 E B There is a problem with a FRU. (FRU type: Controller module A, enclosure: 1, product ID: , SN: , version: , related event serial number: B18148, related event code: 313)
B18150 2015-04-12 18:44:24 206 I B A scrub-vdisk job was started. (vdisk: VolGroup01, SN: 00c0ff10098e0000ebe8584c00000000)
B18151 2015-04-12 22:39:16 207 I B Vdisk scrub completed, no errors found. (Vdisk: VolGroup01, SN: 00c0ff10098e0000ebe8584c00000000)
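
In case it helps anyone filtering a long log like this, here is a minimal Python sketch that splits each line into fields and keeps only the non-informational events. The field meanings (event serial, timestamp, event code, severity letter, reporting controller, message) are my assumption from the layout of the lines above, not an official format description, and the names `Event`, `parse_event` and `non_informational` are made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    serial: str       # e.g. "B18136" (assumed: event serial number)
    timestamp: str    # e.g. "2015-04-12 16:25:11"
    code: int         # e.g. 84 (assumed: event code)
    severity: str     # single letter such as I, W or E (assumed: severity)
    controller: str   # "A" or "B" (assumed: controller that logged the event)
    message: str


# Assumed layout: serial, date, time, code, severity, controller, free text.
LINE_RE = re.compile(
    r"^(?P<serial>[AB]\d+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<code>\d+)\s+"
    r"(?P<severity>[A-Z])\s+"
    r"(?P<controller>[AB])\s+"
    r"(?P<message>.*)$"
)


def parse_event(line: str) -> Optional[Event]:
    """Return an Event for a matching log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Event(m["serial"], m["timestamp"], int(m["code"]),
                 m["severity"], m["controller"], m["message"])


def non_informational(lines: List[str]) -> List[Event]:
    """Keep parsed events whose severity letter is anything other than 'I'."""
    events = (parse_event(line) for line in lines)
    return [e for e in events if e is not None and e.severity != "I"]


# Three lines copied from the log above.
sample = [
    "B18136 2015-04-12 16:25:11 84 W B Killed partner controller. "
    "(reason: PCIE link recovery failed [failover reason code: 29])",
    "B18137 2015-04-12 16:25:11 194 I B Auto-write-through trigger event: "
    "partner processor down.",
    "B18148 2015-04-12 16:26:19 313 E B RAID controller A failed, "
    "reason PCIE link recovery failed. Product ID , S/N",
]

for event in non_informational(sample):
    print(event.serial, event.code, event.severity, event.message)
```

On the sample this keeps only the W and E lines (event codes 84 and 313), which are the ones pointing at the PCIe link between the controllers.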

 

What should I do in this case?

AnkitM
Trusted Contributor

Re: MSA2312sa controller fail.

It is possible that you received a DOA (Dead on Arrival) part.

Try replacing the controller one more time. If it still fails, you may need to isolate the issue to the midplane chassis.

 

If the unit is not covered under warranty and you are able to take it offline, you can follow these steps:

Power off Hosts

Power Down MSA

Swap the controllers (A to slot B and B to slot A).

Power On the MSA.

 

If the fault follows the controller (it is now reported on slot B), it is a controller failure (DOA): replace the controller module.

If the fault remains on controller A (the module that was originally controller B), try replacing the midplane chassis.
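
The swap test above boils down to a two-way decision, which could be sketched like this in Python. This is only an illustration of the reasoning; the function name and the "inconclusive" branch are my own additions for the case where no fault shows up after the swap, which the steps above do not cover.

```python
from typing import Optional


def diagnose_after_swap(slot_before: str, slot_after: Optional[str]) -> str:
    """Suggest the next part to replace after swapping the two controllers.

    slot_before: slot ("A" or "B") that reported the fault before the swap.
    slot_after:  slot that reports the fault after the swap, or None if no
                 fault is reported (hypothetical case, not covered above).
    """
    valid = {"A", "B"}
    if slot_before not in valid or (slot_after is not None and slot_after not in valid):
        raise ValueError("slots must be 'A' or 'B'")

    if slot_after is None:
        return "inconclusive"
    if slot_after == slot_before:
        # The fault stayed with the slot, so the module is probably fine.
        return "replace midplane chassis"
    # The fault moved with the module to the other slot.
    return "replace controller module"


# Fault was on slot A; after the swap it is still on slot A.
print(diagnose_after_swap("A", "A"))
# Fault was on slot A; after the swap it shows up on slot B.
print(diagnose_after_swap("A", "B"))
```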
