MSA2324fc - Failure: PCIE link recovery failed

a_young
Occasional Visitor

Dear expert,

I am seeing multiple failures on an MSA2324fc controller with the reason "PCIE link recovery failed".

Do you have any idea what causes these failures?

Please assist. Thank you!

Below are the failure events (latest to oldest):
- MSA2324fc Array ; Controller A ERROR Controller B failed. (reason: PCIE link recovery failed, product ID: , SN: )
- MSA2324fc Array ; Controller A WARNING Host link down. (port: 2)
- MSA2324fc Array ; Controller A WARNING Host link down. (port: 1)
- MSA2324fc Array ; Controller A WARNING Killed partner controller. (reason: PCIE link recovery failed [failover reason code: 29])
- MSA2324fc Array ; Controller A WARNING Host link down. (port: 1)
6 REPLIES
marsh_1
Honored Contributor

Re: MSA2324fc - Failure: PCIE link recovery failed

hi,

When these have occurred in the past for FC arrays, it has generally been a firmware issue. Raise a call with HP if you are already up to date on firmware.

hth

Ken Miller_3
Advisor

Re: MSA2324fc - Failure: PCIE link recovery failed

Hi,

Andy's on vacation, so I thought I'd respond.

We first saw this error with Rev 18, but have now repeated it with Rev 21.

We're in the process of escalating this.

I did, however, note that in the Release Notes for Rev 21 (URL http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=12169&prodSeriesId=3971478&swItem=co-72827-1&prodNameId=3882365&swEnvOID=2078&swLang=13&taskId=135&mode=4&idx=0 ) there is the note:


* Issue: Changes to default Fibre Channel HBA driver parameters are required for proper controller failover.
* Workaround: The Fibre Channel HBA parameters were omitted from the MSA2000 G2 documentation. For controller failover to function properly, change the default driver parameters as follows:
  o QLogic - Port Down Retry Count = 60, Link Down Timeout = 60
  o Emulex - LinkTimeOut = 60, NodeTimeOut = 60

We had our "Port Down Retry Count" at 10 (for multipath), and "Link Down Timeout" is at 8 and doesn't seem to be a settable parameter (at least according to modinfo qla2xxx). Does anyone have any further detail on these parameters, and specifically how to set the latter?

I'm testing 6 of our MSAs with qlport_down_retry=60 and
ql2xloginretrycount=60 at the moment.
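
For anyone who wants to confirm what the loaded driver is actually using, here is a minimal Python sketch. It assumes a Linux host where the qla2xxx module exposes its parameters under /sys/module/qla2xxx/parameters; the helper name `check_qla2xxx_params` is made up for illustration, and the target values are simply the ones under test here, not a confirmed fix.

```python
from pathlib import Path

# Values being tested in this thread; not an HP-confirmed fix.
DESIRED = {"qlport_down_retry": 60, "ql2xloginretrycount": 60}


def check_qla2xxx_params(param_dir="/sys/module/qla2xxx/parameters",
                         desired=DESIRED):
    """Return {name: (current, wanted)} for each parameter that differs.

    current is None when the value cannot be read, for example when the
    module is not loaded or the parameter is not exposed.
    """
    mismatches = {}
    for name, wanted in desired.items():
        path = Path(param_dir) / name
        try:
            current = int(path.read_text().strip())
        except (OSError, ValueError):
            current = None
        if current != wanted:
            mismatches[name] = (current, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (current, wanted) in check_qla2xxx_params().items():
        print(f"{name}: current={current}, wanted={wanted}")
```

An empty result means both values already match. Module parameters like these are normally set with an `options qla2xxx ...` line in the modprobe configuration, and take effect the next time the module is loaded.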

Thanks,

== k ==
Rao Uppuluri
Advisor

Re: MSA2324fc - Failure: PCIE link recovery failed

Hello all,
Happy New Year!
We are also experiencing multiple failures on our MSA2324fc for controller B (it's always the same controller). I see similar errors to the original poster's. Example:

2010-01-07 00:16:36 A171 313 Controller B failed. (reason: PCIE link recovery failed, product ID: , SN: )

2010-01-07 00:15:31 B226 107 Critical Error: Fault Type: NMI p1: 0x0226FF4, p2: 0x029AB1E, p3: 0x0000000, p4: 0x0000000 CThr: MScrub 36

Opened a call with HP HW a few weeks ago. They replaced both controllers 2 weeks ago, but the problem recurred last night. Called HP again and they are looking into it. They are talking about new firmware coming soon, but it is not clear when.

We have SC firmware version: M110R21 installed on both controllers.
Thank you for any input/suggestions.

-Rao
Sv_3
Occasional Visitor

Re: MSA2324fc - Failure: PCIE link recovery failed

We have a similar problem; maybe it is related to the controller enclosure hardware? We have two MSA 2324fc arrays in which the second controller goes down.
RobertsMerks
Occasional Visitor

Re: MSA2324fc - Failure: PCIE link recovery failed

Hello!
I just bought a new MSA2324sa, updated the firmware to M110R25, and am having the same problem: one controller shuts down during vdisk initialization or expansion.


2010-02-12 14:50:26 B1704 314 There is a problem with a FRU. (FRU type: Controller module A, enclosure: 1, product ID: AJ808A, SN: 2S6944T147, version: 56, related event serial number: B1703, related event code: 313)
2010-02-12 14:50:26 B1703 313 Controller A failed. (reason: PCIE link recovery failed, product ID: , SN: )
2010-02-12 14:49:23 A1848 107 Critical Error: Fault Type: Debug Except., Dbg Reg Num = 0 p1: 0x02E9B39, p2: 0x027D3A3, p3: 0x027DFF2, p4: 0x028DF22 CThr: RaidIo


2010-02-12 14:49:17 B1691 84 Killed partner controller. (reason: PCIE link recovery failed [failover reason code: 29])
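
To pull just the relevant entries out of a longer event log export, something like the following Python sketch may help. The line layout (date, time, event serial, event code, message) is only inferred from the excerpts posted in this thread, and the function name `pcie_failures` is made up for illustration.

```python
import re

# Assumed layout, inferred from the excerpts in this thread:
# "<date> <time> <event serial> <event code> <message>"
EVENT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<serial>[AB]\d+) (?P<code>\d+) (?P<message>.*)$"
)


def pcie_failures(lines):
    """Return parsed events whose message mentions the PCIE failure reason."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match and "PCIE link recovery failed" in match["message"]:
            events.append(match.groupdict())
    return events
```

Each returned dict carries the `date`, `time`, `serial`, `code`, and `message` fields, so the failures can be counted per controller or lined up against firmware changes.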
RobertsMerks
Occasional Visitor

Re: MSA2324fc - Failure: PCIE link recovery failed

Hi!

Do not use M110R25-1 for "HP Storage Works 2312/2324 Modular Smart Array RAID Storage"!

Use "HP StorageWorks 2000 G2 Modular Smart Array Controller Firmware" M110R21-01 instead.

I downgraded to the M110R21-01 "HP StorageWorks 2000 G2 Modular Smart Array Controller" firmware and the problem disappeared.

Later I will test the just-released M110R28-02, also for "HP StorageWorks 2000 G2 Modular Smart Array Controller".