HPE EVA Storage

HP MSA 2012fc controller problem

 
SebaB
Occasional Visitor

Hello,

 

I have an MSA 2012fc with dual controllers, firmware J201R09. We have replaced both controllers and the failures still occur.

Event LOG:

 

Info  2011-07-18 08:28:44  204 A86  Hardware Flush p1=ah p2=30h 
Info  2011-07-18 08:28:43  191 B4365  Capacitor auto-writethrough trigger event: capacitor recovered 
Info  2011-07-18 08:28:31  112 B4364  Host link down Chan0 
Info  2011-07-18 08:28:31  112 B4363  Host link down Chan1 
Info  2011-07-18 08:28:10  211 B4362  SAS Topology Change: Chan0, 114 elements, 2 expanders, 1 native level, 1 partner level, 26 device PHYs 
Info  2011-07-18 08:28:10  211 A85  SAS Topology Change: Chan0, 114 elements, 2 expanders, 1 native level, 1 partner level, 26 device PHYs 
Info  2011-07-18 08:28:05  204 B4361  Hardware Flush p1=2h p2=2dh 
Info  2011-07-18 08:28:05  204 A84  Hardware Flush p1=ch p2=2fh 
Info  2011-07-18 08:28:04  33 B4360  Time/date has been changed to 2011-07-18 08:28:03 
Info  2011-07-18 08:28:04  190 B4359  Capacitor auto-writethrough trigger event: capacitor failed 
Info  2011-07-18 08:28:00  56 B4358  Storage Controller booted. SC code version: J201R09 
Info  2011-07-18 08:27:55  190 A83  Capacitor auto-writethrough trigger event: capacitor failed 
Info  2011-07-18 08:27:50  56 A82  Storage Controller booted. SC code version: J201R09 
Critical  2011-07-16 00:38:39  107 B4357  Critical Error: Fault Type: Debug Except., Dbg Reg Num = 1 p1: 01EBE5D p2:01EBCDB p3: 01EBC18 p4:01EB9BB CThr: MScrub 01 
Info  2011-07-15 23:24:49  206 B4356  Scrub Vdisk started (Vdisk: disk-01, SN: 00c0ffd59769004801f8c64800000000) 
Info  2011-07-15 02:57:03  207 B4355  Vdisk scrub completed, no errors found. (Vdisk: disk-02, SN: 00c0ffd596fb0048b4cb894d00000000) 
Info  2011-07-14 23:24:08  207 B4354  Vdisk scrub completed, no errors found. (Vdisk: disk-01, SN: 00c0ffd59769004801f8c64800000000) 
Info  2011-07-14 16:55:18  206 B4353  Scrub Vdisk started (Vdisk: disk-02, SN: 00c0ffd596fb0048b4cb894d00000000) 
Critical  2011-07-14 15:25:39  314 B4352  FRU type: RAID IOM A, problem: encl 0. Product ID: AJ744A, S/N: 3CL825R588 rev: P. Related event ID: 4351, type: 313 
Critical  2011-07-14 15:25:39  313 B4351  RAID controller A failed, reason PCIE link recovery failed. Product ID , S/N 
Info  2011-07-14 15:25:03  310 B4350  Discovery and initialization of enclosure data has completed following a rescan. 
Info  2011-07-14 15:24:42  19 B4349  Rescan bus done. Reason Code: 24. Found 24 drives, 2 Drive Enclosures 
Info  2011-07-14 15:24:40  111 B4348  Host link up Chan1: 3 Loop IDs, External Device(s) 
Warning  2011-07-14 15:24:39  112 B4347  Host link down Chan1 
Info  2011-07-14 15:24:39  111 B4346  Host link up Chan0: 3 Loop IDs, External Device(s) 
Info  2011-07-14 15:24:39  111 B4345  Host link up Chan1: 2 Loop IDs 
Info  2011-07-14 15:24:39  71 B4344  Failover completed, failover set A 
Info  2011-07-14 15:24:38  77 B4343  Cache initialized for RAID controller A. WB data found 
Info  2011-07-14 15:24:38  19 B4342  Rescan bus done. Reason Code: 2. Found 24 drives, 2 Drive Enclosures 
Critical  2011-07-14 15:24:34  107 A81  Critical Error: Fault Type: Debug Except., Dbg Reg Num = 1 p1: 01EBE5D p2:01EBCDB p3: 01EBC18 p4:01EB9BB CThr: MScrub 01 
Info  2011-07-14 15:24:32  211 B4341  SAS Topology Change: Chan0, 109 elements, 2 expanders, 2 native levels, 0 partner levels, 26 device PHYs 
Info  2011-07-14 15:24:28  114 B4340  Drive link down Chan0 
Warning  2011-07-14 15:24:27  112 B4339  Host link down Chan1 
Info  2011-07-14 15:24:27  71 B4338  Failover initiated, failover set A 
Warning  2011-07-14 15:24:27  112 B4337  Host link down Chan0 
Info  2011-07-14 15:24:27  194 B4336  Auto-writethrough trigger event: partner processor down 
Warning  2011-07-14 15:24:27  84 B4335  Killed partner controller; reason=29 (PCIE link recovery failed) 
Info  2011-07-14 14:01:01  206 A80  Scrub Vdisk started (Vdisk: disk-01, SN: 00c0ffd59769004801f8c64800000000) 
Info  2011-07-13 16:54:44  207 B4334  Vdisk scrub completed, no errors found. (Vdisk: disk-02, SN: 00c0ffd596fb0048b4cb894d00000000) 
Info  2011-07-13 16:53:48  33 B4333  Time/date has been changed to 2011-07-13 16:53:47 
Info  2011-07-13 14:00:02  207 A79  Vdisk scrub completed, no errors found. (Vdisk: disk-01, SN: 00c0ffd59769004801f8c64800000000) 
Info  2011-07-13 11:16:00  310 B4332  Discovery and initialization of enclosure data has completed following a rescan. 
Info  2011-07-13 11:15:35  139 B4331  Management Controller booted. MC code version: W420R72 
Info  2011-07-13 11:15:33  175 B4330  Ethernet link up for controller B 
Info  2011-07-13 11:15:33  181 B4329  LAN configuration parameters have been set 
Info  2011-07-13 11:15:33  28 B4328  Controller configuration parameters have been changed 
Info  2011-07-13 11:15:24  310 B4327  Discovery and initialization of enclosure data has completed following a rescan. 
Info  2011-07-13 11:15:24  310 A78  Discovery and initialization of enclosure data has completed following a rescan. 
Info  2011-07-13 11:15:00  19 B4326  Rescan bus done. Reason Code: 24. Found 24 drives, 2 Drive Enclosures 
Info  2011-07-13 11:14:59  19 A77  Rescan bus done. Reason Code: 24. Found 24 drives, 2 Drive Enclosures 
Info  2011-07-13 11:14:58  111 B4325  Host link up Chan0: 3 Loop IDs, External Device(s) 
Info  2011-07-13 11:14:58  111 B4324  Host link up Chan1: 3 Loop IDs, External Device(s) 
Info  2011-07-13 11:14:58  72 B4323  Recovery completed for Controller B 
Info  2011-07-13 11:14:58  111 A76  Host link up Chan1: 3 Loop IDs, External Device(s) 
Info  2011-07-13 11:14:57  77 B4322  Cache initialized for RAID controller B. WB data found 
Info  2011-07-13 11:14:57  72 B4321  Recovery initiated for Controller B 
Info  2011-07-13 11:14:57  111 A75  Host link up Chan0: 3 Loop IDs, External Device(s) 
Info  2011-07-13 11:14:57  72 A74  Recovery completed for Controller B 
Info  2011-07-13 11:14:57  19 A73  Rescan bus done. Reason Code: 6. Found 24 drives, 2 Drive Enclosures 
Info  2011-07-13 11:14:45  20 B4320  Storage Controller firmware update complete. SC code version: J201R09 
Info  2011-07-13 11:14:44  271 B4319  Warning: could not obtain serial number from EEPROM; using CM serial number for MAC address. 
Info  2011-07-13 11:14:44  73 B4318  Heartbeat detected from the other RAID controller 

Re: HP MSA 2012fc controller problem

Johan Guldmyr
Honored Contributor

Has a manual reboot been done since the replacements? Definitely worth a shot.

 

Maybe these three entries tell us something:


Critical  2011-07-16 00:38:39  107 B4357  Critical Error: Fault Type: Debug Except., Dbg Reg Num = 1 p1: 01EBE5D p2:01EBCDB p3: 01EBC18 p4:01EB9BB CThr: MScrub 01 

Info  2011-07-15 23:24:49  206 B4356  Scrub Vdisk started (Vdisk: disk-01, SN: 00c0ffd59769004801f8c64800000000) 

 

Info  2011-07-18 08:27:50  56 A82  Storage Controller booted. SC code version: J201R09 

 

So first a vdisk scrub started for disk-01, then a bit later there was a debug exception, and then some time after that both controllers restarted.

Could it be that they restarted to recover because the scrub had hung?

 

On the 13th and 14th the same scrub (for the same vdisk) was running when the controllers restarted.

 

I don't know how to fix it, but I would guess that there is something wrong with disk-01, or at least with the scrub for it.

There are some special MSA commands, such as verify. There are more posts about this command in this forum that you should look at. You may have to re-create the vdisk or something like that. As far as I remember, it wasn't a command you can simply run as a quick test.
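
To check the scrub/crash correlation described above against the whole log rather than by eye, something like the following rough Python sketch could help. It assumes the log lines look like the ones pasted in this thread (severity, timestamp, event code, controller letter plus sequence number, message) and that event codes 206, 207 and 107 mean scrub started, scrub completed and critical error, as the pasted messages suggest. The function names and the SAMPLE excerpt are made up for illustration; this is not an HP tool.

```python
import re
from datetime import datetime

# One event-log line: severity, timestamp, event code, controller + sequence, message.
# The layout is assumed from the log pasted in this thread.
LINE_RE = re.compile(
    r"^(?P<sev>Info|Warning|Critical)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<code>\d+)\s+(?P<ctrl>[AB])(?P<seq>\d+)\s+(?P<msg>.*)$"
)
VDISK_RE = re.compile(r"Vdisk: ([^,)]+)")


def parse_events(text):
    """Parse log text into a list of event dicts, oldest first."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            events.append({
                "sev": m["sev"],
                "ts": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
                "code": int(m["code"]),
                "ctrl": m["ctrl"],
                "msg": m["msg"].strip(),
            })
    return sorted(events, key=lambda e: e["ts"])


def scrubs_running_at_crash(events):
    """Return (controller, vdisk) for every scrub still open when that
    controller logged a critical error (event 107)."""
    active = {}  # vdisk name -> controller that started the scrub
    hits = []
    for e in events:
        vd = VDISK_RE.search(e["msg"])
        if e["code"] == 206 and vd:        # scrub started
            active[vd.group(1)] = e["ctrl"]
        elif e["code"] == 207 and vd:      # scrub completed
            active.pop(vd.group(1), None)
        elif e["code"] == 107:             # critical error
            hits += [(e["ctrl"], name) for name, c in active.items()
                     if c == e["ctrl"]]
    return hits


# Shortened excerpt of the log from the first post (messages trimmed).
SAMPLE = """
Critical  2011-07-16 00:38:39  107 B4357  Critical Error: Fault Type: Debug Except. CThr: MScrub 01
Info  2011-07-15 23:24:49  206 B4356  Scrub Vdisk started (Vdisk: disk-01, SN: 00c0ffd59769004801f8c64800000000)
Info  2011-07-15 02:57:03  207 B4355  Vdisk scrub completed, no errors found. (Vdisk: disk-02, SN: 00c0ffd596fb0048b4cb894d00000000)
Info  2011-07-14 23:24:08  207 B4354  Vdisk scrub completed, no errors found. (Vdisk: disk-01, SN: 00c0ffd59769004801f8c64800000000)
Info  2011-07-14 16:55:18  206 B4353  Scrub Vdisk started (Vdisk: disk-02, SN: 00c0ffd596fb0048b4cb894d00000000)
Critical  2011-07-14 15:24:34  107 A81  Critical Error: Fault Type: Debug Except. CThr: MScrub 01
Info  2011-07-14 14:01:01  206 A80  Scrub Vdisk started (Vdisk: disk-01, SN: 00c0ffd59769004801f8c64800000000)
"""

if __name__ == "__main__":
    for ctrl, vdisk in scrubs_running_at_crash(parse_events(SAMPLE)):
        print(f"controller {ctrl} crashed while scrubbing {vdisk}")
```

On this excerpt both critical errors line up with an open scrub of disk-01 (once on controller A, once on B), while the disk-02 scrub finished cleanly. That fits the guess about disk-01, but it only shows a pattern in the log and does not prove the cause.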

SebaB
Occasional Visitor

Re: HP MSA 2012fc controller problem

No, there were no manual reboots. The system performed a failover. On that storage we have replaced the following in the last month:

3 failed disks, 1 power supply, both controllers. I have seen a customer advisory ... it is a firmware issue. We have the latest firmware, J201R09. What else could be the issue?

 

Thanks for the reply.