Need help with MSA1000

 
AFyodorov
Advisor

Need help with MSA1000

We have an Exchange 2003 cluster, two DL380 G4 nodes hooked up to MSA1000. Each DL380 G4 uses two fibre cables. Pretty classic setup.

Starting some time ago, Controller 1 on the MSA1000 decided to go offline and Controller 2 became active.

We ran some tests, powering everything down and bringing it back up one component at a time, and it looks like Controller 1 goes offline right after Cluster Node A boots.

It seems that something on Node A forces the MSA Controller 1 to go offline.

Both HBAs on Node A seem to be healthy (I believe one of the HBAs went bad a long time ago and was replaced under warranty).

Has anyone seen behavior like this before?

Here is the CLI output from the MSA1000:

show tech_support

CLI> show version
Firmware version: 4.48 build 342
Hardware Revision: 7 [AutoRev: 0x010000]
Internal EMU Rev: 1.86 (SGM05133P9)
External Box EMU 2 Rev: CP20 (MXNHLMPX7P)
External Box EMU 3 Rev: CP20 (M1ZNLMPX84)

CLI> show profile
Profile name = Default (Default)
Mode 0 = Peripheral Device LUN Addressing
Mode 1 = Legacy Failover
Mode 2 = Logical volumes connect as available on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = Normal bad block handling
Mode 5 = Logout selected initiators on TPRLO
Mode 6 = Fault management events not reported through Unit Attention
Mode 7 = Do not send FCP response info with SCSI status
Mode 8 = Do not send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Ignore Force Unit Access on Write

Profile name = Windows
Type = Standard
Mode 0 = Peripheral Device LUN Addressing
Mode 1 = Legacy Failover
Mode 2 = Logical volumes connect as available on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = Normal bad block handling
Mode 5 = Logout selected initiators on TPRLO
Mode 6 = Fault management events not reported through Unit Attention
Mode 7 = Do not send FCP response info with SCSI status
Mode 8 = Do not send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Ignore Force Unit Access on Write

Profile name = Windows (degraded performance)
Type = Custom
Mode 0 = Peripheral Device LUN Addressing
Mode 1 = Legacy Failover
Mode 2 = Logical volumes connect as available on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = Normal bad block handling
Mode 5 = Logout selected initiators on TPRLO
Mode 6 = Fault management events not reported through Unit Attention
Mode 7 = Do not send FCP response info with SCSI status
Mode 8 = Do not send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Enforce Force Unit Access on Write

Profile name = OpenVMS
Type = Custom
Mode 0 = Peripheral Device LUN Addressing
Mode 1 = Asymmetric Failover
Mode 2 = Logical volumes connect as unavailable on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = VMS shadow style bad block handling
Mode 5 = Logout all initiators on TPRLO
Mode 6 = Fault management events reported through Unit Attention
Mode 7 = Do not send FCP response info with SCSI status
Mode 8 = Do not send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Enforce Force Unit Access on Write

Profile name = Tru64
Type = Standard
Mode 0 = Peripheral Device LUN Addressing
Mode 1 = Asymmetric Failover
Mode 2 = Logical volumes connect as unavailable on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = Normal bad block handling
Mode 5 = Logout all initiators on TPRLO
Mode 6 = Fault management events reported through Unit Attention
Mode 7 = Do not send FCP response info with SCSI status
Mode 8 = Send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Enforce Force Unit Access on Write

Profile name = Linux
Type = Standard
Mode 0 = Peripheral Device LUN Addressing
Mode 1 = Asymmetric Failover
Mode 2 = Logical volumes connect as available on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = Normal bad block handling
Mode 5 = Logout all initiators on TPRLO
Mode 6 = Fault management events not reported through Unit Attention
Mode 7 = Send FCP response info with SCSI status
Mode 8 = Do not send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Enforce Force Unit Access on Write

Profile name = Solaris
Type = Standard
Mode 0 = Peripheral Device LUN Addressing
Mode 1 = Asymmetric Failover
Mode 2 = Logical volumes connect as available on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = Normal bad block handling
Mode 5 = Logout all initiators on TPRLO
Mode 6 = Fault management events not reported through Unit Attention
Mode 7 = Send FCP response info with SCSI status
Mode 8 = Do not send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Enforce Force Unit Access on Write

Profile name = Netware
Type = Standard
Mode 0 = Peripheral Device LUN Addressing
Mode 1 = Asymmetric Failover
Mode 2 = Logical volumes connect as available on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = Normal bad block handling
Mode 5 = Logout all initiators on TPRLO
Mode 6 = Fault management events not reported through Unit Attention
Mode 7 = Send FCP response info with SCSI status
Mode 8 = Do not send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Enforce Force Unit Access on Write

Profile name = HP
Type = Standard
Mode 0 = Volume Set Addressing
Mode 1 = Asymmetric Failover
Mode 2 = Logical volumes connect as available on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = Normal bad block handling
Mode 5 = Logout all initiators on TPRLO
Mode 6 = Fault management events not reported through Unit Attention
Mode 7 = Send FCP response info with SCSI status
Mode 8 = Do not send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = No Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Enforce Force Unit Access on Write

Profile name = Windows_SP2_and_below
Type = Custom
Mode 0 = Peripheral Device LUN Addressing
Mode 1 = Legacy Failover
Mode 2 = Logical volumes connect as available on Backup Controller
Mode 3 = Product ID of 'MSA1000 VOLUME'
Mode 4 = Normal bad block handling
Mode 5 = Logout selected initiators on TPRLO
Mode 6 = Fault management events not reported through Unit Attention
Mode 7 = Do not send FCP response info with SCSI status
Mode 8 = Do not send Unit Attention on failover
Mode 9 = SCSI inquiry revision field contains the actual version
Mode 10 = SCSI inquiry vendor field contains Compaq
Mode 11 = Power On Reset Unit Attention generated on FC Login or Logout
Mode 12 = Enforce Force Unit Access on Write


CLI> show globals

Global Parameters:
System Name: SGM05133P9
Rebuild Priority: low
Expand Priority: low

Total Cache: 512MB
50% Read Cache: 256MB
50% Write Cache: 256MB

Temperature:
EMU: 28 Celsius, 82 Fahrenheit
PS1: 38 Celsius, 100 Fahrenheit
PS2: 38 Celsius, 100 Fahrenheit

CLI> show acl

ACL is disabled. To enable ACL use 'add acl'.

CLI> show connections

Connection Name:
Host WWNN = 200000E0-8B1FEE1A
Host WWPN = 210000E0-8B1FEE1A
Profile Name = Default
Unit Offset = 0
Controller 1 Port 1 Status = Online

Connection Name:
Host WWNN = 200000E0-8B1FA019
Host WWPN = 210000E0-8B1FA019
Profile Name = Default
Unit Offset = 0
Controller 1 Port 1 Status = Online

Connection Name:
Host WWNN = 200000E0-8B1F2B1D
Host WWPN = 210000E0-8B1F2B1D
Profile Name = Default
Unit Offset = 0
Controller 2 Port 1 Status = Online

CLI> show disks
box,bay bus,ID Size Speed Units
Disk101 1,01 0,00 146.8 GB 160 MB/s 0
Disk102 1,02 0,01 146.8 GB 160 MB/s 0
Disk103 1,03 0,02 146.8 GB 160 MB/s 0
Disk104 1,04 0,03 146.8 GB 160 MB/s 0
Disk105 1,05 0,04 300.0 GB 160 MB/s 1, 8
Disk106 1,06 0,05 300.0 GB 160 MB/s 1, 8
Disk107 1,07 0,08 146.8 GB 160 MB/s 2
Disk108 1,08 1,00 146.8 GB 160 MB/s 2
Disk109 1,09 1,01 146.8 GB 160 MB/s 2
Disk110 1,10 1,02 146.8 GB 160 MB/s 2
Disk111 1,11 1,03 300.0 GB 160 MB/s 3, 4, 5
Disk112 1,12 1,04 300.0 GB 160 MB/s 3, 4, 5
Disk113 1,13 1,05 300.0 GB 40 MB/s 3, 4, 5
Disk114 1,14 1,08 300.0 GB 160 MB/s 0, 1, 2, 3, 4, 5, 6, 7, 8 (spare)
Disk201 2,01 2,00 300.0 GB 160 MB/s 7
Disk202 2,02 2,01 300.0 GB 160 MB/s 7
Disk203 2,03 2,02 300.0 GB 160 MB/s 7
Disk204 2,04 2,03 300.0 GB 160 MB/s 7
Disk205 2,05 2,04 300.0 GB 160 MB/s 7
Disk206 2,06 2,05 300.0 GB 160 MB/s 7
Disk301 3,01 3,00 72.8 GB 160 MB/s 6
Disk302 3,02 3,01 72.8 GB 160 MB/s 6
Disk303 3,03 3,02 72.8 GB 160 MB/s 6
Disk304 3,04 3,03 72.8 GB 160 MB/s 6
Disk305 3,05 3,04 72.8 GB 160 MB/s 6
Disk306 3,06 3,05 72.8 GB 160 MB/s 6
Disk307 3,07 3,08 72.8 GB 160 MB/s 6
Disk308 3,08 3,09 72.8 GB 160 MB/s 6
Disk309 3,09 3,10 72.8 GB 160 MB/s 6
Disk310 3,10 3,11 72.8 GB 160 MB/s 6
Disk311 3,11 3,12 72.8 GB 160 MB/s 6
Disk312 3,12 3,13 72.8 GB 160 MB/s 6
Disk313 3,13 3,14 72.8 GB 160 MB/s 6
Disk314 3,14 3,15 72.8 GB 160 MB/s 6

Notes:
The speed is the currently negotiated speed to the disk. This may
be less than the maximum speed supported by the device due to bus
faults, loss of signal integrity, etc.

CLI> show units

Unit 0:
In PDLA mode, Unit 0 is Lun 1; In VSA mode, Unit 0 is Lun 0.
Unit Identifier :
Device Identifier : 600805F3-00153820-A941EF60-1E0E000B
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Mirror Init Status: Complete
4 Data Disk(s) used by lun 0:
Disk101: Box 1, Bay 01, (SCSI bus 0, SCSI id 0)
Disk102: Box 1, Bay 02, (SCSI bus 0, SCSI id 1)
Disk103: Box 1, Bay 03, (SCSI bus 0, SCSI id 2)
Disk104: Box 1, Bay 04, (SCSI bus 0, SCSI id 3)
Spare Disk(s) used by lun 0:
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Logical Volume Raid Level: MIRROR FAULT TOLERANCE (Raid 1)
stripe_size=32kB
Logical Volume Capacity : 280,027MB

Unit 1:
In PDLA mode, Unit 1 is Lun 2; In VSA mode, Unit 1 is Lun 1.
Unit Identifier :
Device Identifier : 600805F3-00153820-A8F1FD03-5E70000A
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Mirror Init Status: Complete
2 Data Disk(s) used by lun 1:
Disk105: Box 1, Bay 05, (SCSI bus 0, SCSI id 4)
Disk106: Box 1, Bay 06, (SCSI bus 0, SCSI id 5)
Spare Disk(s) used by lun 1:
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Logical Volume Raid Level: MIRROR FAULT TOLERANCE (Raid 1)
stripe_size=32kB
Logical Volume Capacity : 69,463MB

Unit 2:
In PDLA mode, Unit 2 is Lun 3; In VSA mode, Unit 2 is Lun 2.
Unit Identifier :
Device Identifier : 600805F3-00153820-AF4AE8D6-E1BD000C
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Mirror Init Status: Complete
4 Data Disk(s) used by lun 2:
Disk107: Box 1, Bay 07, (SCSI bus 0, SCSI id 8)
Disk108: Box 1, Bay 08, (SCSI bus 1, SCSI id 0)
Disk109: Box 1, Bay 09, (SCSI bus 1, SCSI id 1)
Disk110: Box 1, Bay 10, (SCSI bus 1, SCSI id 2)
Spare Disk(s) used by lun 2:
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Logical Volume Raid Level: MIRROR FAULT TOLERANCE (Raid 1)
stripe_size=32kB
Logical Volume Capacity : 280,027MB

Unit 3:
In PDLA mode, Unit 3 is Lun 4; In VSA mode, Unit 3 is Lun 3.
Unit Identifier :
Device Identifier : 600805F3-00153820-AC0C2892-DBBA0010
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Parity Init Status: Complete
3 Data Disk(s) used by lun 3:
Disk111: Box 1, Bay 11, (SCSI bus 1, SCSI id 3)
Disk112: Box 1, Bay 12, (SCSI bus 1, SCSI id 4)
Disk113: Box 1, Bay 13, (SCSI bus 1, SCSI id 5)
Spare Disk(s) used by lun 3:
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Logical Volume Raid Level: DISTRIBUTED PARITY FAULT TOLERANCE (Raid 5)
stripe_size=32kB
Logical Volume Capacity : 510MB

Unit 4:
In PDLA mode, Unit 4 is Lun 5; In VSA mode, Unit 4 is Lun 4.
Unit Identifier :
Device Identifier : 600805F3-00153820-A52C589E-71780011
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Parity Init Status: Complete
3 Data Disk(s) used by lun 4:
Disk111: Box 1, Bay 11, (SCSI bus 1, SCSI id 3)
Disk112: Box 1, Bay 12, (SCSI bus 1, SCSI id 4)
Disk113: Box 1, Bay 13, (SCSI bus 1, SCSI id 5)
Spare Disk(s) used by lun 4:
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Logical Volume Raid Level: DISTRIBUTED PARITY FAULT TOLERANCE (Raid 5)
stripe_size=16kB
Logical Volume Capacity : 510MB

Unit 5:
In PDLA mode, Unit 5 is Lun 6; In VSA mode, Unit 5 is Lun 5.
Unit Identifier :
Device Identifier : 600805F3-00153820-ABCC48A8-B8F90012
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Parity Init Status: Complete
3 Data Disk(s) used by lun 5:
Disk111: Box 1, Bay 11, (SCSI bus 1, SCSI id 3)
Disk112: Box 1, Bay 12, (SCSI bus 1, SCSI id 4)
Disk113: Box 1, Bay 13, (SCSI bus 1, SCSI id 5)
Spare Disk(s) used by lun 5:
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Logical Volume Raid Level: DISTRIBUTED PARITY FAULT TOLERANCE (Raid 5)
stripe_size=32kB
Logical Volume Capacity : 571,183MB

Unit 6:
In PDLA mode, Unit 6 is Lun 7; In VSA mode, Unit 6 is Lun 6.
Unit Identifier :
Device Identifier : 600805F3-00153820-A71242E0-3FB90013
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Parity Init Status: Complete
14 Data Disk(s) used by lun 6:
Disk301: Box 3, Bay 01, (SCSI bus 3, SCSI id 0)
Disk302: Box 3, Bay 02, (SCSI bus 3, SCSI id 1)
Disk303: Box 3, Bay 03, (SCSI bus 3, SCSI id 2)
Disk304: Box 3, Bay 04, (SCSI bus 3, SCSI id 3)
Disk305: Box 3, Bay 05, (SCSI bus 3, SCSI id 4)
Disk306: Box 3, Bay 06, (SCSI bus 3, SCSI id 5)
Disk307: Box 3, Bay 07, (SCSI bus 3, SCSI id 8)
Disk308: Box 3, Bay 08, (SCSI bus 3, SCSI id 9)
Disk309: Box 3, Bay 09, (SCSI bus 3, SCSI id 10)
Disk310: Box 3, Bay 10, (SCSI bus 3, SCSI id 11)
Disk311: Box 3, Bay 11, (SCSI bus 3, SCSI id 12)
Disk312: Box 3, Bay 12, (SCSI bus 3, SCSI id 13)
Disk313: Box 3, Bay 13, (SCSI bus 3, SCSI id 14)
Disk314: Box 3, Bay 14, (SCSI bus 3, SCSI id 15)
Spare Disk(s) used by lun 6:
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Logical Volume Raid Level: DISTRIBUTED PARITY FAULT TOLERANCE (Raid 5)
stripe_size=16kB
Logical Volume Capacity : 903,036MB

Unit 7:
In PDLA mode, Unit 7 is Lun 8; In VSA mode, Unit 7 is Lun 7.
Unit Identifier :
Device Identifier : 600805F3-00153820-AE624912-85180014
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Mirror Init Status: Complete
6 Data Disk(s) used by lun 7:
Disk201: Box 2, Bay 01, (SCSI bus 2, SCSI id 0)
Disk202: Box 2, Bay 02, (SCSI bus 2, SCSI id 1)
Disk203: Box 2, Bay 03, (SCSI bus 2, SCSI id 2)
Disk204: Box 2, Bay 04, (SCSI bus 2, SCSI id 3)
Disk205: Box 2, Bay 05, (SCSI bus 2, SCSI id 4)
Disk206: Box 2, Bay 06, (SCSI bus 2, SCSI id 5)
Spare Disk(s) used by lun 7:
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Logical Volume Raid Level: MIRROR FAULT TOLERANCE (Raid 1)
stripe_size=64kB
Logical Volume Capacity : 858,305MB

Unit 8:
In PDLA mode, Unit 8 is Lun 9; In VSA mode, Unit 8 is Lun 8.
Unit Identifier :
Device Identifier : 600805F3-00153820-A4E2A5FB-A7140015
Cache Status : Enabled
Max Boot Partition: Disabled
Volume Status : VOLUME OK
Mirror Init Status: Complete
2 Data Disk(s) used by lun 8:
Disk105: Box 1, Bay 05, (SCSI bus 0, SCSI id 4)
Disk106: Box 1, Bay 06, (SCSI bus 0, SCSI id 5)
Spare Disk(s) used by lun 8:
Disk114: Box 1, Bay 14, (SCSI bus 1, SCSI id 8)
Logical Volume Raid Level: MIRROR FAULT TOLERANCE (Raid 1)
stripe_size=32kB
Logical Volume Capacity : 216,634MB

CLI> show this_controller
Controller:
MSA1000(c) Compaq P56350GX3RJ0GF Software 4.48 Build 342 Hardware 7
Controller Identifier:
NODE_ID = 500805F3-00153820
SCSI_VERSION = SCSI-3
Supported Redundancy Mode: Active/Standby
Current Redundancy Mode: Active/Standby
Current Role: Standby
Device Port SCSI address 7
Terminal speed for the CLI is set to 19200.
Host Port_1:
REPORTED PORT_ID 500805F3-00153829
PORT_1_TOPOLOGY = F_Port
Cache:
256 megabyte read cache 256 megabyte write cache Version 2
Cache is GOOD, and Cache is enabled.
Unflushed data in cache
Battery:
Module #1 is fully charged and turned on.
Module #2 is fully charged and turned on.
Controller Up Time:
24 Days 16 Hours 24 Minutes 38 Seconds

CLI> show other_controller
Controller:
MSA1000(c) Compaq P56350GX3RG09W Software 4.48 Build 342 Hardware 7
Controller Identifier:
NODE_ID = 500805F3-00153820
SCSI_VERSION = SCSI-3
Supported Redundancy Mode: Active/Standby
Current Redundancy Mode: Active/Standby
Current Role: Active
Device Port SCSI address 6
Terminal speed for the CLI is set to 19200.
Host Port_1:
REPORTED PORT_ID 500805F3-00153821
PORT_1_TOPOLOGY = Arbitrated Loop
Cache:
256 megabyte read cache 256 megabyte write cache Version 2
Cache is GOOD, and Cache is enabled.
Unflushed data in cache
Battery:
Module #1 is fully charged and turned on.
Module #2 is fully charged and turned on.
Controller Up Time:
26 Days 20 Hours 51 Minutes 48 Seconds

CLI>
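The note under the `show disks` listing says a negotiated speed below the device maximum can point to bus faults or loss of signal integrity. As a minimal sketch, the check can be automated by parsing the listing and flagging any drive that negotiated below the fastest speed seen. The rows below are copied from the output above; the function name `slow_disks` and the "fastest speed seen" baseline are assumptions made for illustration, not part of the MSA1000 CLI.

```python
import re

# A few rows copied from the `show disks` output above.
SHOW_DISKS = """\
Disk111 1,11 1,03 300.0 GB 160 MB/s 3, 4, 5
Disk112 1,12 1,04 300.0 GB 160 MB/s 3, 4, 5
Disk113 1,13 1,05 300.0 GB 40 MB/s 3, 4, 5
Disk114 1,14 1,08 300.0 GB 160 MB/s 0, 1, 2, 3, 4, 5, 6, 7, 8 (spare)
"""

# name, box/bay, bus/ID, size in GB, negotiated speed in MB/s
ROW = re.compile(r"^(Disk\d+)\s+\S+\s+\S+\s+[\d.]+ GB\s+(\d+) MB/s")


def slow_disks(text):
    """Return (disk, speed) pairs negotiated below the fastest speed seen.

    Assumption: the fastest speed in the listing is a fair baseline.
    """
    speeds = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            speeds.append((match.group(1), int(match.group(2))))
    if not speeds:
        return []
    fastest = max(speed for _, speed in speeds)
    return [(disk, speed) for disk, speed in speeds if speed < fastest]


print(slow_disks(SHOW_DISKS))  # -> [('Disk113', 40)]
```

Whether the reduced speed on Disk113 has anything to do with the controller going offline is not established in this thread; it is simply the one drive in the listing that the note applies to.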
IBaltay
Honored Contributor

Re: Need help with MSA1000

Hi,
firmware 4.48 is active/passive, which means that only one controller sees the logical drives and the other is always on standby. That's why the main concern about the second controller's behaviour is that your solution is not currently highly available (you have no redundancy at the controller level).
Some additional checks may also be needed at the SAN level:
a) SAN switch firmware compatibility with 4.48
b) the health of the MSA controller 1 fabric port (growing error counters) and marginal link symptoms (cable, SFPs, switch mainboard, ...) which could potentially cause the controller to go offline
the pain is one part of the reality
John Kufrovich
Honored Contributor

Re: Need help with MSA1000

Can you recheck your fibre connections?

I'm only seeing three fibre connections in this printout. Where is the fourth?

Are you using two single-port HBAs per server?
If yes, can you provide the PCI slot number of each HBA on each node?

There is one other potential problem: MSA C1 is logging in as a loop device, while MSA C2 (controller 2) is logging in as a fabric device.

Let's work on the first issue at hand and find the fourth connection.

jk

AFyodorov
Advisor

Re: Need help with MSA1000

Thank you. I noticed the missing connection too.

This may be an HBA that we replaced under warranty back in April. Maybe the new one is flaking out for some reason.

I am not physically at this location, but local IT already checked the cables and connections.

The HBA cards are single port cards. Each cluster node has two HBAs. One HBA goes to C1 and the second HBA goes to C2.

We use HP Secure Path workgroup edition on the cluster nodes.

I also noticed that one controller was F_Port and the other Arbitrated Loop, but couldn't figure out whether this was good or bad.
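A minimal sketch of the comparison, using the relevant lines from the `show this_controller` and `show other_controller` output in the first post. F_Port indicates a fabric login (through a switch), while Arbitrated Loop indicates a loop topology (as through a hub or direct attach), so a mismatch suggests the two controllers see different kinds of interconnect. The helper names `port_summary` and `topology_mismatch` are hypothetical, chosen for this example.

```python
import re

# Relevant lines copied from `show this_controller`.
THIS_CONTROLLER = """\
Current Role: Standby
Host Port_1:
REPORTED PORT_ID 500805F3-00153829
PORT_1_TOPOLOGY = F_Port
"""

# Relevant lines copied from `show other_controller`.
OTHER_CONTROLLER = """\
Current Role: Active
Host Port_1:
REPORTED PORT_ID 500805F3-00153821
PORT_1_TOPOLOGY = Arbitrated Loop
"""


def port_summary(text):
    """Return (role, topology) parsed from one controller dump."""
    role = re.search(r"^Current Role:\s*(.+)$", text, re.MULTILINE)
    topo = re.search(r"^PORT_1_TOPOLOGY\s*=\s*(.+)$", text, re.MULTILINE)
    return (
        role.group(1).strip() if role else None,
        topo.group(1).strip() if topo else None,
    )


def topology_mismatch(first, second):
    """True when the two controllers report different port topologies."""
    return port_summary(first)[1] != port_summary(second)[1]


for label, dump in (("this_controller", THIS_CONTROLLER),
                    ("other_controller", OTHER_CONTROLLER)):
    role, topology = port_summary(dump)
    print(f"{label}: role={role}, topology={topology}")

if topology_mismatch(THIS_CONTROLLER, OTHER_CONTROLLER):
    print("Host port topologies differ between the two controllers")
```

The sketch only reports the difference; it does not decide which side is misconfigured.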
John Kufrovich
Honored Contributor

Re: Need help with MSA1000

A missing connection entry means that the device never logged in to the MSA controller.

Is this going through a switch, or is it direct-attached using the 2/3 hub?

If it is a switch, you may want to check whether you have zoning configured.

The missing connection should be attached to MSA controller 2 (left).
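The reasoning can be checked against the `show connections` output from the first post. The sketch below counts online logins per controller and compares them with the expected number. It assumes, as described earlier in the thread, that each of the two nodes has one HBA cabled to each controller, so two logins per controller are expected; `logins_per_controller` and `EXPECTED_PER_CONTROLLER` are names made up for this example.

```python
import re
from collections import Counter

# Copied from the `show connections` output in the first post.
SHOW_CONNECTIONS = """\
Connection Name:
Host WWNN = 200000E0-8B1FEE1A
Host WWPN = 210000E0-8B1FEE1A
Profile Name = Default
Unit Offset = 0
Controller 1 Port 1 Status = Online

Connection Name:
Host WWNN = 200000E0-8B1FA019
Host WWPN = 210000E0-8B1FA019
Profile Name = Default
Unit Offset = 0
Controller 1 Port 1 Status = Online

Connection Name:
Host WWNN = 200000E0-8B1F2B1D
Host WWPN = 210000E0-8B1F2B1D
Profile Name = Default
Unit Offset = 0
Controller 2 Port 1 Status = Online
"""

# Assumption: two nodes, each with one HBA cabled to each controller.
EXPECTED_PER_CONTROLLER = 2


def logins_per_controller(text):
    """Count connections reported as Online, keyed by controller number."""
    counts = Counter()
    pattern = r"^Controller (\d+) Port \d+ Status = (\w+)"
    for match in re.finditer(pattern, text, re.MULTILINE):
        if match.group(2) == "Online":
            counts[int(match.group(1))] += 1
    return counts


counts = logins_per_controller(SHOW_CONNECTIONS)
for controller in (1, 2):
    missing = EXPECTED_PER_CONTROLLER - counts[controller]
    print(f"Controller {controller}: {counts[controller]} online, {missing} missing")
```

With the pasted output this reports two logins on controller 1 and one on controller 2, which is consistent with the fourth HBA never having logged in to controller 2.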
AFyodorov
Advisor

Re: Need help with MSA1000

I am pretty sure all the HBAs are attached, just one of them is not showing.

When I go to the System Management Homepage, I see one controller active and one controller offline. Both HBAs connected to the offline controller show Loop Failed; I guess that is expected if the controller is offline.

I know for sure that the controller goes offline as soon as one of the cluster nodes boots into Windows [2003 server]. This is often a pretty harsh experience: this is an Exchange 2003 cluster, so while the controllers are "changing guards", Exchange is screaming bloody murder.

This MSA1000 uses two 2-8 fibre switches in the back.
Prashant (I am Back)
Honored Contributor

Re: Need help with MSA1000

A few changes:

1) Profile Name = Default
-> change it to Windows.

2) PORT_1_TOPOLOGY = Arbitrated Loop
and
PORT_1_TOPOLOGY = F_Port
-> Either there is a connection problem, or a switch and a hub are both in use. Using both at the same time is not recommended.

3) I hope you are running a supported Storport driver with its fix as well.

Prashant S.
Nothing is impossible
AFyodorov
Advisor

Re: Need help with MSA1000

just two switches, no hubs.
IBaltay
Honored Contributor

Re: Need help with MSA1000

hi,
can you post the switchshow output from both embedded fabric switches, please?
the pain is one part of the reality
AFyodorov
Advisor

Re: Need help with MSA1000

sorry... is this command available in CLI?

I tried switchshow and got

CLI> switchshow
Invalid CLI command.



I am only seeing these commands in help

CLI> help

Possible command verbs:
? help add
copy change download
delete migrate expand
extend accept rename
set disable locate
show start stop
clear override

Possible command nouns:
unit connection acl
profile mode spare
firmware units unit_id
this_controller this_controller_id other_controller
other_controller_id globals prompt
this_controller_hard_address standby disk
bus box all
cancel connections version
disks tech_support perf
cacheinfo taskstats debug
eventlog aculock
IBaltay
Honored Contributor

Re: Need help with MSA1000

you need to run it from the switch os
the pain is one part of the reality
AFyodorov
Advisor

Re: Need help with MSA1000

thanks. That's something I haven't mastered yet :)

How can I do this?
IBaltay
Honored Contributor

Re: Need help with MSA1000

If you have a LAN connection to the switch, you should telnet to it, authenticate (default is admin/password), and then run switchshow.
the pain is one part of the reality
IBaltay
Honored Contributor

Re: Need help with MSA1000

AFyodorov
Advisor

Re: Need help with MSA1000

wow, that sounds very close, many thanks!!!

By the way, those may be hubs that we are dealing with.

I haven't been onsite myself, and local IT may have traded MSA1000 parts with another cluster which needed more ports and ours ended up with the 2-3 hubs.
IBaltay
Honored Contributor

Re: Need help with MSA1000

but one port shows as a fabric switch (F-port) and the other as a hub port (arbitrated loop FL port)
the pain is one part of the reality
AFyodorov
Advisor

Re: Need help with MSA1000

could it be due to jumpers not being set the same on the two HBAs?

The whole system worked fine for a long time.

Then we had a failed HBA that was replaced under warranty.

The MSA Controller 1 started getting knocked into offline after the replacement HBA was installed.

Maybe the replacement HBA needs the jumper adjusted?
Greybeard
Esteemed Contributor

Re: Need help with MSA1000

Hello there. I have seen similar behaviour with newer HBAs. The earlier ones had a jumper to disable the laser test pulse on startup, and that pulse was seen to cause the controller at the other end of the connection to hang on some MSAs. This jumper is not on the newer cards, and I have yet to find an equivalent setting in the HBA's BIOS. You can test for this by starting the system with the suspect HBA's fibre unplugged and connecting it after POST; if that works, this may be the cause of your issue. There may be a firmware or driver update to address this now. Are all your drivers and MSA/host firmware up to date?
_________________________________________________
How to assign points on this new forum? Click the Kudos Star!