MSA1500 redundancy

SOLVED
James Mercer
Advisor

MSA1500 redundancy

Hi,

We have a dual controller MSA1500cs. Each controller has a fibre channel I/O connector behind it. Each FC I/O is connected to one san switch (FC 8 port 4Gb/s from HP). There is no zoning on the switch.

We've then connected servers with single FC connections to the switch.

The MSA1500 reports that the controllers are in Active/Standby mode. The right (facing front of MSA) controller is showing active and the left controller is showing standby.

If we disconnect the fibre channel link to the primary controller, we lose ALL communication with the MSA - no disk/lun access and no ACU access despite the fact that we still have one working FC connection to the MSA and the FC connections to each server. We've swapped controllers, FC cables, FC I/O cards around on the MSA.

I've logged this with HP and they initially changed the chassis of the MSA. However, we still have this problem. Our firmware was 5.02 and we tried 5.10, but nothing has changed/improved.

Does the second FC connector work at all in these things? Is the redundant controller able to do anything? Should we move our second FC I/O connector to be behind the active controller?

Thanks
20 REPLIES
Torsten.
Acclaimed Contributor

Re: MSA1500 redundancy

Do you have path management software like HP Secure Path installed?

Quote: "NOTE: Redundant configurations require two Host Bus Adapters per server, an additional controller and fibre channel I/O module, redundant switches and cables, and Secure Path (active passive configuration) software or Industry Standard failover depending on the operating system for each server. "

source: http://h18000.www1.hp.com/products/quickspecs/11945_div/11945_div.html

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
James Mercer
Advisor

Re: MSA1500 redundancy

Thanks - I've been reading that passage many times today. We realized that redundant paths require redundancy throughout, but we expected the MSA to provide redundancy within.

In other words, regardless of whether we had single or multi-pathed hosts, we expected the MSA to be able to fail over to the other controller. That way if one controller failed the other would take over. I'm (almost) certain this is the case in large HP SANs/controllers. It appears that HP relies on external software and paths for the MSA instead of internal busses and failover. Is that true?

John Kufrovich
Honored Contributor

Re: MSA1500 redundancy

James
Torsten is correct. You need either Secure Path or MPIO for Active/Standby.

We recently released A/A FW for the 1500, ver. 6.86. You will need to download the MSA Support CD 7.57, which has all the necessary items. Also, Secure Path isn't supported, but Full feature MPIO is.

jk
James Mercer
Advisor

Re: MSA1500 redundancy

Ok, so redundancy 'within' the controller is only available using SecurePath or MPIO. Yes/no?

We have a mixed environment including Netware. This seems to prevent us from using Active/Active configurations. (HP recommends that we use firmware 4.98 or lower, but only supplies 5.02 and higher.) Is there anywhere I can get appropriate firmware for use with Netware?

I understand MPIO basic is free. Is MPIO Full also free (only usable with Active/Active firmware)?
John Kufrovich
Honored Contributor

Re: MSA1500 redundancy

Yes,

Basic and Full feature MPIO are free.

Let me ping a couple of individuals to find out the status of Netware MSA FW support.
Torsten.
Acclaimed Contributor

Re: MSA1500 redundancy

Generally speaking, if an array has dual controllers, the controllers *must* communicate with each other to prevent things like a "split brain". This is true for *all* arrays. Some arrays have active/passive controllers (the passive controller becomes active if the active one fails); some have active/active. Most operating systems need special software to talk to an array over dual paths. With active/passive, the OS has to switch to the redundant path. With active/active, the OS has to know that the "2 different" devices it sees are in fact a single device with dual paths. A nice feature in this case can be load balancing.
In fact, if your active controller fails, the passive one should become active.

If you only disconnect the fibre, the controller is still working, so from its point of view there is no reason to fail over.

Your host now has to switch to the other path and force the controllers to switch.

This is the concept in general.
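
To make the concept concrete, here is a minimal Python sketch of an active/standby pair. It is a toy model, not MSA firmware behaviour: the class names, the controller labels "A" and "B", and the `host_requests_switch` method are all made up for illustration. It shows that pulling the fibre cable does not change which controller is active, so I/O fails on both paths until host-side path software asks the standby to take over.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    healthy: bool = True   # controller hardware is working
    link_up: bool = True   # its dedicated FC connection is up


class ActiveStandbyArray:
    """Toy model of a dual-controller active/standby array (illustrative only)."""

    def __init__(self, names=("A", "B")):
        self.controllers = {n: Controller(n) for n in names}
        self.active = names[0]

    def controller_failed(self, name):
        """Array-side failover: triggered only when a controller itself dies."""
        self.controllers[name].healthy = False
        if name == self.active:
            survivors = [c.name for c in self.controllers.values() if c.healthy]
            if survivors:
                self.active = survivors[0]

    def host_requests_switch(self, name):
        """Host-side failover: path software asks a healthy standby to take over."""
        if self.controllers[name].healthy:
            self.active = name

    def io_ok(self, via):
        """I/O succeeds only through the active controller over a live link."""
        c = self.controllers[via]
        return c.healthy and c.link_up and via == self.active


array = ActiveStandbyArray()
array.controllers["A"].link_up = False        # pull the fibre cable from A
state_after_pull = (array.active, array.io_ok("A"), array.io_ok("B"))
array.host_requests_switch("B")               # what path software would do
state_after_switch = (array.active, array.io_ok("B"))
print(state_after_pull)    # ('A', False, False)
print(state_after_switch)  # ('B', True)
```

In this model `controller_failed("A")` would move the active role to B on its own, which is the case where the array does not need the host's help.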

Hope this helps!
Regards
Torsten.

James Mercer
Advisor

Re: MSA1500 redundancy

John - if you can find firmware 4.96 that would be great. We have this on one of our MSAs and it seems to be reporting everything is ok. (In Insight manager it is reporting both controllers are ok, while the MSA with 5.10 shows one controller as red in Insight Manager.)

Terton - Thanks for the details. Yes, if the FC is removed from the primary/active controller fail over wouldn't happen without path software.
However, we have also tested this by removing the active controller. The standby controller tries to become active and then reports "43 Redundancy failed hardware failure". We were trying to simulate loss of the active controller. This test is not supported by HP; is it a valid test of redundancy? Is there another way to test the controller redundancy? Despite the chassis change, the test still fails and the standby controller will not become active.
Do the controllers fail over to each other at all without external software?

Thanks
James Mercer
Advisor

Re: MSA1500 redundancy

Sorry Torsten, I mis-spelled your name in my reply. Many apologies...
Torsten.
Acclaimed Contributor

Re: MSA1500 redundancy

IMHO the removal of the fibre cable is not a reason for the controllers to switch. From their point of view everything is fine. They can talk to each other and are able to see all disks.

As you mentioned, forcing the fail-over is the job of the software on the hosts. You will need that software; there is no way around it.

Even if you have an OS that supports and expects dual active/active paths, you need special software for active/passive arrays, because every access to the passive controller would cause the controllers to switch. This would result in an endless fail-over from one controller to the other.
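
The endless fail-over can be sketched in a few lines of Python. This assumes, as described above, that any I/O arriving at the passive controller forces it to become active; the function name and the path labels "A" and "B" are hypothetical.

```python
def count_failovers(path_sequence, initial_active="A"):
    """Count controller switches when every I/O sent to the passive
    controller forces it to become active (assumed behaviour)."""
    active = initial_active
    switches = 0
    for path in path_sequence:
        if path != active:
            active = path      # the passive controller takes over
            switches += 1
    return switches


round_robin = ["A", "B"] * 4   # host treats both paths as equally usable
sticky = ["A"] * 8             # path software stays on the active path

print(count_failovers(round_robin))  # 7
print(count_failovers(sticky))       # 0
```

Eight I/Os spread evenly over both paths cause seven controller switches, while the same eight I/Os kept on the active path cause none.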

Your test is realistic, but it is not recommended, because you reduce the redundancy when there is no emergency and you can damage the parts by accident. You do this at your own risk.

But you have seen that the fail-over is working ;-)

Hope this helps!
Regards
Torsten.

James Mercer
Advisor

Re: MSA1500 redundancy

Torsten - the fail-over test failed. When we remove the active controller, the standby controller tries to become active and then fails with the error:

"43 Redundancy failed hardware failure"
From our point of view, the array is now 'dead'. Software and disk access no longer communicate with the MSA controller.

Shouldn't the standby controller become active and indicate that it is ready for use?

Stephen Kebbell
Honored Contributor

Re: MSA1500 redundancy

Hi,

Hang on, do you only have one SAN switch? If so, your configuration is not a supported one. The rule for the MSA is either single everything or dual everything (HBAs, switches, controllers). Nothing in between is supported (even if it does seem to work). Also, you mention you have a mixed environment - I assume this means different OSes. This usually means you NEED zoning on the switch(es). You may run into problems if you do not keep different operating systems in separate zones.
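
As a quick sanity check, the "single everything or dual everything" rule can be written as a small Python function. This is only a sketch of the rule as stated here; the function and parameter names are invented, and it does not cover zoning or any other support requirement.

```python
def is_supported(hbas_per_server: int, switches: int, controllers: int) -> bool:
    """True only when HBAs, switches and controllers are all single or all dual."""
    counts = {hbas_per_server, switches, controllers}
    return counts == {1} or counts == {2}


# Single-HBA servers, one switch, two controllers: the mix described in this thread
print(is_supported(hbas_per_server=1, switches=1, controllers=2))  # False
# Fully redundant: two HBAs per server, two switches, two controllers
print(is_supported(hbas_per_server=2, switches=2, controllers=2))  # True
```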

To test controller failover, you could connect the serial cable to the front of the active controller and use the CLI to disable the controller. The command is:

disable this_controller

From the user guide: "In a dual-controller system, this command disables one of the controllers to prepare it for removal. When a controller is disabled, all resources being processed by that controller are automatically failed over to the remaining controller. After a controller has been successfully disabled, the LCD panel displays a message stating that it is safe for that controller to be removed."
The CLI User Guide can be found here:
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00683579/c00683579.pdf

Note: I hope this command is supported in your firmware version. I can't say for sure.

Regards,
Stephen
Torsten.
Acclaimed Contributor

Re: MSA1500 redundancy

The message "43 Redundancy failed hardware failure" is telling you that there is no longer any redundancy, because one controller has failed. Again, if you need the fail-over software on your host and don't have it installed, your host will still try to access the failed controller. Use the front panel to get the current status. BTW, I don't know your whole configuration; that's why I can only answer in general.

Hope this helps!
Regards
Torsten.

James Mercer
Advisor

Re: MSA1500 redundancy

Right Torsten, I understand your point - you're saying the message is merely a warning and not an indication that the secondary controller has failed as well. Since we lose all communication when the primary is lost, we assumed the message was reflecting our problem and not stating something else.

Our firmware (5.02 and 5.10) only allows the standby controller to be disabled.

Fair point on the redundancy supported by HP: all single path or all redundant. In the past with our HP virtual arrays, we've used a mixed-redundancy setup - some servers are single path and some are dual path, but the controllers are always dual path. I think we expected the same of the MSA even though all of our servers are likely to be single pathed - we still expected the MSA 'backplane' to handle communication between the FC I/O cards and each controller.

Is it true then that: MSA FC I/O cards only communicate with one controller and any use of the standby controller must use multi-path software?

I've tried installing MPIO on one server, but it fails presumably because it has only one FC HBA.

Any other thoughts welcome, but I think it is beginning to be very clear.

Thanks
Torsten.
Acclaimed Contributor

Re: MSA1500 redundancy

Virtual arrays - VA or EVA? The VA is totally different.

You should now read the documentation carefully and make sure you meet all requirements regarding number and model of HBAs, drivers, SAN connection/cabling, software and so on. In case of doubt ask here or open a call with HP.

Hope this helps!
Regards
Torsten.

Uwe Zessin
Honored Contributor
Solution

Re: MSA1500 redundancy

> Is it true then that: MSA FC I/O cards only communicate with one controller and any use of the standby controller must use multi-path software?

Absolutely. A host's FC port is only allowed to have access to one of the MSA's controllers and vice versa: an MSA controller must not be accessed by more than one FC host port.

It is not written that way in the documentation, but if you scroll back some time in ITRC's history, you will see quite a number of failures due to breaking these rules.
James Mercer
Advisor

Re: MSA1500 redundancy

Torsten - yes, we've been re-reading the documentation very carefully over the last few days. There are some known issues like the firmware version and Netware, but the only thing that wasn't working as expected was the 'failover'. In that regard the documentation was clear about what was supported, but not about what worked. I think you and the others have answered that - the MSA controllers only establish one connection and there is no back plane connection between controllers.

Your comment on VA vs EVA being very different - can you explain? In general I know the differences regarding continuous access (replication), sizes, speeds, and product lifetime (EVA is the future), but I take it you were talking about the controller redundancy behavior. Can you tell me more about how the VA and EVA differ in redundancy methods? Thanks in advance.

Uwe - I think you 'hit the nail on the head'. Our assumption, which wasn't directly contradicted in the documentation, has led to the whole problem. In hindsight, I would rather have spent my time in the forums than trying to 'fix' something that wasn't broken...
Uwe Zessin
Honored Contributor

Re: MSA1500 redundancy

The VA uses two disk pools (called Redundancy Groups). By default, all disk drives in odd slot numbers are allocated to one RG and those in even slots to the other. Each RG is assigned to, and maintained by, a single controller. A storage resource (LUN) is allocated from one such RG.
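
A tiny Python sketch of that default split, assuming slots are numbered from 1; the group labels "RG-odd" and "RG-even" are placeholders, not the names the VA itself uses.

```python
def default_redundancy_group(slot: int) -> str:
    """Odd slots go to one group, even slots to the other (labels are placeholders)."""
    return "RG-odd" if slot % 2 == 1 else "RG-even"


groups = {}
for slot in range(1, 9):   # assumed slot numbering 1..8, for illustration
    groups.setdefault(default_redundancy_group(slot), []).append(slot)

print(groups)  # {'RG-odd': [1, 3, 5, 7], 'RG-even': [2, 4, 6, 8]}
```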

The EVA uses one or more (up to 16) disk groups. It usually selects the disks itself, but the user can explicitly group/ungroup individual disk drives. A storage resource (virtual disk) allocates space from a single disk group, but the managing controller can change. Both controllers can manage different virtual disks in the same group.


As far as I can tell, the VA has provided 'Active/Active' support from the very beginning and the EVA used the old DEC HSG mode: a single controller allows read/write access to a resource while the second controller maps SCSI LUNs as well, but I/Os cannot be satisfied through these paths.

This changed some time ago with Active/Active firmware for the EVA. However, both arrays work the same way: a single controller does the actual disk I/O for a storage resource (on the VA this is called the 'performance path') [*], and if you try to access the resource through the other controller, it has to redirect the I/O to the owning controller.

[*] Unlike the MSA controllers with Active/Passive firmware, the other EVA controller can do disk I/O for a different virtual disk, even if it's in the same disk group.
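
The ownership idea can be sketched in Python as follows. This is an illustration of the redirect described above, not how either array is implemented; `OwnedLun`, `route_io` and the controller labels are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class OwnedLun:
    name: str
    owner: str   # controller that performs the actual disk I/O


def route_io(lun: OwnedLun, arrived_at: str) -> list:
    """Return the controllers an I/O passes through, assuming the
    non-owning controller forwards the request to the owner."""
    if arrived_at == lun.owner:
        return [arrived_at]            # direct: the 'performance path'
    return [arrived_at, lun.owner]     # redirected to the owning controller


vdisk1 = OwnedLun("vdisk1", owner="A")
vdisk2 = OwnedLun("vdisk2", owner="B")   # same disk group, different owner

print(route_io(vdisk1, "A"))  # ['A']
print(route_io(vdisk1, "B"))  # ['B', 'A']
print(route_io(vdisk2, "B"))  # ['B']
```

Both controllers do disk I/O at the same time here, each for the virtual disk it owns, which is the difference from the active/passive MSA noted in the footnote.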
Torsten.
Acclaimed Contributor

Re: MSA1500 redundancy

regarding the connections:

The 2 controllers are connected to each other to allow communication, of course.

Every controller is connected to all the SCSI busses to be able to access every disk.

The fibre channel connections are dedicated to a single controller, first FC connection only for the first controller, second FC connection for the second controller only.

The main difference between active/passive (e.g. MSA1500 with "old" fw) and active/active (new EVA, VA) is recognizable by the name, I guess.

Hope this helps!
Regards
Torsten.

James Mercer
Advisor

Re: MSA1500 redundancy

Thanks for the explanations - it was the dual controller, dual FC I/O yet only single paths that we hadn't understood. The information regarding the EVA and VA matches what we thought. Our mistake was to think this also applied to the MSA.

I'll leave the thread open in case anyone wants to comment further, but you've cleared up the issue for me - there never was a problem with hardware - it was only with our expectations.

As for the firmware, since we're running Netware, I'll load the firmware that HP recommends; well, it recommends 4.98, but supplies 5.02...

Thanks again for all the help.
DDR_1
Occasional Visitor

Re: MSA1500 redundancy

I had the same issues - and after reviewing the quickspecs http://h18000.www1.hp.com/products/quickspecs/11945_div/11945_div.html it clearly states:

NOTE: Redundant configurations require two Host Bus Adapters per server, an additional controller and fibre channel I/O module, redundant switches and cables, and Secure Path (active passive configuration) software or Industry Standard failover depending on the operating system for each server.

I installed a second switch, updated the MSA firmware to 7.0 (for active/active) and the drivers to 7.67 with HP Full feature MPIO, and everything now works fine.