Array Setup and Networking

Array controllers active/standby and volumes served

 
Fred Blum
Valued Contributor


Hi all,

We are new to the Nimble platform: in the past we moved from HP LeftHand to the SV3200, and now we are testing a Nimble HF20C. We were using sync repl on both the LeftHand and the SV3200. We are waiting for NimbleOS 5.1 with sync repl to become GA, or to be whitelisted for our situation.

I have a question regarding the performance of the Nimble system and the active/passive design of the controllers. Previously, with the LeftHand (all four nodes) and the SV3200 (both nodes x two controllers), all controllers actively served exported volumes in a sync repl situation. The LeftHand used a VIP and active load balancing. With the SV3200 the best practice was to have as many volumes as controllers, or a multiple of the number of controllers, so that each controller actively served the same number of volumes. Not really load balancing IMHO, there is no DSM, and we are experiencing other problems.

The Nimble seems to have an active/passive design, so the passive controller does not appear to be involved in balancing the workload. In sync replication the full performance potential is not used, as one controller remains idle, acting only as a standby in case the active controller runs into a problem. Am I correct in this?

Regards,

Fred

Nick_Dyer
Honored Contributor

Re: Array controllers active/standby and volumes served

Hello,

In any storage platform, it's crucial to architect for failures. With a typical active/active dual-controller system you can utilise the entire performance of the array, but the best practice for withstanding failures and upgrades is to reserve 50% headroom on those controllers, in case something goes wrong. So even though you're running active/active, you're only utilising 50% of the working set of the array resources.

In reality this often isn't the case: controllers run at 60%/70% (for example) because there's no easy way to manage this kind of balancing, and when an upgrade (or failure) happens, very bad things happen to your application performance. It can also have a knock-on effect on data availability or data services.

With Nimble, our mantra is data integrity & data protection first and foremost. Because of that, Nimble runs 100% active on the active controller, and should anything go wrong, the standby controller can jump in and take 100% of the application IO without any performance penalty or data integrity issues. We're still using 50% of the total resources of the array, but we do not trade off potential app degradation, or worse still data corruption or integrity problems, to do so. It also means that you can do live firmware upgrades of the platform without impacting host or application performance.
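The headroom argument above can be sketched numerically. This is a hedged illustration in arbitrary units, not real Nimble sizing guidance: with 50% headroom, active/active and active/standby expose the same safely usable performance, but they differ in what happens when the headroom discipline slips.

```python
# Hedged illustration of the headroom argument; numbers are arbitrary
# performance units, not vendor sizing figures.

def usable_perf(controllers, per_controller_capacity, headroom_fraction):
    """Performance you can safely commit while tolerating one controller loss."""
    total = controllers * per_controller_capacity
    return total * (1.0 - headroom_fraction)

# Active/active: both controllers serve IO, but best practice reserves
# 50% headroom so a single survivor can absorb the whole load.
active_active = usable_perf(controllers=2, per_controller_capacity=1.0,
                            headroom_fraction=0.5)

# Active/standby: one controller serves 100% of IO, the other is idle
# and can take over the full load with no performance loss.
active_standby = usable_perf(controllers=2, per_controller_capacity=1.0,
                             headroom_fraction=0.5)

assert active_active == active_standby == 1.0  # same safely usable performance

# The difference appears when discipline slips: running active/active at
# 70% per controller means a failover asks the survivor for 140% of its
# capacity -- the "very bad things" scenario described above.
load_per_controller = 0.7
post_failover_demand = 2 * load_per_controller
assert post_failover_demand > 1.0  # the surviving controller is oversubscribed
```

The point of the sketch is that both designs reserve the same 50%; active/standby simply makes the reservation impossible to erode by accident.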

This is why Nimble is trusted by some of the largest business- and mission-critical platforms and workloads worldwide.

We do still use a virtual IP address schema with our own DSM modules and additional intelligence, and will auto-manage the IP connectivity across both controllers for you. That still takes place, and is very slick in its operation.

If you want NimbleOS 5.1, you can speak with support to be whitelisted for the code now. If you want Peer Persistence (that is, automatic failover with synchronous replication), that is in NimbleOS 5.2, which should be available later this week.

Nick Dyer
twitter: @nick_dyer_
Fred Blum
Valued Contributor

Re: Array controllers active/standby and volumes served

Hi Nick,

Thanks for explaining the design principles behind the Nimble philosophy.

Regarding sync repl, AKA Peer Persistence: that was communicated to us by HPE for Q1 2019, and also as part of 5.1 as per your post: https://community.hpe.com/t5/hpe-storage-tech-insiders/nimbleos-5-1-intro-to-peer-persistence-part-1/ba-p/7044271#.Xte1-FUzaJA

So far we are waiting. Support refers us to our sales manager for whitelisting, who is still in limbo with his back office, and this is dragging on indefinitely.

What is wrong with this sync repl that it has now been moved to 5.2? According to support, 5.2 is far from being GA.

Nick_Dyer
Honored Contributor

Re: Array controllers active/standby and volumes served

We've made a series of critical enhancements to ASO which are in NimbleOS 5.2 - and any customer wanting Peer Persistence should use 5.2 to benefit from those enhancements. It'll be available under IPR in the next few days.

You don't need to speak to sales for whitelisting - it doesn't have anything to do with sales. This is done by yourself directly with Nimble Support and Infosight. If you need it, please do speak with them.

Nick Dyer
twitter: @nick_dyer_
Fred Blum
Valued Contributor

Re: Array controllers active/standby and volumes served

Hi Nick,

I will try again to be whitelisted with support once release 5.2 becomes available.

Can you say something about the array's active/passive design in a Peer Persistence situation? Does the Nimble design philosophy mean ArrayA active (ControllerA active / ControllerB standby) and ArrayB passive (ControllerA active / ControllerB standby)?

With one volume collection on leader ArrayA, that would mean only 25% of the total capacity is used.

So essentially a minimum of two volume collections should be created: one with ArrayA as leader (ArrayB as replication partner) and one with ArrayB as leader (ArrayA as replication partner), to achieve the 50%.

Am I correct?
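The 25%/50% arithmetic above can be sketched as follows. This is a hypothetical model, assuming each array has two equal controllers and that only the active controller on each array contributes performance:

```python
# Hypothetical capacity model for a two-array Peer Persistence setup;
# assumes two equal controllers per array and that only an array's
# active controller serves IO. Arbitrary units, not vendor figures.

CONTROLLERS = 4          # 2 arrays x 2 controllers each
PER_CONTROLLER = 1.0     # performance per controller, arbitrary units

def active_fraction(active_controllers):
    """Fraction of total controller resources actively serving IO."""
    return active_controllers * PER_CONTROLLER / (CONTROLLERS * PER_CONTROLLER)

# One volume collection with ArrayA leading everything: only ArrayA's
# active controller serves IO -> 1 of 4 controllers busy.
one_collection = active_fraction(active_controllers=1)
assert one_collection == 0.25

# Two volume collections with reciprocal leaders: each array's active
# controller serves IO -> 2 of 4 controllers busy.
two_collections = active_fraction(active_controllers=2)
assert two_collections == 0.5
```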

Can existing volumes be added to a newly created Peer Persistence volume collection?

TIA,

Fred

 

Nick_Dyer
Honored Contributor

Re: Array controllers active/standby and volumes served

Hello,

In a Peer Persistence configuration you can have active volumes on both arrays, with each array being a downstream "destination" partner for other volumes. So yes, 50% of your resources can absolutely be consumed on each platform, whilst still protecting you for both local & site failures.

For example, in this design, should you have a local failure on a controller, the local standby controller can step in within seconds and take over the active workload without failing over to the other site, AND without suffering any performance loss whilst doing so. Active/active array configs would take a huge performance hit should this happen.

Nick Dyer
twitter: @nick_dyer_
Fred Blum
Valued Contributor

Re: Array controllers active/standby and volumes served

Hi Nick,

You are right that in an active/active design, performance will take a potential hit when a controller or node is down. IMHO the performance guarantee of an active/standby design also comes with a price tag: only half the potential performance capacity is used. In my experience, well-designed systems never run continuously at 100%, so even when a controller or node goes down, the performance hit is seldom noticed by users.

Just my 0.02,

Fred

Nick_Dyer
Honored Contributor

Re: Array controllers active/standby and volumes served

Hi Fred,

Unfortunately that is quite often not the case. History is littered with high-profile outages of storage platforms that were over-subscribed and not managed correctly, and the finger is often pointed at controller resource management.

And in a well designed system with active/active, no more than 50% of the total resources of the array should be consumed across both controllers.

So whether you split that balance 50/50 (a/a) or 100/0 (a/s), one should always reserve resources in the array for failures. The additional downside of active/active is that it's almost impossible for an end-user admin to manage this balance, EMC and NetApp being two prime examples of it.

Nick Dyer
twitter: @nick_dyer_
Sheldon Smith
HPE Pro

Re: Array controllers active/standby and volumes served

Too true. Over the years I've seen quite a few EVA and 3PAR customers that started with a well-designed system. Over time they kept adding more and more, to the point where even with all controllers up there were performance issues. And when a controller went offline, there were real problems.

It's the nature of Management to have the equipment do more and more ("There's still disk space--use it") until Operations says "that's enough".
All too often they don't.


Note: While I am an HPE Employee, all of my comments (whether noted or not), are my own and are not any official representation of the company
