Ralf Gerresheim
Frequent Advisor

I have a question concerning the optimal configuration for installing Managers on a system:

Suppose I have a Multi-Site SAN system with 4 nodes, split across two DCs. A Manager runs on each node. Additionally, I have a Fail-Over Manager running at a (virtual) third site. Volumes are created in an NRAID 10+2 configuration.

So, in total I have 5 Managers running, quorum is 3.

For a 10+2 configuration, it is said that we get site protection plus fault tolerance in the remaining site, which means that one of the two nodes in the remaining site could also fail while I still have access to my data.

So, if one of the sites goes down, I lose 2 Managers. No problem, quorum is still met.

If another node in the remaining site now goes down, I lose another Manager. Only two Managers remain, no quorum => access to data is lost!

So how is "fault tolerance in the remaining site" given?
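The arithmetic in this scenario can be sketched as follows. This is only an illustration and assumes that quorum is a strict majority of the managers configured in the management group, which matches the "5 Managers, quorum is 3" figure above. The function names are made up for the example.

```python
def quorum(total_managers: int) -> int:
    """Strict majority of the configured managers (assumed rule)."""
    return total_managers // 2 + 1


def has_quorum(running: int, total_managers: int) -> bool:
    """True if enough managers are still running to form a majority."""
    return running >= quorum(total_managers)


TOTAL = 5  # 4 storage-node managers + 1 Fail-Over Manager

steps = [
    ("all up", 5),
    ("one site down (2 managers lost)", 3),
    ("one more node down in the surviving site", 2),
]

for label, running in steps:
    ok = has_quorum(running, TOTAL)
    print(f"{label}: {running}/{TOTAL} running, quorum held: {ok}")
```

The last step prints `quorum held: False`, which is exactly the situation described in the question.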

Re: Question understanding optimal configuration for Managers

Hi Ralf,


When you run just 1 manager on each site, you don't have this problem, because your quorum is then 2 (three managers in total, counting the Fail-Over Manager). When a node without a manager fails, it doesn't affect the quorum. So one site can fail completely, the node without the manager on the remaining site can fail as well, and you still have a quorum of 2.
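As a rough sketch of this layout, the snippet below tracks which nodes host a manager and checks quorum after a set of failures. The node names are hypothetical, and the majority rule is an assumption, not something taken from the product documentation.

```python
# Hypothetical names: one manager per site plus the Fail-Over Manager (FOM).
MANAGERS = {"dc1-node1", "dc2-node1", "fom"}


def has_quorum(failed: set[str]) -> bool:
    """Assumed rule: a strict majority of configured managers must be running."""
    running = MANAGERS - failed
    return len(running) >= len(MANAGERS) // 2 + 1


dc1_down = {"dc1-node1", "dc1-node2"}

print(has_quorum(dc1_down))                  # True: dc2-node1 and fom remain
print(has_quorum(dc1_down | {"dc2-node2"}))  # True: dc2-node2 hosts no manager
print(has_quorum(dc1_down | {"dc2-node1"}))  # False: only fom remains
```

Note the last case: under this model the surviving site only tolerates the loss of the node that does not host the manager.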



Joris Vliegen


Ralf Gerresheim
Frequent Advisor

Re: Question understanding optimal configuration for Managers

Hi Joris,


Thanks for your suggestion.

I tested this configuration, but it doesn't work completely:

If I lose one DC: no problem.

If I then lose the storage node in the remaining DC that doesn't have the manager installed, I still have access via the last node.

But if I lose the storage node that has the manager installed, CMC tells me that I will lose quorum and access to the data.

So I ran a test: after DC1 went down, I started the manager on the second storage node in the remaining DC.

After this, I thought I could lose either of the remaining two nodes. But CMC now tells me that I will lose quorum, regardless of which node I lose (BTW: I do these tests in a virtual environment with 4 VSAs installed, and 'lose' means that I shut down the nodes either via CMC or vCenter). If I stop the manager again, I can lose that node and still have access to the data via the node that originally had the manager running.

I can't explain that behavior.


Can you explain that behavior?
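One possible explanation, sketched under assumptions: if quorum is a strict majority of all managers configured in the management group, and the manager on the failed DC1 node still counts as configured, then starting an extra manager raises the quorum from 2 to 3. The function names below are made up for the example.

```python
def quorum(configured: int) -> int:
    """Strict majority of the configured managers (assumed rule)."""
    return configured // 2 + 1


def survives_manager_loss(configured: int, running: int) -> bool:
    """True if quorum still holds after one more running manager stops."""
    return running - 1 >= quorum(configured)


# DC1 is down. Managers configured on: DC1 node (down), DC2 node, FOM.
print(quorum(3), survives_manager_loss(configured=3, running=2))  # 2 False

# Extra manager started on the second DC2 node: 4 configured, 3 running.
print(quorum(4), survives_manager_loss(configured=4, running=3))  # 3 False
```

In the first case the second DC2 node hosts no manager, so losing it costs nothing. In the second case both DC2 nodes host a manager, so losing either one leaves 2 of 4 running, which is below the assumed quorum of 3.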


Honored Contributor

Re: Question understanding optimal configuration for Managers

You will never be able to maintain quorum across two sites when the site with the tie-breaker goes down. This will always require some sort of manual failover.


The only way I can think of to get around this is to use another independent HA solution to host the FOM, so that the FOM seamlessly fails over to the second site when the first one goes down. Maybe there is an option with VMware's FT, or maybe something with one of the new Windows Server 2012 shared-nothing HA solutions.



The problem with a pure two-site solution and automation is this: how do you really know that the primary site is actually down, and not that you have just lost communication with it? Split-brain is a real issue, and it is only dealt with correctly if you have a third site for the FOM.
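The split-brain point can be illustrated with a small sketch. It assumes one manager per site, a strict-majority quorum rule, and made-up group labels. Each entry is a group of managers that can still talk to each other after the inter-site link is cut.

```python
def majority(total: int) -> int:
    """Strict majority of all configured managers (assumed rule)."""
    return total // 2 + 1


def sides_with_quorum(groups: dict[str, int]) -> list[str]:
    """Return the partitions that hold a majority of all managers."""
    total = sum(groups.values())
    return [name for name, count in groups.items() if count >= majority(total)]


# Link between the sites is cut; the FOM lives in site A.
print(sides_with_quorum({"site A + FOM": 2, "site B": 1}))  # ['site A + FOM']

# FOM in a third site that can still reach site B only.
print(sides_with_quorum({"site A": 1, "site B + FOM": 2}))  # ['site B + FOM']

# No tie-breaker at all: neither side may continue on its own.
print(sides_with_quorum({"site A": 1, "site B": 1}))        # []
```

At most one partition can ever hold a strict majority, which is what rules out split-brain. From site B's point of view in the first case, "site A is down" and "the link to site A is down" look identical: it sees only 1 of 3 managers either way, so it cannot safely take over without manual intervention.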