StoreVirtual Storage

lex_11
Occasional Advisor

how to maintain quorum in a single site cluster with 4 nodes?

Hi, 

 

I want to build a single-site cluster of 4 nodes with enough storage redundancy to tolerate the loss of 2 nodes without impact to data access or data consistency. For this purpose, all volumes in the cluster will be fully provisioned and protected with Network RAID-10. There will be 3 managers running in the cluster.

According to this, I would have quorum in the cluster and my access to data would be guaranteed.

 

P4500 Cluster:

Node1 Manager

Node2 Manager

Node3 Manager

Node4 –

 

So far so good.

What bothers me is that if I lose Node1 with the manager running on it, I will have no quorum and would be unprotected in case Node3 goes offline. I would have a split brain in the cluster and no quorum for that time unless I start the manager on Node4. Is it possible to trigger the start of the stopped manager on Node4 automatically, as soon as one of the running managers stops?

 

Thanks all!!

Bryan McMullan
Trusted Contributor

Re: how to maintain quorum in a single site cluster with 4 nodes?

You can either install a FOM to help with quorum (and have all nodes running managers so you have the suggested 5 managers in the cluster), or you could get another node and run a manager on it.

As you seem to be set on 4 nodes, I think running managers on all 4 nodes and adding a FOM is the way to go. Even though you're not running multi-site, I think it should work fine.

oikjn
Honored Contributor

Re: how to maintain quorum in a single site cluster with 4 nodes?

I haven't tried, but I would imagine you could get creative with the CLI and batch commands to get what you are looking for.

 

Alternatively, do you have a problem with adding a FOM to the mix? Then you run managers on all four nodes plus a FOM to keep quorum, and you can survive a two-node failure. Hopefully you are using Network RAID-10+1, because keeping quorum through a 2nd node failure doesn't matter if you are only using Network RAID-10, since the loss of the 2nd node will stop LUN availability anyway.
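To illustrate the point about data copies (a toy model only; the replica pairing below is an assumed, simplified layout for illustration, not the actual P4000 placement algorithm): with Network RAID-10 each block exists on exactly 2 nodes, so some two-node failures destroy both copies of some blocks, regardless of quorum.

```python
from itertools import combinations

# Assumed simplified layout: 4 nodes, each block mirrored on a
# pair of "adjacent" nodes (Network RAID-10 style, 2 copies).
nodes = [1, 2, 3, 4]
replica_pairs = [(1, 2), (2, 3), (3, 4), (4, 1)]

def data_survives(failed_nodes):
    # Data is lost if any replica pair lies entirely on failed nodes.
    return not any(set(pair) <= set(failed_nodes) for pair in replica_pairs)

for failed in combinations(nodes, 2):
    status = "intact" if data_survives(failed) else "LOST"
    print(f"nodes {failed} fail -> data {status}")
```

In this toy layout, 4 of the 6 possible two-node failures lose data. With Network RAID-10+1 (three copies per block) no two-node failure can take out every copy, which is why quorum survival only pays off at that protection level.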

KFM_1
Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?

You know what, I'm not entirely convinced that in your situation, with the loss of one node running a manager, that you will be in a split-brain situation.

 

I've attached a screenshot from the HP StorageWorks P4000 SAN Solution User Guide, on page 149, table 35 Managers and quorum.

 

Managers.png

 

You will see that when you run three managers and one fails, it says "If one manager fails, 2 remain, so there is still a quorum." So we are NOT in a split-brain scenario. What I take from this statement is that whether a cluster has quorum is determined against the configured number of managers, that is, three: two out of three managers is still a majority (though not recommended, as it's no longer fault tolerant).
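That reading matches the usual strict-majority rule for quorum systems. A minimal sketch of that rule (an assumption consistent with the user guide's table, not HP's actual implementation):

```python
def has_quorum(configured: int, reachable: int) -> bool:
    # Quorum = a strict majority of ALL configured managers,
    # counted against the configured total, not the survivors.
    return reachable > configured // 2

print(has_quorum(3, 2))  # True  -> 2 of 3: still a majority, no split brain
print(has_quorum(3, 1))  # False -> 1 of 3: quorum lost
print(has_quorum(4, 2))  # False -> even split: the split-brain deadlock
```

Under this rule an even configured count is dangerous precisely because a 50/50 split leaves neither half with a strict majority, which is what the paragraph above the table warns about.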

 

This to me is misleading as the paragraph directly above the table says "An even number of managers can get into a state where no majority exists—one-half of the managers do not agree with the other one-half. This state, known as a 'split-brain,' may cause the management group to become unavailable."

 

To me, the guide does not go into enough detail with regards to managers, quorums and special managers.  The guide mentions that the FOM should be used for specific scenarios yet I've heard from authoritative sources within HP that it should be used pretty much in every scenario.  IMO HP use it as a silver bullet for all possible split-brain/manager scenarios - "oh just deploy a FOM and all will be good".  If that's the case then I cannot imagine a scenario where you wouldn't want a FOM!

 

Bart_Heungens
Honored Contributor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Hi,

 

HP is quite clear about the usage of the FOM...

If you have 2 nodes in 1 server room, or 2 times 2 = 4 nodes in a cluster spread across 2 server rooms, you should have a FOM to avoid split-brain situations...

Split brain can also happen with only 2 nodes in 1 server room when they lose communication between the 2 of them... Which node should stay active?

Best practice says that you need 3 or 5 managers to avoid downtime...

If you have 4 nodes and you start 3 managers, you have a decent situation since 1 can go down and you keep quorum... But it is not ideal, and that is also what the Best Practice Analyzer inside the CMC says... You should go for 5...

 

 

Kr,

Bart

--------------------------------------------------------------------------------
If my post was useful, click on my KUDOS! "White Star"!
KFM_1
Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Hi Bart,

So HP are essentially saying to use a FOM in all scenarios to make up one of the managers ;)
Bart_Heungens
Honored Contributor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Hi,

 

That is the one and only reason the FOM exists: to be a manager... And it does it quite well...

It has already helped me in several cases...

 

 

Kr,

Bart

KFM_1
Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Hi Bart,

I don't doubt it's doing a good job as a manager! I was wondering why HP doesn't just say to use it in all scenarios: single site (single rack/multi-rack/multi-room/etc.), multi-site, and situations where there are only two nodes in a cluster.
Bart_Heungens
Honored Contributor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Hi,

 

That is what HP says: with only 2 nodes it is always better to have a FOM...

 

 

Kr,

Bart

lex_11
Occasional Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Hi, thanks for all the replies!

 

I know that HP recommends the FOM as a best practice. If you have an even number of managers, the FOM is needed to get to an odd number. I have also read the SAN installation guide; the problem is that there is no detailed explanation of how this exactly works. It just says that 3 or 5 managers are recommended. Even if I had a FOM and 4 managers running, the moment one of the nodes goes offline I would be left with 3 managers + FOM = an even number.

So what is the guarantee that there would still be quorum in case one of the other nodes goes offline as well?

In the end I guess I could test this before going online, but I'm still curious about how quorum is maintained...
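For what it's worth, if quorum is a strict majority of the *configured* manager count (an assumption based on the user guide's table, worth verifying in a test), then the parity of the surviving managers doesn't matter: 4 of 5 is still a majority even though 4 is even. A sketch of how many failures a given configured count can absorb:

```python
def failures_tolerated(configured: int) -> int:
    # A strict majority (configured // 2 + 1) must stay reachable,
    # so at most configured - (configured // 2 + 1) managers may fail.
    return configured - (configured // 2 + 1)

for n in (3, 4, 5):
    print(f"{n} managers -> tolerates {failures_tolerated(n)} failure(s)")
```

So 4 node managers + FOM = 5 configured tolerates 2 failures: after the first node dies you have 4 of 5 (still a majority), and after a second you have 3 of 5 (still a majority). Note that 4 configured tolerates no more failures than 3, which is why the odd counts are recommended.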

 

What would happen if I built a logical "Multi-Site cluster" that is physically in one location, with 4 managers + a FOM outside the "site"? Would there be any difference?

 

Cheers

KFM_1
Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?

@lex_11 wrote:
I know that HP recommends the FOM as a best practice. If you have an even number of managers, the FOM is needed to get to an odd number. I have also read the SAN installation guide; the problem is that there is no detailed explanation of how this exactly works. It just says that 3 or 5 managers are recommended. Even if I had a FOM and 4 managers running, the moment one of the nodes goes offline I would be left with 3 managers + FOM = an even number.

 


I too am curious about how the management group determines quorum, and I don't mean by a simple odd/even number!

 

If the FOM is a best practice (not that I've read that explicitly), then why don't they just say to use it in all scenarios rather than just the two that are mentioned in the user guide? Given the way quorum is calculated, I can't think of a scenario where you wouldn't use a FOM. That is my main gripe with the documentation.

 

@lex_11 wrote:
What would happen if I built a logical "Multi-Site cluster" that is physically in one location, with 4 managers + a FOM outside the "site"? Would there be any difference?

 

 

I'm guessing no difference. I have built something similar (yet opposite) by stretching a single-site cluster across two physical sites. I had eight nodes, four at each site. Of these four, two were running managers, and I had a FOM at a third logical site so I had five managers for quorum.

 

The only difference I've found with multi-site clusters is the requirement for different subnets for each site, thus two or more VIPs.  That in itself shouldn't affect quorum calculations.

 

Although I'm always happy to be proved wrong! :)

 

lex_11
Occasional Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Which network RAID level did you configure for the volumes? Have you ever tested the failover function in your configuration?
Bart_Heungens
Honored Contributor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Well, some cases where you don't really need a FOM:

Single server room, 4 nodes or more... In that case 3 managers are enough, since a split-brain situation is not really possible, as I assume all nodes are connected to the same switches... A majority of managers only really matters when you restart nodes while updating firmware...

 

The only thing a multi-site cluster does is spread the blocks of data across the nodes so that every block of data is located in every site. This avoids all volumes going down when 1 site goes down...

You can obtain this also by creating a single-site cluster and arranging the nodes at cluster level so that the odd nodes are in 1 site and the even nodes are in the second site...

A picture explains it better and can be found in the training material (I am a P4000 certified instructor)... But there is a logic behind it...

Know that from version 9.0 on, multiple subnets are no longer necessary for a multi-site cluster... I set them up in a single subnet all the time; works great...

 

Kr,

Bart

lex_11
Occasional Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?

 



@Bart_Heungens wrote:
Well, some cases where you don't really need a FOM:

Single server room, 4 nodes or more... In that case 3 managers are enough, since a split-brain situation is not really possible, as I assume all nodes are connected to the same switches... A majority of managers only really matters when you restart nodes while updating firmware...


With 3 managers I would have quorum in the cluster. But I still don't get how quorum would be maintained after one of the 3 running managers goes offline. During the time that one manager is offline, wouldn't I have a split-brain risk in the cluster? I'm thinking about the worst-case scenario and assuming that another manager could go offline shortly after the first one.

I guess I will have to try the solution with 4 managers + FOM, though there is still some uncertainty for me about how quorum is maintained...

 

 

 

Bart_Heungens
Honored Contributor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Hi,

 

You mention it yourself: with 3 managers you can have quorum with 2 surviving managers, but at that moment nothing else is allowed to go wrong... That is why HP (and I myself) always go for 5 managers if you have the possibility... All my customers with 4 nodes have the FOM installed, even if all nodes are in 1 datacenter...

But I always discuss this with the customer, including all possible scenarios... It's up to him to decide...

 

 

Kr,

Bart

lex_11
Occasional Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Hi  Bart,

 

The other option I'm thinking about is to go for 2 clusters in a single site, with all managers running + a FOM. Though I would get half the performance of a single cluster, I would get the benefit of storage redundancy and could tolerate the failure of 2 nodes (1 from each cluster).

One thing I'm not quite sure about: how many FOMs do I need in this scenario? Is one FOM enough for 2 clusters, or do I need 2?

 

Cheers!

 

 

 

 

KFM_1
Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?


@lex_11 wrote:
Which network RAID level did you configure for the volumes? Have you ever tested the failover function in your configuration?

Sorry for late reply!

 

I used Network RAID-10+2, so 4 copies of the data, 2 at each site. Yes, we did test failover functionality before going live; this was a customer requirement. Data integrity and access passed with flying colours. In this case, we did have a FOM in a different server room that had network connectivity to both physical sites.

Bart_Heungens
Honored Contributor

Re: how to maintain quorum in a single site cluster with 4 nodes?

Hi,

 

You add a FOM per management group, not per cluster...

So you don't need to create 2 clusters for that... A good reason to create multiple clusters is to split up types of disks, for instance SATA, SAS and SSD... Another reason can be remote copy groups to remote sites with asynchronous replication...

 

 

Kr,

Bart

David_Tocker
Regular Advisor

Re: how to maintain quorum in a single site cluster with 4 nodes?

My understanding is that a FOM is there to serve the purpose of maintaining quorum in the case of lost nodes, or serving the purpose of a 'tie-breaker' in the case of having an even number of nodes.

 

So if you had a two-room scenario with two nodes in each room, you would want a FOM on a separate network that is accessible from any node in the case of a failure. Realistically that means L3 switches talking to a separate switch on a separate network with the FOM attached. The normal rules of TCP/IP networks apply, so routes to the separate network need to be maintained on each room's switch/router for reliable operation.

 

This can be as simple as: room1 (192.168.1.x) ---- FOM (192.168.2.x) ---- room2 (192.168.3.x)

This way you are covered against a room failure and a switch failure, but you still cannot lose the FOM if (only) one of the rooms is down. In the case of total failure of all rooms, you want the FOM to be the last to fail; at least that way it can perform the duty of tie-breaker. If the FOM goes down before both rooms, I assume it will still be able to perform this task when coming back online, but I am not 100% sure on that.
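The tie-breaker role is easy to see with numbers. Assuming quorum is a strict majority of all configured managers (the rule suggested by the user guide's table), here is the two-room partition worked through; which side keeps serving depends entirely on which side can still reach the FOM:

```python
def has_quorum(configured: int, reachable: int) -> bool:
    # Assumed rule: strict majority of all configured managers.
    return reachable > configured // 2

CONFIGURED = 5            # 2 managers per room + 1 FOM
room1_partition = 2 + 1   # inter-room link fails; room1 still reaches the FOM
room2_partition = 2       # room2 is cut off from the FOM

print(has_quorum(CONFIGURED, room1_partition))  # True:  room1 keeps serving
print(has_quorum(CONFIGURED, room2_partition))  # False: room2 halts, no split brain
```

Without the FOM, both partitions would sit at 2 of 4 and neither would have a majority, so the whole management group would go unavailable.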

 

NEVER run the FOM on one of the nodes - if the FOM cannot come up independently of the cluster, then you could be in trouble.

Regards.

David Tocker