vSphere HA and "split brain"?

SOLVED
Paul Hutchings
Super Advisor

I'm struggling to find something which documents this scenario:

Site A
Server A
FOM
Storage Cluster Nodes (group IP 1.2.3.4)
|
Site B
Server B
Storage Cluster Nodes (group IP 1.2.3.4)

If the link dies all the kit is still up.

The FOM gives quorum to Site A and in Site B the storage goes offline.

What will vSphere HA do though? With only two servers, can it be configured so that Site A takes quorum and starts the VMs that were running in Site B?
6 REPLIES
Paul Hutchings
Super Advisor

Re: vSphere HA and "split brain"?

Of course an added complication here is that from what I've been reading (P4000 Multi-Site HA/DR Solution Pack user guide) I suspect I should be using a different subnet in Site A and Site B so, for example:

Site A
Server A
FOM
Storage Cluster Nodes (group IP 192.168.0.1)
|
Site B
Server B
Storage Cluster Nodes (group IP 192.168.1.1)

Now it seems it gets really confusing as vSphere apparently can't do iSCSI multi-pathing to anything other than its local subnet, so whilst I believe I can set the vSphere iSCSI initiator discovery list to both 192.168.0.1 and 192.168.1.1, I can't use multi-path?

I'd really appreciate some clarification here, as two locations, each with some P4000 nodes and one or more vSphere hosts, seems the simplest thing to want to do, so I'm obviously missing something simple.
teledata
Respected Contributor
Solution

Re: vSphere HA and "split brain"?

In your scenario if the link fails, the storage at Site B will fail, which will cause the VMs on Server B to crash.

vSphere HA should detect the VM failure and start them on Site A.

And yes, best practice for a multi-site SAN would be to create 2 subnets, so you maintain 2 VIPs (then add BOTH VIPs to ALL VMware servers in the cluster). But you are correct that there is a trade-off: you will give up vSphere multi-pathing if you use the HP LeftHand Multi-Site SAN configuration.

I configured this for a hospital, but we actually created 2 multi-site clusters: one where the FOM was in Site A, and a 2nd where the FOM was in Site B. This way EACH site had its own storage. Servers that should stay up in Site A would live on Cluster A, which (in a link failure) should stay up in Site A, and vice versa for Site B.
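
The quorum behaviour described above can be sketched as a simple majority vote. This is only an illustration in Python, not LeftHand's actual implementation: it assumes every storage node and the FOM each run one manager with one vote, and that a site keeps its volumes online only if it can still see a strict majority of all managers. The function name and the node counts (2 nodes per site) are hypothetical.

```python
def sites_with_quorum(managers_per_site):
    """Return the sites that still see a strict majority of managers once the
    inter-site link has failed and each site can only see its own managers.

    managers_per_site maps a site name to the number of managers it hosts
    (storage nodes, plus 1 if the FOM lives at that site).
    """
    total = sum(managers_per_site.values())
    return [site for site, count in managers_per_site.items() if count > total / 2]


# One multi-site cluster: 2 nodes per site (assumed), FOM in Site A.
single_cluster = {"Site A": 2 + 1, "Site B": 2}
print(sites_with_quorum(single_cluster))  # ['Site A']

# Two clusters, as in the hospital design: each site hosts the FOM for "its" cluster.
cluster_a = {"Site A": 2 + 1, "Site B": 2}
cluster_b = {"Site A": 2, "Site B": 2 + 1}
print(sites_with_quorum(cluster_a), sites_with_quorum(cluster_b))  # ['Site A'] ['Site B']
```

Under these assumptions, a link failure leaves Cluster A online only in Site A and Cluster B online only in Site B, which is why each workload is placed on the cluster whose FOM sits in the site where it must stay up.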

http://www.tdonline.com
Paul Hutchings
Super Advisor

Re: vSphere HA and "split brain"?

Thanks for the reply (and the stuff we discussed via email - had a meeting with a senior LeftHand chap this week and came away very impressed).

So if we're "only" looking at 2-3 nodes per site (probably a mix of 15k and 7k SAS nodes) what is the best way to go about getting performance with a reasonable level of automation?

The P4000 seems the simple bit, vSphere is where I'm struggling.

How feasible is it to just stick a 10 Gbps link between the switches in each site and use a single subnet?
Paul Hutchings
Super Advisor

Re: vSphere HA and "split brain"?

Actually thinking out loud, if I used vSphere multi-pathing but only pointed the hosts in Site A to the Site A VIP and the hosts in Site B to the Site B VIP what would I lose?
teledata
Respected Contributor

Re: vSphere HA and "split brain"?

If you were to do Network RAID 10, created a 2-site cluster, but only used your local VIP here's what would happen:

1) The volume is striped so that there are redundant blocks, but the redundant copies are always across the WAN link at the 2nd site. If you lost 1 storage node at Site A, you could have a short wait while the failure was detected and the VIP moved from one node to another. This is most noticeable with SQL/Exchange and virtual disks. The time taken to detect the failure and migrate the VIP may exceed the timeout of the iSCSI connection, in which case the storage connection is lost.

If you have 2 VIPs listed in the VMware iSCSI setup, VMware has a 2nd path to connect to storage. If it detects the loss of a storage path it will immediately retry on the 2nd path (which is over the WAN), and it should find the Site B VIP and be redirected to the copy of the data on a module at Site B.
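
The difference between listing one VIP and listing two can be sketched as a small path-selection loop. This is a hypothetical illustration, not VMware's actual path-selection logic: `pick_path` is an invented helper, the VIP addresses are the example ones from earlier in the thread, and "reachable" simply stands for whichever VIPs answer at that moment.

```python
def pick_path(discovery_list, reachable):
    """Return the first VIP in the discovery list that currently answers,
    or None if no listed VIP is reachable."""
    for vip in discovery_list:
        if vip in reachable:
            return vip
    return None


SITE_A_VIP = "192.168.0.1"  # example address from the thread
SITE_B_VIP = "192.168.1.1"  # example address from the thread

# Assumed situation: the Site A VIP is mid-failover after a node loss,
# while the Site B VIP still answers across the inter-site link.
reachable = {SITE_B_VIP}

print(pick_path([SITE_A_VIP], reachable))              # None
print(pick_path([SITE_A_VIP, SITE_B_VIP], reachable))  # 192.168.1.1
```

With only the local VIP listed, the host has nothing to fall back on until the VIP has finished moving. With both listed, it can retry against the other site in the meantime.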
http://www.tdonline.com
Paul Hutchings
Super Advisor

Re: vSphere HA and "split brain"?

Thanks, from speaking to a couple of other people it's sounding like the way to go is this:

Site A
Server A
Switch A (VIP 192.168.0.x)
VLAN1
VLAN2
|
|Link with VLAN tagging for VLAN1 and VLAN2
|
Site B
Server B
Switch B (VIP 192.168.1.x)
VLAN1
VLAN2

And to have both VLANs span both switches (so ideally a 10 Gbps link, but maybe 2 x 1 Gbps as a minimum).

That way each server can connect to storage in either site and (hopefully) there is enough bandwidth between sites for both replication and iSCSI traffic.

I'm not familiar enough with storage failures on vSphere to know how quickly any switchover would happen.

For things like Exchange/SQL and File Server data I'm thinking I'm most likely to be using the P4000 MPIO within the Windows VMs so I can take application-aware snapshots of Exchange/SQL.

Sound like a sensible plan?