HPE SimpliVity

Re: Lost storage/federation network - SVT Behavior

 
steez
Frequent Advisor

Lost storage/federation network - SVT Behavior

Hello,

I am currently working on an HA whitepaper about SimpliVity.

I have a question about OVC behaviour in the following scenario:

A switched two-node cluster loses both the storage and federation networks. The management network is still up, meaning the Arbiter can see both nodes. What is the outcome in this situation?

Will the Arbiter kill one of the hosts, so that the VMs restart on the host the Arbiter designates as the "master"?

gustenar
HPE Pro

Re: Lost storage/federation network - SVT Behavior

The Arbiter won't "terminate" any host; that isn't one of its functions. If the storage and federation networks are lost on a host but the svtfs service is still running, you will most likely see "SimpliVity Datastores Access Impaired" alerts. If the svtfs service is down, that is a different situation: the OVC will fail over and ownership of the VMs will change.


I am an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
tonymcmillan
Frequent Advisor

Re: Lost storage/federation network - SVT Behavior

In this scenario, both hosts are healthy; they just can't replicate the VMs they're hosting. The hosts should continue to run their respective VMs, but you won't have replicated copies of the VMs across both hosts. When the network issue is resolved, the hosts will re-sync the VM data and you will have true HA status again.

The Arbiter should be keeping track of which host owns each VM and has the latest data for it. 

steez
Frequent Advisor

Re: Lost storage/federation network - SVT Behavior

Hello, and thank you for the responses @gustenar @tonymcmillan. Great information; that clarifies some things.

I am now confused about another thing. Hypothetically, if I have a storage and federation link loss, the VMs are no longer being synced, but everything stays up.

What if one of the hosts then suddenly dies? Will the VMs HA-restart on the other available host from an out-of-sync, stale copy, causing data loss/corruption? What is the behavior then?

I know this is very unlikely, but you know the saying: if it can go wrong, it will go wrong.

tonymcmillan
Frequent Advisor

Re: Lost storage/federation network - SVT Behavior

Losing sync traffic and then losing a host that has running VMs on it is a real scenario.

I too wonder how this system is programmed to deal with this scenario. 

Is anyone from HPE able to shed light on this?

steez
Frequent Advisor

Re: Lost storage/federation network - SVT Behavior

bump

MikeSeden
HPE Pro

Re: Lost storage/federation network - SVT Behavior

I recently had a situation where, over the weekend, a customer found that on one of the 2-node clusters in a 2 x 2 Federation, both OVCs went red in vSphere with the errors "Cannot find Federation Network" and "Cannot find Storage Network". All of the VMs were yellow with a warning that the VM was not HA compliant. Every now and then a VM would go green, then yellow again. There was no loss of service. svt-vm-show listed the VMs on that cluster as No, Synchronizing, or Waiting, and those states changed between the VMs often.

This would be the behavior you asked about: the OVCs couldn't get the data to the secondary. We checked a lot of things, but because the nodes were direct-connected, we didn't fix it until we swapped the active/standby NICs on one node. Within a minute, all of the VMs were green and the OVCs yellow. I don't know whether this helps your whitepaper or not, but it verifies in my mind again that most of the time it isn't SimpliVity itself that causes the issue; it's a peripheral. (I think it was an SFP or a fiber cable.)

It turned out to be a little-known issue with ESXi. A flea drain (full power drain) would probably have done the trick, but instead we ran:

./zeus.sh --ssh host cluster 'esxcli network vswitch standard policy failover set -a vmnic5 -s vmnic4 -v "vSwitch1"

esxcli network vswitch standard uplink remove --uplink-name=vmnic4 --vswitch-name=vSwitch1

esxcli network vswitch standard uplink add --uplink-name=vmnic4 --vswitch-name=vSwitch1

esxcli network vswitch standard policy failover set -a vmnic5 -s vmnic4 -v "vSwitch1"

esxcli network vswitch standard portgroup policy failover set -a vmnic5 -s vmnic4 -p "SVT_StorPG"'

./zeus.sh --host cluster service restart
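For readers without access to the zeus.sh support wrapper: the underlying recovery is standard esxcli NIC-failover manipulation. A rough, commented sketch of the same steps run directly on one affected ESXi host might look like the following. The names vmnic4/vmnic5, vSwitch1, and SVT_StorPG come from the example above and will differ per environment; verify against your own host before running anything.

```shell
# Swap the active/standby uplinks on the SimpliVity storage vSwitch,
# then bounce the suspect uplink to force link renegotiation.

# 1. Make vmnic5 active and vmnic4 standby at the vSwitch level.
esxcli network vswitch standard policy failover set \
    --active-uplinks vmnic5 --standby-uplinks vmnic4 \
    --vswitch-name vSwitch1

# 2. Remove and re-add the suspect uplink to reset its state.
esxcli network vswitch standard uplink remove \
    --uplink-name vmnic4 --vswitch-name vSwitch1
esxcli network vswitch standard uplink add \
    --uplink-name vmnic4 --vswitch-name vSwitch1

# 3. Re-apply the failover order at the vSwitch level and on the
#    SimpliVity storage portgroup.
esxcli network vswitch standard policy failover set \
    --active-uplinks vmnic5 --standby-uplinks vmnic4 \
    --vswitch-name vSwitch1
esxcli network vswitch standard portgroup policy failover set \
    --active-uplinks vmnic5 --standby-uplinks vmnic4 \
    --portgroup-name SVT_StorPG
```

These are the long-form equivalents of the -a/-s/-v/-p flags shown above; the commands only apply on an ESXi host, so treat this as an illustrative fragment rather than a copy-paste fix.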


While I am an HPE Employee, all of my comments (whether noted or not), are my own and are not any official representation of the company