VMware HA Settings

tonymcmillan · ‎06-15-2020

Are there any best practices documented for HA settings in a SimpliVity cluster?

Specifically, we are focused on initiating an HA event when the logical array (boot drives) that supports the OVC fails. This has happened multiple times at a client site, and each time it leaves the VMs stranded on the "failed" host until we power off the node and let VMware HA restart them on the good host.

Under the vSphere Availability service, there are settings for "Failure conditions and responses". These allow you to specify how VMware handles a datastore failure. By default, these settings (PDL and APD) are disabled.

What are the best settings for this in a SimpliVity environment?

Thanks,

Tony

DaveOb · ‎06-16-2020

There is no recommendation for the APD setting; it may be configured in a SimpliVity environment.

In your scenario, I am not convinced it will work. The OVC on the node with the faulty boot drives will still respond to ARP, so the other nodes in the cluster will still see it as alive, and there may be ownership transition issues. ESXi may not be that healthy either, and it may fail to trigger the failover or to notice that it has APD.

Regarding lost access to a boot drive causing these symptoms: I have seen something similar before, primarily on Dell-based hardware (R730). I would update the firmware and ESXi to the latest versions supported in the SimpliVity interop guide. If it is still happening, open a support ticket; later versions have a way to mitigate this exact scenario.

I am an HPE employee
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]

tonymcmillan · ‎06-16-2020

The cluster is at 3.7.10. We have considered upgrading to the latest version, 4.0.1, but we have two concerns: first, that the faulty node could fail again during the upgrade, and second, that no one at HPE can confirm this issue has been addressed in a newer version.

When the boot drives fail, ESXi continues to run from memory. In vCenter, this host shows APD on the boot drive.

After the boot drive has failed, the OVC responds to pings and you can SSH into it. However, once logged into the OVC, it does not respond to commands.

So, what you are saying is that if an OVC is still able to respond on its IP address, the OVC and its node are considered healthy?
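The gap described here (the OVC answers ping and accepts SSH, yet commands never return) could be probed with a check that only counts a node as healthy when a real command finishes within a time limit. The following is a rough sketch, not an HPE or VMware tool: the `probe` function name is made up for illustration, it assumes GNU coreutils `timeout` is available on the machine running the check, and the OVC address and account in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: treat a node as healthy only if a real command completes in time,
# not merely because it answers ping or accepts SSH logins.

# probe TIMEOUT_SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Prints "responsive" if COMMAND exits 0 within the timeout,
# "hung" if it had to be killed by timeout (exit status 124),
# and "failed" for any other non-zero exit.
probe() {
    local limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    local rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "responsive"
    elif [ "$rc" -eq 124 ]; then
        echo "hung"
    else
        echo "failed"
    fi
}

# Assumed usage against an OVC (OVC_IP and the account are placeholders):
#   probe 30 ssh -o BatchMode=yes -o ConnectTimeout=10 admin@OVC_IP svt-federation-show
```

A result of "hung" would match the symptom above: the SSH session opens but the command inside it never completes.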

AlexLeung · ‎06-16-2020

Hello Tony,

Can I have the case number? Please send me a private message.

>>>>>>>>>>>>>>>>>>>>

If the OVC does not respond to commands after startup, you can check whether there is a "nostart" file.

Example:

root@omnicube-ip166-104:/home/administrator@vsphere# ls /var/svtfs/0/ | grep nostart

nostart

root@omnicube-ip166-104:/home/administrator@vsphere# rm /var/svtfs/0/nostart

root@omnicube-ip166-104:/home/administrator@vsphere# start svtfs

svtfs (0) start/running, process 4621

After that, run svt-federation-show to check.

root@omnicube-ip166-104:/home/administrator@vsphere# svt-federation-show
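The check-and-remove step above could be wrapped in a small function so that the service is only restarted when a "nostart" file was actually found. This is a minimal sketch under the assumption that the path from the example (/var/svtfs/0) applies to your OVC; the `clear_nostart` name is hypothetical and not part of the SimpliVity tooling.

```shell
#!/usr/bin/env bash
# clear_nostart DIR   (hypothetical helper)
# Removes DIR/nostart if it exists and prints "removed" (exit status 0).
# Prints "absent" and returns 1 if there was nothing to remove.
clear_nostart() {
    local dir="$1"
    if [ -e "$dir/nostart" ]; then
        rm -- "$dir/nostart" && echo "removed"
    else
        echo "absent"
        return 1
    fi
}

# On the OVC from the example above this would be run as root:
#   clear_nostart /var/svtfs/0 && start svtfs
#   svt-federation-show
```

Because the function returns non-zero when no file is present, `start svtfs` in the usage line is skipped in that case.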

I work for HPE
