HPE Morpheus VM Essentials

Re: VME management VM - automatic failover

chypsa
Advisor

VME management VM - automatic failover

As the title says, I'm trying to figure out whether the VME management VM is supposed to fail over to a new host if the host currently holding it goes down. This is a lab environment hosted on a Hyper-V host, but I don't think that makes a difference.

I'm testing with the latest public version: Ceph storage, 3-node cluster. The mgmt VM is on Ceph storage and set to Failover mode.

I turn off the primary node (currently holding the mgmt VM) and... nothing happens.

I was hoping that the system would recognize the importance of the management plane and start the VM on a different node out-of-the-box, but that simply doesn't happen. 

Is this by design? Or did I misconfigure something, and it's actually supposed to work but just isn't working in my lab?

What's the idea?

Sanika
HPE Pro

Re: VME management VM - automatic failover

Hi @chypsa 
In my opinion, the management VM does not automatically fail over to another host if its node goes down. Even though it’s stored on Ceph and marked “Failover,” the cluster doesn’t treat the management plane like a regular workload VM.
It stays down until you manually start it elsewhere or set up extra HA tools. The “Failover mode” you see applies to guest VMs, not the management VM itself.
So what you saw in your lab is by design. If you want automatic recovery, you’ll need to add external HA for the management plane.

Hope this helps.

Regards,
Sanika.

chypsa
Advisor

Re: VME management VM - automatic failover

Hello Sanika,

thanks for your response! That is in line with my findings, but I wanted to be sure.

I did eventually find a solution using Pacemaker.
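For anyone looking for the same thing, the general shape of that setup can be sketched roughly like this. This is only a sketch, assuming Pacemaker/Corosync is already running on all nodes and the management VM is defined in libvirt; the resource name and XML path are placeholders, not anything VME ships with:

```shell
# Sketch only: assumes a working Pacemaker/Corosync cluster on all nodes.
# 'vme-mgmt' and /shared/vme-mgmt.xml are placeholder names.

# Dump the VM definition to a location every node can read
virsh dumpxml vme-mgmt > /shared/vme-mgmt.xml

# Create a VirtualDomain resource so Pacemaker restarts the VM on a
# surviving node when the current host fails
pcs resource create vme-mgmt ocf:heartbeat:VirtualDomain \
    config=/shared/vme-mgmt.xml \
    hypervisor="qemu:///system" \
    migration_transport=ssh \
    meta allow-migrate=true \
    op monitor interval=30s
```

With that in place, Pacemaker (not VME) owns the lifecycle of the management VM, so it gets restarted elsewhere like any other cluster resource.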

Kind regards,

chypsa

jayst136
Advisor

Re: VME management VM - automatic failover

Hi.

I've tested this as well. The VME manager definitely is restarted on another host upon host failure. Nothing special is needed, other than a minimum of one heartbeat target datastore. It is restarted just like any other VM that is made highly available within a cluster and has the "Failover" or "Auto" placement policy. It does, however, take a while for VME to discover that the manager is no longer running on the failed host after it has been restarted on one of the remaining hosts; I'm not sure what that's about (yet).

chypsa
Advisor

Re: VME management VM - automatic failover

That's very interesting to hear!

It was suggested to me that I have to set the Heartbeat target on a datastore, but the person who did that was using iSCSI targets on a NAS (another lab deployment).

I am testing a Ceph deployment, and Ceph datastores do not offer the Heartbeat target option. So I had misunderstood: I thought the main VM store itself had to be tagged as "Heartbeat target".

From what you're saying, I gather that ANY datastore that supports it can be configured as a Heartbeat target? So I configured an NFS target with that bit set and turned off the host holding the VM (which is set to Managed and Failover)... and nothing happened. It has been sitting like this for some 25 minutes and the mgr VM still hasn't started on a different node.

Which makes me wonder, again, what am I missing?


EDIT: Yes, the machine is set to autostart. It just isn't showing up on the other nodes at all. Did you copy the XML and define the VM manually, or does it work out of the box?
Could you write up a bit more about your environment specifics? What storage type do you use, and how did you configure it?
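For anyone comparing notes, these are the kinds of libvirt checks that can confirm the autostart flag and whether the VM is even defined on the surviving nodes. The VM name here is a placeholder:

```shell
# Run on each surviving node; 'vme-mgmt' is a placeholder VM name.

# Is the VM defined on this host at all (running or shut off)?
virsh list --all

# Is autostart enabled for it?
virsh dominfo vme-mgmt | grep -i autostart
```

If the VM isn't listed at all on the other nodes, nothing there can start it, which would match what I'm seeing.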

jayst136
Advisor

Re: VME management VM - automatic failover

Sure!
I'm using GFS2 datastores on an FC-connected array. One of them has the heartbeat datastore option ticked.

I'm not sure about the heartbeat requirements in all the different storage scenarios. But I think you went the right way in defining a datastore where that heartbeat target feature can be ticked.

Not sure why it's not starting, then. Did you add the heartbeat datastore at the cluster level?
Did you also do a test with any other VM on the cluster?

Overall, my experience consistently shows that VMs are restarted on other hosts within about 3 minutes, so that's the behavior you're looking for.