StoreVirtual Storage
Virtual Managers/Failover not working, why?

Paul Hutchings
Super Advisor

OK, so I've downloaded the ESX VSA demo and set up two nodes.

I have a single site, and a single cluster containing both nodes.

The cluster has a virtual IP.

I have a couple of volumes, each set to 2-way replication.

I can ping the virtual IP continuously as long as the node running the virtual manager is up. If I power that node off (either via the CMC, or to simulate the failure of a link or node), I can no longer ping the virtual IP, and of course my test server loses access to its iSCSI volumes.

I believe I need to move the virtual manager, but the CMC doesn't seem to offer any way to do this if it thinks the virtual manager is running on the failed node. It just says the manager is offline, and I go around in a circle: I can't make the other node the virtual manager until I stop the existing virtual manager, which of course I can't do, as that node could be down or on fire, which is the whole point :-)

I guess I'm doing something wrong, but as much as I read the manual, I can't work out what.
8 REPLIES
René Loser
Frequent Advisor

Re: Virtual Managers/Failover not working, why?

Hi Paul,

The Virtual Manager is a manual failover process.
You start the Virtual Manager on a node only in case of a failure.

I recommend using the FOM (Failover Manager) running on another host (this could be Virtual Server, VMware Workstation or VMware Player). The FOM runs all the time, in place of the Virtual Manager. It is the decision maker in case of a failure and works automatically.

You should find the FOM image on the CD as well.

Best regards,
reNe

HP Presales Storage
Mike Povall
Trusted Contributor

Re: Virtual Managers/Failover not working, why?

Hi Paul,

Under normal operating circumstances you should not have a virtual manager running. It should be started manually on the surviving node when a failure occurs, thus restoring quorum and access to the volumes.

Using a Failover Manager is definitely the best solution for your environment as it will run all of the time and will maintain quorum and access to the volumes during the periods when one of your nodes is offline for whatever reason.
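
To make the quorum arithmetic concrete, here is a minimal Python sketch. It assumes quorum simply means a strict majority of the configured managers can still reach each other; the function name `has_quorum` is made up for illustration and is not part of the CMC or any HP tool.

```python
def has_quorum(reachable_managers: int, total_managers: int) -> bool:
    """Assumed rule: quorum needs a strict majority of configured managers."""
    return 2 * reachable_managers > total_managers


# Two storage nodes, one manager each, no FOM: one node goes down.
two_nodes_one_down = has_quorum(reachable_managers=1, total_managers=2)

# Same two nodes plus a Failover Manager on a third host: one node goes down.
with_fom_one_down = has_quorum(reachable_managers=2, total_managers=3)

print("2 managers, 1 down:", two_nodes_one_down)
print("3 managers (with FOM), 1 node down:", with_fom_one_down)
```

Under that assumption, one surviving manager out of two is not a majority, which matches the volumes going offline, while two out of three (the surviving node plus the FOM) is.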

Regards, Mike.
Paul Hutchings
Super Advisor

Re: Virtual Managers/Failover not working, why?

Thanks guys, a little after posting I worked out that a FOM is what I needed, so I installed one and it works seamlessly (well, a few seconds delay whilst it figures out what's happening but near as dammit seamless).

Paul
teledata
Respected Contributor

Re: Virtual Managers/Failover not working, why?

The Virtual Manager must be configured in advance.

You may then manually start the virtual manager (on the remaining node) to re-establish quorum.

Using Virtual Manager does NOT provide high availability, as you WILL lose quorum and have to start the virtual manager manually.

The better design is to create a Failover Manager to provide that automated 3rd manager that will maintain quorum for you...
http://www.tdonline.com
Paul Hutchings
Super Advisor

Re: Virtual Managers/Failover not working, why?

OK next question :-)

I guess this question applies to any sort of redundant storage that is seen as a single addressable "cluster" by ESX:

Suppose you have two sites, A and B.
Each contains some nodes making up a storage cluster.
Each contains some servers, ESX most likely.
A and B are linked by a fast LAN link.

Let's say you lose the link between A and B, but the kit in each location is up and running.

You now have "highly available" storage that is still available in both locations.

You have ESX servers that can each still see the storage local to them, but can't see the other servers in the ESX cluster as the link has gone.

So don't you end up with the same VMs now running in both locations, as each ESX box's HA would kick in, and each ESX box can still access its shared storage because your SAN is resilient?

I'm sure I'm overlooking something obvious here as I can only think about it right now vs. actually do it.
teledata
Respected Contributor

Re: Virtual Managers/Failover not working, why?

Split-brain scenario is prevented by requiring a majority of storage managers to maintain quorum.

In a perfect world you would have a 3rd site (that is connected to the first 2 sites) that hosts your failover manager.
If that configuration is not possible, you run your failover manager at your primary site, the one you want to stay up if your site link goes down.

If you have 4 nodes (2 at each site), only the site that has access to 3 managers (the 2 local nodes, plus failover manager) will have quorum (and thus access to storage) in the event of a link failure.
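
The 4-node example can be sketched in a few lines of Python. This is only an illustration, assuming the inter-site link is down so each site sees just its own managers, and that quorum is a strict majority of all configured managers. The helper `sites_with_quorum` and the site names are hypothetical.

```python
def sites_with_quorum(managers_by_site: dict[str, int]) -> list[str]:
    """Return the isolated sites that still see a strict majority of managers."""
    total = sum(managers_by_site.values())
    return [site for site, count in managers_by_site.items() if 2 * count > total]


# 2 nodes per site, FOM hosted at site A (so A sees 3 of 5 managers).
fom_at_primary = sites_with_quorum({"A": 3, "B": 2})

# 2 nodes per site, no FOM anywhere (each side sees 2 of 4 managers).
no_fom = sites_with_quorum({"A": 2, "B": 2})

print("FOM at site A:", fom_at_primary)
print("No FOM:", no_fom)
```

With the FOM at site A only A keeps quorum, and with no FOM neither side does. In both cases at most one site can hold a majority, which is why the two halves cannot both serve the storage.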
http://www.tdonline.com
Gauche
Trusted Contributor

Re: Virtual Managers/Failover not working, why?

Paul Hutchings
Super Advisor

Re: Virtual Managers/Failover not working, why?

Thanks. I'd actually tested failover using the FOM in this exact scenario, and I clearly wasn't thinking when I posted the question: only the site that has quorum will have the cluster IP, so you can't actually have two sites in "split brain".