
Dual FC Path failure stalls VM guest server

 
Beer Grill
Advisor


Hi guys
I have the following setup.
Two DL380 G5 servers each with 2 FC HBAs, attached via switches to an MSA1500.
ESX u4 is installed on both servers, and the MSA1500 is running V7.00 FW Active/Active.
I have a single LUN used as a VMFS datastore, presented to both servers, where my guest servers live. The ESX servers are able to access the LUN via either HBA; I tested failover by breaking one fibre path at a time.
I decided to see what would happen if both paths failed and found the following.
The guest server does not fail over to the other ESX server.
The guest server "appears" to respond to pings.
But when attempting to open a console, there is a message across the top of the console window saying that there is no path to the guest server's disks.

My question: is this a limitation of VMware HA, or is there something I should be tweaking to enable guest server failover to the other ESX server?

ta


7 REPLIES
Uwe Zessin
Honored Contributor

Re: Dual FC Path failure stalls VM guest server

H(igh) A(vailability) is just a marketing name...

It only kicks in if no heartbeats from an ESX server are received for some time.
Steven Clementi
Honored Contributor

Re: Dual FC Path failure stalls VM guest server

I believe the timeout is configurable, though I would have to actually look for it since I never had to change it.


Steven
Steven Clementi
HP Master ASE, Storage, Servers, and Clustering
MCSE (NT 4.0, W2K, W2K3)
VCP (ESX2, Vi3, vSphere4, vSphere5, vSphere 6.x)
RHCE
NPP3 (Nutanix Platform Professional)
Uwe Zessin
Honored Contributor

Re: Dual FC Path failure stalls VM guest server

Yes, the timeout has been configurable for some time now. Phew. In the initial release an isolated server switched off its VMs after 13 seconds! And the other one attempted to restart the VMs after 15 seconds.

But this is a 'ping timeout' between service consoles - it does not detect if a VM 'hangs' due to storage problems.
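
To illustrate the point, here is a toy model of a heartbeat-only failure check. It is a sketch, not VMware code: the Host class, the ha_declares_failed function and the storage_paths_up field are made-up names, and the 15 second default is simply the initial-release figure quoted above.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    last_heartbeat: float   # time (seconds) of the last service-console heartbeat
    storage_paths_up: int   # working FC paths; the check below never looks at this


def ha_declares_failed(host: Host, now: float, timeout: float = 15.0) -> bool:
    """Toy model: a host counts as failed only when its heartbeats stop."""
    return (now - host.last_heartbeat) > timeout


# Both FC paths are down, but the service console still answers heartbeats.
esx1 = Host("esx1", last_heartbeat=100.0, storage_paths_up=0)
print(ha_declares_failed(esx1, now=105.0))  # False: no restart, VMs stay stalled

# Storage is fine, but heartbeats have been missing for longer than the timeout.
esx2 = Host("esx2", last_heartbeat=100.0, storage_paths_up=2)
print(ha_declares_failed(esx2, now=120.0))  # True: the other host would restart the VMs
```

In this model a host with zero working paths is never declared failed as long as its heartbeats keep arriving, which matches the behaviour seen in the dual-path test.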
Beer Grill
Advisor

Re: Dual FC Path failure stalls VM guest server

Hi Guys
I had adjusted the heartbeat settings, but there was no difference. As Uwe mentioned, it's based on a "ping" response. It seems that failover only happens if the comms are affected.

thanks for the replies
Wickedsunny
Valued Contributor

Re: Dual FC Path failure stalls VM guest server

VMware HA is not for storage failover. It is to be used for ESX clustering.

So if you have 2 ESX hosts and one of them goes off, then depending upon the resource pools and HA policies the other ESX host will try to start those VMs which are supposed to be started.

There is a heartbeat connection between both the ESX boxes, and once the heartbeat is lost the other ESX host will try to start the VMs which are defined as per the HA policy.

So HA clusters the ESX boxes and not the VMs living inside them.

Regards,
Sunny
Wickedsunny
Valued Contributor

Re: Dual FC Path failure stalls VM guest server

Also, with vSphere 4, VMware has introduced Failover, which will cluster the running VMs. So if a VM fails, the VMware Failover will take an image of the running state of the VM and fail over to a working VM in the cluster.

With this, Microsoft Clustering will literally go for a toss.

Regards,
Sunny
Uwe Zessin
Honored Contributor

Re: Dual FC Path failure stalls VM guest server

It sounds a bit like you are talking about "Fault Tolerance", which is another means to cope with ESX server failures, but it does not 'take an image of the running state of the VM'. All input of the primary VM is duplicated and sent across the FT logging link to the secondary, where it is applied as some kind of 'replay' - think of it as some kind of 'permanent VMotion updates'. The output of the secondary is discarded until it is activated.
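
A toy sketch of that replay idea, only to show the principle and not how FT is really implemented: ToyVM and ToyFTPair are invented names, and the 'VM' is just a deterministic running sum of its inputs.

```python
class ToyVM:
    """Deterministic toy 'VM': its state is the running sum of its inputs."""

    def __init__(self) -> None:
        self.state = 0

    def step(self, event: int) -> int:
        self.state += event
        return self.state  # the 'output' produced for this input


class ToyFTPair:
    """Primary plus secondary kept in step by replaying the same inputs."""

    def __init__(self) -> None:
        self.primary = ToyVM()
        self.secondary = ToyVM()
        self.secondary_active = False

    def handle(self, event: int) -> int:
        if self.secondary_active:
            # After activation the secondary's output is the one that counts.
            return self.secondary.step(event)
        out = self.primary.step(event)
        # The same input is forwarded (over the logging link, in the real thing)
        # and replayed on the secondary; its output is thrown away.
        self.secondary.step(event)
        return out

    def fail_over(self) -> None:
        # The primary is gone; the secondary continues from identical state.
        self.primary = None
        self.secondary_active = True


pair = ToyFTPair()
outputs = [pair.handle(e) for e in (1, 2, 3)]
pair.fail_over()
after = pair.handle(4)
print(outputs, after)  # [1, 3, 6] 10
```

Because the secondary has seen every input, it carries on from the same state without any snapshot being taken.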

As of today there are many limitations (latest processors required for the lockstep, one vCPU per VM, must have enough bandwidth on the FT logging link, ...) -- so it is not just a matter of flipping some bits...


Or were you talking about some enhancements to VM HA? I am not confident it will really deal with all kinds of failures at the storage layer. Last time I checked, it just looked at the VMware Tools heartbeat.