
Re: Virtualized Exchange DAG Failover During VM Snapshot

 
julez66
Frequent Advisor

Virtualized Exchange DAG Failover During VM Snapshot

This question is for anyone running a virtualized Exchange 2010 or 2013 DAG who takes snapshots of the datastores the DAG member VMs live on.

In our environment we run a couple of Exchange 2013 DAG members and take array snapshots of the datastores those members reside on. As with every snapshot/backup system that relies on VMware snapshot technology, the array snapshot first triggers a VMware snapshot of the VM, and that VMware snapshot causes a DAG failover. I've read several blog entries about adjusting cluster delays and similar settings to eliminate the issue.

None of them, however, stands out more than this Veeam article:

KB1744: Tips for DAG Exchange Backup and Replication in vSphere

Granted, we aren't using Veeam, but as I said above, Nimble and more or less everyone else who relies on VMware for backups and snapshots uses it in much the same way.

So...
Question 1: Has anyone else seen similar behavior, and how did you fix it?
Question 2: The 5th line of that Veeam KB suggests reducing snapshot.maxConsolidateTime to 1 second to limit how long the VM is stunned. The upside, of course, is a shorter stun. The downside is that the array gets nowhere near as long as the 6-second default VMware uses, which could cause the array snapshot to fail if it can't complete in time. Do the Nimble engineers see any problem with this?
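Question 2 is essentially a trade-off between two timing windows. The sketch below is a hypothetical model of that trade-off as the question frames it: it treats snapshot.maxConsolidateTime as a cap on how long the VM stays stunned (an assumption taken from the question, not from VMware documentation), and the array snapshot duration and cluster tolerance are illustrative inputs. The 5-second tolerance in the example assumes the common Windows failover clustering defaults of a 1000 ms heartbeat and 5 missed heartbeats.

```python
from dataclasses import dataclass


@dataclass
class SnapshotWindow:
    """Hypothetical, simplified model of one snapshot cycle (all times in seconds)."""
    stun_cap_s: float           # assumed cap on VM stun, per the question's reading of snapshot.maxConsolidateTime
    array_snap_s: float         # hypothetical time the array needs to finish its snapshot
    cluster_tolerance_s: float  # how long the DAG cluster tolerates missed heartbeats


def assess(w: SnapshotWindow) -> list[str]:
    """Return the risks this model predicts; an empty list means neither risk applies."""
    risks = []
    # Cap set too low: the array may not finish before the stun window closes.
    if w.array_snap_s > w.stun_cap_s:
        risks.append("array snapshot may not complete inside the stun window")
    # Cap set too high: a worst-case stun lasts as long as the cluster's tolerance.
    if w.stun_cap_s >= w.cluster_tolerance_s:
        risks.append("worst-case stun reaches the cluster's failover tolerance")
    return risks


# Illustrative numbers only; 5 s assumes the 1000 ms x 5 heartbeat defaults.
print(assess(SnapshotWindow(stun_cap_s=1, array_snap_s=2, cluster_tolerance_s=5)))
print(assess(SnapshotWindow(stun_cap_s=6, array_snap_s=2, cluster_tolerance_s=5)))
```

With a 1-second cap the model flags only the array risk; with a 6-second cap it flags only the failover risk, which is the tension the question describes.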

Valdereth
Trusted Contributor

Re: Virtualized Exchange DAG Failover During VM Snapshot

How about protecting the passive nodes with the vCenter sync template and the active nodes with one that doesn't use vCenter synchronization?

I've seen agent-based backup solutions run into the same problem, where the VSS snapshot they trigger causes a failover. So it's not entirely accurate to say that VMware snapshots are the cause.

julez66
Frequent Advisor

Re: Virtualized Exchange DAG Failover During VM Snapshot

There isn't really a "passive node" in this case; they all host active databases. Database location shouldn't matter anyway, since the database volumes aren't captured by the VM snapshot: they are Windows in-guest iSCSI volumes, not VMDKs. Ultimately, I'm sure the problem is the failover timeout, since DAG failover is touchy to say the least. The members don't fail over every time, but they have done so more than a few times.

But I was curious what others have seen.

Not applicable

Re: Virtualized Exchange DAG Failover During VM Snapshot

Q1 - Yep, we got this "feature" when we upgraded to Exchange 2010 and took a backup. We resolved it by making the cluster changes to Exchange described in the KB. A Google search shows it's quite a common fix for anything that runs Microsoft clustering (it also fixed our SCOM Operations Manager failover issue).
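The cluster changes in question are the Windows failover clustering heartbeat settings. Roughly, a node is treated as down after SameSubnetThreshold consecutive heartbeats are missed, with one heartbeat sent every SameSubnetDelay milliseconds, so the tolerated outage is about their product. The sketch below only does that arithmetic; the defaults shown (1000 ms and 5) are the values commonly documented for Windows Server 2012 R2 and earlier, and the function name is hypothetical.

```python
# Commonly documented same-subnet defaults for Windows Server 2012 R2 and earlier.
DEFAULT_SAME_SUBNET_DELAY_MS = 1000   # interval between heartbeats
DEFAULT_SAME_SUBNET_THRESHOLD = 5     # consecutive misses before the node is treated as down


def failover_window_s(delay_ms: int, threshold: int) -> float:
    """Approximate seconds a node can go silent (e.g. while stunned) before
    the cluster acts: one heartbeat every delay_ms, threshold misses allowed."""
    return delay_ms * threshold / 1000


if __name__ == "__main__":
    print(failover_window_s(DEFAULT_SAME_SUBNET_DELAY_MS, DEFAULT_SAME_SUBNET_THRESHOLD))  # 5.0
    print(failover_window_s(2000, 10))  # 20.0
```

At the defaults, any stun longer than about 5 seconds is enough for the cluster to start a failover, which fits the behavior described in the original post.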

Q2 - I haven't set this and have no issues.

julez66
Frequent Advisor

Re: Virtualized Exchange DAG Failover During VM Snapshot

Thanks, Travis. Based on my discussion with VMware and some other confirmations much like yours, I'm going to give that a try tonight.

I don't think we'll end up using the maximums; I think we only need to get past the 6-second stun that VMware defaults to.
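Getting past the 6-second stun is a small calculation: with the heartbeat left at 1000 ms, the threshold must be large enough that delay × threshold strictly exceeds the stun time. A minimal sketch, assuming the 6-second figure quoted in this post; the helper name is hypothetical.

```python
def min_threshold(stun_ms: int, delay_ms: int = 1000) -> int:
    """Smallest missed-heartbeat threshold whose window (delay_ms * threshold)
    is strictly longer than a stun of stun_ms milliseconds."""
    return stun_ms // delay_ms + 1


print(min_threshold(6000))        # 6 s stun, 1000 ms heartbeat -> 7
print(min_threshold(6000, 2000))  # 6 s stun, 2000 ms heartbeat -> 4
```

Any threshold at or above 7 clears a 6-second stun at this heartbeat interval; larger values add headroom at the cost of slower detection of a genuinely failed node.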

I'll let you know, and I'll star yours as the correct answer if everything checks out in the morning.

Thanks for your confirmation!

julez66
Frequent Advisor

Re: Virtualized Exchange DAG Failover During VM Snapshot

We raised the threshold to 10 and left the heartbeat at 1000 ms, and everything looks good to go now.
Thanks again for your response, Travis.