Array Performance and Data Protection

Snapshot Leads to Guest OS Becoming Unresponsive

SOLVED
jjones130
Occasional Advisor

Snapshot Leads to Guest OS Becoming Unresponsive

We've been seeing an issue where snapshots periodically fail, typically for the same 2 or 3 VMs. 2 of them are almost-retired Win2003 boxes, but the 3rd is a 2008 R2 web application server. On all 3 of these, when a snapshot fails the VM locks up to the point where either I have to do a hard reset or HA takes care of it because it has lost the heartbeat. Once I moved the VM off the protected LUN, the lockups stopped.

Has anybody else seen this, or does anyone have pointers? I'm running fully patched vSphere 5.5 (both ESXi and vCenter) and have a CS-300 with 10 GbE copper connectivity.

4 REPLIES
chris24
Respected Contributor

Re: Snapshot leads to guest OS becoming unresponsive

Disable the VMware Tools VSS writers. These are only required for applications such as Exchange and SQL and, to a degree, file servers; otherwise hardware snapshots suffice. You can disable VSS writers via vmbackup.conf: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1031200

The issue usually happens when there is a conflict with writers or during the consolidation of the snapshot. Also ensure you have at least 10% free space in the datastores.
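
The 10% free-space guideline is easy to check with a few lines of script. Below is a minimal Python sketch, not anything from Nimble or VMware: the function names and the sample datastore names and sizes are made up for illustration, and the capacity and free-space figures are assumed to be values you have already collected (for example, from the vSphere client).

```python
def free_percent(capacity_bytes, free_bytes):
    """Return free space as a percentage of total capacity."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * free_bytes / capacity_bytes


def low_space_datastores(datastores, min_free_pct=10.0):
    """Return the names of datastores whose free space is below min_free_pct.

    `datastores` maps a datastore name to a (capacity_bytes, free_bytes) tuple.
    """
    return [
        name
        for name, (capacity, free) in datastores.items()
        if free_percent(capacity, free) < min_free_pct
    ]


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical figures, for illustration only.
    sample = {
        "datastore-a": (2048 * GIB, 150 * GIB),
        "datastore-b": (2048 * GIB, 600 * GIB),
    }
    for name in low_space_datastores(sample):
        print(f"{name} is below the 10% free-space guideline")
```

Anything this flags is worth freeing up before the next scheduled snapshot, since snapshot delta files and consolidation both need headroom on the datastore.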

jjones130
Occasional Advisor

Re: Snapshot leads to guest OS becoming unresponsive

One of the servers having issues is a SQL server, so right off the bat that isn't helpful. Furthermore, Nimble isn't the only thing hitting these servers with VSS requests; Veeam does so every day as well, and disabling the writer would kill large quantities of good features in the product: file-level restoration, application-level restore, etc. Neither Veeam nor natively created snapshots cause any issues on these servers, only the Nimble-created ones.

chris24
Respected Contributor
Solution

Re: Snapshot leads to guest OS becoming unresponsive

Try moving the affected VMs into their own volume collection and see if the issue persists. Veeam processes jobs sequentially (I believe), whereas Nimble processes 5 at a time.

jjones130
Occasional Advisor

Re: Snapshot leads to guest OS becoming unresponsive

Veeam does parallel processing as well, limited by the number of available cores on the server doing the processing. Not that I'm opposed to moving them to a different volume collection, but is there a way to tweak how many jobs the array processes at a time? I only have a single volume in each collection, but each volume does hold a sizable number of VMs (more than 10).