Around the Storage Block

Deployment Considerations for VMware Application Synchronized Snapshots with Nimble Storage

In the two earlier posts in this series, we looked at the advantages of application-consistent snapshots and the optimizations built into the Nimble Storage implementation.

However, when deploying this solution in large-scale VM environments or with write-heavy VMs, it is necessary to understand how VMware snapshots work and the limitations of delta-file-based snapshots. VMware KB 1015180: Understanding VM snapshots in ESXi is a good resource for this information, specifically the section "Child disks and disk usage".

To illustrate what that section describes, let's analyze how new write I/O to a VM affects the amount of time required to delete a snapshot.

- Consider a VM performing new writes at 40 MB per sec

   - If the VM snapshot lives for 10 mins, delta file size will be 24 GB

- Deleting this snapshot requires: 24 GB of reads + 24 GB of writes

- If ESXi Host is able to do this at 80 MB per sec (read + write)

   - Deleting VM snapshot will take 10 mins

   - But the VM is still writing at 40 MB per sec during the delete, and these writes are saved in another delta file
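The arithmetic above can be sketched as a small back-of-the-envelope model. The rates and durations below are the illustrative figures from the bullets, not measured values:

```python
# Back-of-the-envelope model of VM snapshot delta growth and the time
# to consolidate (delete) the snapshot. All numbers are illustrative.

def delta_size_gb(write_rate_mb_s: float, lifetime_s: float) -> float:
    """Size of the delta file accumulated while the snapshot lives."""
    return write_rate_mb_s * lifetime_s / 1000  # MB -> GB (decimal units)

def consolidation_time_s(delta_gb: float, host_rate_mb_s: float) -> float:
    """Deleting the snapshot means reading the delta and writing it back."""
    total_io_gb = 2 * delta_gb  # read the delta + write it into the base disk
    return total_io_gb * 1000 / host_rate_mb_s

delta = delta_size_gb(40, 10 * 60)       # VM writing 40 MB/s for 10 min
merge = consolidation_time_s(delta, 80)  # host sustains 80 MB/s (read + write)
print(f"delta: {delta:.0f} GB, consolidation: {merge / 60:.0f} min")
```

And as the last bullet notes, the VM keeps writing during those 10 minutes of consolidation, so a second delta file of similar size can accumulate while the first is being merged.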

In a traditional archival backup, the VM snapshot can be deleted only after the changed blocks have been read and written to the backup archive, which can easily take 10 minutes or more. A configuration like the one above is therefore unlikely to provide a reliable VM snapshot-based backup.

By utilizing storage snapshots with VMware application synchronization, the snapshot can be deleted within seconds after being created (which is the typical time required to take a storage snapshot of the datastore).

Despite the power and efficiency of storage snapshots, there are a few considerations to keep in mind.

1. Consider the number of VMs in a single datastore

If the datastore contains around 8 VMs and the host cluster has 4 to 8 hosts, the storage snapshot will be created soon after the VM snapshots, minimizing the impact of snapshot consolidation irrespective of the amount of write I/O.

However, if the datastore contains 50 or more write-heavy VMs, the VM snapshots created at the beginning of the operation will stay around longer, accumulating writes while the remaining VM snapshots are being created.
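A rough way to see why VM count matters: if VM snapshots are created one after another before the storage snapshot is taken, the first VM's snapshot lives roughly (number of VMs) × (per-VM snapshot creation time). The 5-second per-VM figure below is an assumption for illustration, not a measured value:

```python
# Illustrative sketch: the first VM snapshot taken must survive until the
# last VM snapshot completes, so its delta grows for the whole window.
# The per-VM creation time is a hypothetical assumption.

def first_snapshot_lifetime_s(num_vms: int, per_vm_snapshot_s: float) -> float:
    """Approximate lifetime of the earliest VM snapshot in the batch."""
    return num_vms * per_vm_snapshot_s

for vms in (8, 50):
    life = first_snapshot_lifetime_s(vms, 5.0)  # assume ~5 s per VM snapshot
    print(f"{vms} VMs: first snapshot lives ~{life:.0f} s before consolidation")
```

With 8 VMs the earliest snapshot lives well under a minute; with 50 write-heavy VMs it lives several minutes, which is exactly the delta-accumulation problem from the earlier example.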

2. Consider the amount of write IO performed by all VMs on the datastore

Once the number of VMs per datastore is reasonable, the next thing to look at is how much write I/O those VMs generate. If the VMs are writing hundreds of MiB per second, even a small delay in creating the storage snapshot (while one or more VMs are still queued for VM snapshot creation) will accumulate a large number of writes in the VM snapshots.

Average write throughput over the past week can be monitored using HPE InfoSight, and when a datastore is found to contain write-heavy VMs, it is better to distribute those VMs across multiple datastores for application-synchronized snapshots.
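Combining both considerations, a simple screen is to estimate how much delta each datastore would accrue during a plausible queuing delay. The datastore names, write rates, 60-second delay, and 5 GB threshold below are all hypothetical, chosen only to illustrate the decision:

```python
# Hedged sketch: flag datastores whose aggregate write rate would accumulate
# a large combined delta while VM snapshots are queued. Sample rates, the
# 60 s delay, and the 5 GB threshold are assumptions, not HPE guidance.

def accumulated_delta_gb(aggregate_write_mb_s: float, delay_s: float) -> float:
    """Total delta accrued across a datastore during the queuing delay."""
    return aggregate_write_mb_s * delay_s / 1000  # MB -> GB (decimal units)

# Average write rates per datastore (MB/s), e.g. read off HPE InfoSight.
datastores = {"ds-prod-01": 300.0, "ds-test-01": 15.0}  # hypothetical values

for name, rate in datastores.items():
    delta = accumulated_delta_gb(rate, 60)  # assume up to 60 s of queuing
    verdict = "consider splitting across datastores" if delta > 5 else "ok"
    print(f"{name}: ~{delta:.1f} GB accrued -> {verdict}")
```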

[Image: Avg Write.jpg — average write throughput as reported by HPE InfoSight]

3. Consider the number of volumes in a volume collection

Nimble Storage best practice is to utilize a single volume to create a VMFS datastore.

The only reason a volume collection should have more than one volume in a VMFS datastore deployment is if there are VMs with VM disks on each of those volumes. If the VM disks are colocated on a single datastore, then each volume collection should contain only one volume.

By consolidating multiple volumes (and hence VMs from multiple datastores) into a single schedule, you could unknowingly overlook the first two considerations above: the number of VMs and the amount of write I/O generated by all of them.


About the Author


VMware Integration Architect at Nimble Storage