Array Performance and Data Protection

Snapshot Schedules - Best Practice?

jeberhardt42
Occasional Advisor

I'm looking to see if anyone has developed a best practice for prioritizing the servers in an environment and creating volume collections to capture snapshots for them. My original thought was to create High, Medium, and Low priority volume collections and set a snapshot schedule for each. I am still fairly new to the way that Nimble captures and stores its snapshots, and I have been reading other community posts that explain the differences between Dell, EMC, etc. and Nimble to get a better idea of how Nimble sets itself apart from the others.

95% of the clients we're putting SANs in for are not extremely diverse: they don't have Exchange and SQL databases sitting on different volumes than the VM itself, etc. I read an article stating that 50% of clients that have installed a Nimble are storing over a month's worth of snapshots. At a previous job I worked with other vendors, where you'd store maybe one week's worth before purging the oldest.

Has anyone developed this type of schedule (high, medium, low) and prioritized the volumes into a collection for snapshot storage? If so, how often are you capturing snapshots in each collection? How many are you retaining? Is there a better way that I should be thinking about the snapshot retention for the smaller to mid-sized businesses?

Thanks in advance

10 REPLIES
jliu79
Frequent Advisor

Re: Snapshot Schedules - Best Practice?

Each company is different, so what's best for one might not work for another. In our case, based on the different applications, we set up a combination of hourly, daily, weekly, monthly, and yearly snapshot schedules, and also set up retention plans based on the different requirements of the primary and DR sites. We have almost completely gotten rid of traditional backups (we only do a tape backup once a year for longer retention).

frankmig110
Occasional Visitor
Solution

Re: Snapshot Schedules - Best Practice?

Joe,

Each application/service typically has a unique retention requirement.  With your Nimble array you can set up widely different policies for single or multiple volumes.  While I was IT Director at Money Mailer I would use policies that made sense for my unique environment.

For Exchange 2010 (Win2K8R2 VM on ESX with five 1TB volumes mounted with the Nimble Windows Toolkit - iSCSI initiator in the VM) all the volumes were added to a volume collection with a synchronized (VSS enabled) snapshot schedule that ran every 6 hours and replicated to my DR site.  I would keep those for 7 days.  In that same policy I also had a job that ran on Sunday night, and those snapshots were kept for 52 weeks (used for legal discovery).

For SQL 2005 (similar config as Exchange) the DB and LOG volume were included in a SQL volume collection.  Those ran every 4 hours with synchronization (VSS enabled).  I kept 7 days of 4hr snaps locally but only kept the last 6 snaps at the DR site.

For VMFS volumes used for application servers I had snapshots run every night and kept 2 at the DR site.

For VMFS volumes used for VDI workstations I snapped every 12hrs and kept 10 locally and 2 at the DR site.

For volumes used for file/art data (Money Mailer is an advertising company) I would snap every hour and keep 7 days locally and 2 at the DR site.  I would also snap on Sunday night and keep 52 locally and 2 at the DR site.
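To get a feel for how many snapshots schedules like these leave on the array, the arithmetic can be sketched in a few lines. This is only a rough illustration: the `Schedule` class and the schedule names are made up for the example and are not Nimble objects or API calls, and the weekly job is modeled as a 168-hour interval kept for 52 weeks.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24


@dataclass(frozen=True)
class Schedule:
    """Hypothetical description of one snapshot schedule (not a Nimble API type)."""
    name: str
    interval_hours: int  # time between snapshots
    retain_days: int     # how long local snapshots are kept

    def retained_count(self) -> int:
        """Number of local snapshots held once the schedule reaches steady state."""
        return (self.retain_days * HOURS_PER_DAY) // self.interval_hours


# Intervals and retention taken from the post above; names are invented.
schedules = [
    Schedule("exchange-6h", interval_hours=6, retain_days=7),
    Schedule("sql-4h", interval_hours=4, retain_days=7),
    Schedule("file-art-1h", interval_hours=1, retain_days=7),
    Schedule("weekly-sunday", interval_hours=7 * 24, retain_days=52 * 7),
]

for s in schedules:
    print(f"{s.name}: {s.retained_count()} snapshots retained locally")
```

The count only says how many snapshots exist, not how much space they use; space depends on the change rate of each volume.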

hope this helps!

Frank

Valdereth
Trusted Contributor

Re: Snapshot Schedules - Best Practice?

I'd have to echo Frank and Jason's responses.  I think you'll find the Nimble protection templates to be an excellent complement to the data protection plan for a number of different scenarios.  Especially when you start looking at utilizing them for Dev/Test situations, you'll really start to appreciate the benefits of redirect-on-write snapshots.

Start out with a simple template, get familiar with the interface and options.  I'd be willing to bet after a week of testing out the capabilities you'll start to see why there are customers holding on to a large number of snapshots. 


Don't forget to check out InfoSight->Data Protection->Planning after you've got a week or so of snaps!  It really shows you how efficient the snapshots are and takes the guesswork out of the bandwidth required for a DR site.

jeberhardt42
Occasional Advisor

Re: Snapshot Schedules - Best Practice?

Thanks for the info Frank, that's a great place to start planning with some of our clients. I appreciate the feedback!

cfvonner
Occasional Advisor

Re: Snapshot Schedules - Best Practice?

I pretty much agree with what everyone else has said.  I've got a pretty intense snapshot schedule on my system (might be a little over the top), but it has worked pretty well.  Here are a few things to watch out for:

  • For VMware VMs, if you do quiescing snapshots, don't put too many VMs on the same volume or have too many snapshots firing at the same time (like all VMs exactly at the top of the hour).  VMware vCenter will bog down trying to do all the simultaneous VM snapshots with VSS quiescing.  Stagger the snapshot schedule times of the volumes or volume collections.
  • For SQL Server (especially if you use VMDKs for your data and log volumes rather than direct iSCSI connections), VSS snapshots may add unacceptable latency to database operations.  If you can survive taking blind snapshots most of the time and reserve the VSS snapshots for off-production hours, that might work better.  I take 15-minute and hourly snapshots "blind", then VSS snapshots daily (YMMV).
  • If your Windows VMs do volume shadow copy operations, stagger the times from the Nimble Snapshots.  Otherwise, VSS may be busy handling the shadow copy operation when Nimble tries to initiate a second VSS for the storage snapshot, and you could see "Failed to create vCenter snapshot" errors from Nimble.
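The staggering advice in the list above can be sketched as a small helper that spreads start times evenly across one schedule interval. This is a planning aid only, under the assumption that every collection uses the same interval; the function name and the volume collection names are hypothetical, and the resulting offsets would still have to be entered into each schedule by hand.

```python
def stagger_offsets(collections, interval_minutes):
    """Return a start offset (in minutes) for each collection, spread evenly
    across one schedule interval so they do not all fire at the same moment."""
    if not collections:
        return {}
    step = interval_minutes / len(collections)
    return {name: int(i * step) for i, name in enumerate(collections)}


# Hypothetical volume collections that would otherwise all snap at minute 00.
offsets = stagger_offsets(
    ["vmfs-app-01", "vmfs-app-02", "vmfs-vdi-01", "sql-vmdk-01"],
    interval_minutes=60,
)

for name, minute in offsets.items():
    print(f"{name}: start at minute {minute:02d} of each hour")
```

The same idea applies to keeping Windows shadow copy jobs away from the storage snapshot times: treat the in-guest job as one more entry in the list.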

-Carl V.

david_tan2
Valued Contributor

Re: Snapshot Schedules - Best Practice?

Hi Carl,

Interesting comment on the snapshots on SQL Server... I can't compare or try out SQL on VMs, but I can comment on snapshots on iSCSI volumes. What Nimble firmware are you running?

The latency you speak of will be the duration of the database I/O freeze that occurs. This can vary a lot depending on the Nimble OS version on the array and the software used to take the snapshot. These are roughly the timings I've found for VSS-consistent snaps on SQL (on 2 volumes):

Nimble OS 1.4.x native: 2-10+ sec

Nimble OS 1.4.x commvault: 6-20+ sec

Nimble OS 2.1.x native: 2 sec

Nimble OS 2.1.x commvault (snap engine): 4 sec

Nimble OS 2.1.x commvault (snap&repl engine): 2 sec

Keeping in mind the 10s hard limit for VSS, on 1.4.x we'd see occasional timeouts with native and regular timeouts with Commvault (Commvault snaps disks in series rather than in parallel). It appeared the variations were due to how busy the array was at the time. After upgrading to OS 2.1.x we observe lower and much more consistent times. Commvault supports Nimble replication on 2.1.x and actually relies on a Nimble volcol, which is why the snap&repl engine is faster than the snap-only engine, which still snaps in series.
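A rough way to see why snapping in series is more likely to hit the 10s limit: if each volume takes about the same time to snap, the freeze lasts roughly the sum of the per-volume times in series but only about the slowest one in parallel. The sketch below uses made-up per-volume times purely for illustration; they are not measurements from any array, and real behavior also depends on how busy the array is.

```python
VSS_FREEZE_LIMIT_S = 10.0  # the hard limit for the VSS freeze mentioned above


def freeze_time(per_volume_s, in_series):
    """Estimate the total I/O freeze: sum of the per-volume snapshot times
    when taken in series, the slowest single volume when taken in parallel."""
    if not per_volume_s:
        return 0.0
    return sum(per_volume_s) if in_series else max(per_volume_s)


# Hypothetical per-volume snapshot times in seconds (illustrative only).
per_volume = [3.0, 3.0, 3.0, 3.0]

for label, in_series in (("series", True), ("parallel", False)):
    total = freeze_time(per_volume, in_series)
    status = "within limit" if total <= VSS_FREEZE_LIMIT_S else "exceeds limit"
    print(f"{label}: {total:.1f}s ({status})")
```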

Cheers

cfvonner
Occasional Advisor

Re: Snapshot Schedules - Best Practice?

David,

I haven't tried doing the quiescing snapshots on SQL since I was on 1.x.  I'm on 2.1.7 now, so I'll give it another shot.  I think the issue was that I'm using VMDKs for my SQL volumes, not direct-iSCSI-mounted volumes.  So I had Nimble-->VMware-->VSS-->VMware snapshot-->Nimble snapshot-->VMware remove snapshot-->VSS remove snapshot as the process flow, which overall took too long.  At least I think that's how that works.

-Carl V.

david_tan2
Valued Contributor

Re: Snapshot Schedules - Best Practice?

Hi Carl

Would love to know how you go after trying it again. It seems like more users are doing what you are, with SQL on VMs. It would be good to hear from others to see if they have come across the same issue as you or if their setup is working fine.

cfvonner
Occasional Advisor

Re: Snapshot Schedules - Best Practice?

Turned on the quiescing for both the hourly and the 15-minute snapshots.  So far, no timeouts to speak of.  I'll keep monitoring it and let you know.

-Carl V.

cfvonner
Occasional Advisor

Re: Snapshot Schedules - Best Practice?

So I'm having issues again, but only with Windows Server 2012R2 VMs.  Apparently there are some issues with Microsoft's VSS when running on vSphere, as documented here: VMware KB: Creating a quiesced snapshot of a Windows virtual machine generates Event IDs 50, 57, 137, 140, or 12289 and here: VMware KB: Cannot take a quiesced snapshot of Windows 2008 R2 virtual machine.  They reference some technotes from Microsoft, but there is no real solution at this time.  I created a test VM with a GPT disk (per Microsoft's Auto-recovery fails on all but the first volume of an MBR disk) and tested doing vCenter Synchronization with it.  I got flooded with errors from the Nimble device saying that it couldn't create the snapshots.  So, for those VMs, I have disabled the synchronization on all but my daily snapshots.  There is a way to disable quiescing on a VM-by-VM basis: VMware KB: Cannot take a quiesced snapshot of Windows 2008 R2 virtual machine.  I'm probably going to try that next.

-Carl V.