Around the Storage Block

Run SRM recovery plans for vVol VMs

In part 1 of this blog series, we saw how to set up vSphere 7.0 at two sites, with two HPE Nimble Storage arrays, for disaster recovery with SRM. In part 2, we set up SRM 8.3 with vVols. In this concluding part 3, let’s move on to create, test, and run recovery plans for vVols on SRM. A recovery plan is like an automated run book. It controls every step of the recovery process, including the order in which SRM powers on and powers off VMs, the network addresses that recovered VMs use, and so on. You can run only one recovery plan at a time to recover a particular protection group.

 

Create

 

When you create or modify a recovery plan, test it before you try to use it for planned migration or for disaster recovery. Go to SRM --> select a site pair --> Recovery Plans. You should see the demo recovery plan (RP) created earlier in Figure 49: Create a new recovery plan. In the right-side panel, “Summary” should show “Plan Status” as Ready and “VM Status” as Ready for Recovery for 2 VMs. “Recovery Steps” shows every step that will be taken for recovery in this plan.

Figure 53: Recovery Plan steps

 

Test

 

By testing a recovery plan, you ensure that the VMs that the plan protects recover correctly to the recovery site. If you do not test recovery plans, an actual disaster recovery situation might not recover all virtual machines, resulting in data loss. Testing a recovery plan exercises nearly every aspect of a recovery plan, although SRM makes several concessions to avoid disrupting ongoing operations on the protected and recovery sites.

Testing a recovery plan runs all the steps in the plan, except for powering down VMs at the protected site and forcing devices at the recovery site to assume mastership of replicated data. Testing a recovery plan creates a snapshot on the recovery site of all the disk files of the virtual machines in the recovery plan.
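The difference between a test and a real recovery can be pictured as a filter over the plan's steps. The sketch below is only a toy model in Python, not SRM code: the step names and the `steps_for` helper are hypothetical, and placing the snapshot before power-on is an assumption made for illustration.

```python
from enum import Enum


class RunMode(Enum):
    TEST = "test"
    RECOVERY = "recovery"


# Hypothetical step names, for illustration only. The real list for a plan
# is whatever SRM shows under "Recovery Steps".
SHUT_DOWN_PROTECTED = "power down VMs at protected site"
TAKE_OVER_STORAGE = "recovery site assumes mastership of replicated data"
SNAPSHOT = "snapshot VM disk files at recovery site"
POWER_ON_RECOVERED = "power on VMs at recovery site"

RECOVERY_STEPS = [SHUT_DOWN_PROTECTED, TAKE_OVER_STORAGE, POWER_ON_RECOVERED]

# The two steps that, per the text above, a test leaves out.
SKIPPED_IN_TEST = {SHUT_DOWN_PROTECTED, TAKE_OVER_STORAGE}


def steps_for(mode: RunMode) -> list[str]:
    """Return the modelled steps that would run in the given mode."""
    if mode is RunMode.RECOVERY:
        return list(RECOVERY_STEPS)
    kept = [s for s in RECOVERY_STEPS if s not in SKIPPED_IN_TEST]
    # Assumption: the snapshot is taken before the test VMs are powered on.
    return [SNAPSHOT] + kept


if __name__ == "__main__":
    for mode in RunMode:
        print(mode.value, "->", steps_for(mode))
```

In this model a test never touches the protected site, which is why it can be run without disrupting production VMs.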

Click TEST as seen on the recovery plan shown above. The Plan status shows “Test in progress” and the Recovery Steps are updated with information on how the test is proceeding.

Figure 54: Test options

 

Figure 55: Tasks at recovery site while a test is being run

 

Figure 56: Test in progress

 

Figure 57: Test complete

 

When things proceed smoothly and each step results in a Success status, wait for plan status to change to “Test complete”.

 

Cleanup

 

After testing a recovery plan, you must successfully run a cleanup operation before running a failover or another test.

SRM performs several cleanup operations on the recovery site after a test:

  • Powers off the recovered VMs.
  • Replaces recovered VMs with placeholders, preserving their identity and configuration information.
  • Cleans up replicated storage snapshots that the recovered VMs used during the test.

 

In the screenshot above, you can see the CLEANUP button is enabled.  Go ahead and click that.

Figure 58: Start a cleanup after test

Figure 59: Cleanup complete

 

Plan status switches back to “Ready” after successful cleanup. Site B VMs are also replaced with placeholder VMs again.
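The test, cleanup, and run sequence behaves like a small state machine: a plan that has been tested must be cleaned up before anything else is allowed. The following is a minimal sketch of that rule, not SRM's implementation. The class `RecoveryPlanModel` and its methods are hypothetical, and the status strings are taken from the text and figure captions in this post.

```python
class PlanStateError(RuntimeError):
    """Raised when an operation is not allowed in the current status."""


class RecoveryPlanModel:
    """Toy model of the plan statuses described in this post."""

    def __init__(self) -> None:
        self.status = "Ready"

    def _require(self, expected: str, operation: str) -> None:
        if self.status != expected:
            raise PlanStateError(
                f"cannot {operation} while plan status is '{self.status}'"
            )

    def test(self) -> None:
        self._require("Ready", "test")
        self.status = "Test complete"

    def cleanup(self) -> None:
        self._require("Test complete", "clean up")
        self.status = "Ready"

    def run(self) -> None:
        self._require("Ready", "run")
        self.status = "Recovery complete"


if __name__ == "__main__":
    plan = RecoveryPlanModel()
    plan.test()
    try:
        plan.run()  # not allowed: cleanup has not been run yet
    except PlanStateError as err:
        print("blocked:", err)
    plan.cleanup()
    plan.run()
    print(plan.status)
```

The model leaves out the transient “Test in progress” status and error handling; it only captures the ordering constraint between test, cleanup, and run.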

 

Run – planned or unplanned migration

 

Planned Migration: You can run a recovery plan under planned circumstances to migrate VMs from the protected site to the recovery site. SRM synchronizes the VM data on the recovery site with the VMs on the protected site. It attempts to shut down the protected VMs gracefully and performs a final synchronization to prevent data loss, then powers on the VMs on the recovery site. If errors occur during a planned migration, the plan stops so that you can resolve the errors and rerun the plan. You can reprotect the VMs after the recovery.

Unplanned Migration: If the protected site suffers an unforeseen event that might result in data loss, you can also run a recovery plan under unplanned circumstances.

In the screenshot above, you can see the RUN button is enabled.  Go ahead and click that.

Figure 60: Start recovery

 

Figure 61: Recovery progressing smoothly

 

Figure 62: Recovery complete

 

Reprotect

 

After a recovery, the recovery site (site B) becomes the primary site, but the VMs are not protected yet. If the original protected site (site A) is operational, you can reverse the direction of protection to use the original protected site as a new recovery site. SRM provides the reprotect function, which is an automated way to reverse the protection.

After SRM performs a recovery, the VMs start up on the recovery site (site B). By running reprotect when the protected site comes back online, you reverse the direction of replication to protect the recovered virtual machines on the recovery site back to the original protected site (site A).

Reprotect uses the protection information that you established before a recovery to reverse the direction of protection. You can initiate the reprotect process only after recovery finishes without any errors. If the recovery finishes with errors, you must fix all errors and rerun the recovery, repeating this process until no errors occur.

Before you can run reprotect, you must satisfy the preconditions:

  1. Run a planned migration and make sure that all steps of the recovery plan finish successfully. If errors occur during the recovery, resolve the problems that caused the errors and rerun the recovery. When you rerun a recovery, operations that succeeded previously are skipped. For example, successfully recovered VMs are not recovered again and continue running without interruption.
  2. The original protected site must be available. The vCenter Server instances, ESXi Servers, SRM Server instances, and corresponding databases must all be recoverable.
  3. If you performed a disaster recovery operation, you must perform a planned migration when both sites are running again. If errors occur during the attempted planned migration, you must resolve the errors and rerun the planned migration until it succeeds.
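The rerun behavior described in the first precondition (operations that already succeeded are skipped) can be illustrated with a short sketch. This is a generic pattern written in Python under stated assumptions, not SRM code: `run_plan`, the `completed` set, and the step names are all hypothetical.

```python
from typing import Callable, Iterable


def run_plan(
    steps: Iterable[str],
    completed: set[str],
    execute: Callable[[str], None],
) -> None:
    """Run steps in order, skipping any already recorded as completed.

    The first failing step raises and stops the plan, so the caller can
    fix the problem and call run_plan again with the same `completed` set.
    """
    for step in steps:
        if step in completed:
            continue  # succeeded on an earlier run, leave it alone
        execute(step)
        completed.add(step)


if __name__ == "__main__":
    steps = ["recover vm1", "recover vm2"]  # hypothetical step names
    done: set[str] = set()
    broken = {"recover vm2"}

    def execute(step: str) -> None:
        if step in broken:
            raise RuntimeError(f"{step} failed")
        print("executed", step)

    try:
        run_plan(steps, done, execute)
    except RuntimeError as err:
        print("plan stopped:", err)

    broken.clear()                   # the operator resolves the error
    run_plan(steps, done, execute)   # only "recover vm2" executes now
```

Keeping the record of completed steps outside the function is what lets a rerun leave already recovered VMs running without interruption.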

Reprotect is not available under certain circumstances:

  1. The recovery plan cannot finish without errors. Reprotect becomes available only after every step of the recovery has finished successfully.
  2. You cannot restore the original site, for example if a physical catastrophe destroys the original site. To unpair and recreate the pairing of protected and recovery sites, both sites must be available. If you cannot restore the original protected site, you must reinstall SRM on the protected and recovery sites.
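The preconditions and exclusions above amount to a checklist. As a rough sketch only, the checklist could be written as a function that returns the reasons reprotect is blocked. The `RecoveryOutcome` fields and the `reprotect_blockers` helper are hypothetical names for illustration; SRM itself decides when the REPROTECT button is enabled.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOutcome:
    """Hypothetical summary of the last recovery, for illustration."""
    finished_without_errors: bool
    original_site_available: bool
    was_disaster_recovery: bool
    planned_migration_done_since: bool = False


def reprotect_blockers(outcome: RecoveryOutcome) -> list[str]:
    """Return the reasons reprotect cannot start; empty means it can."""
    blockers = []
    if not outcome.finished_without_errors:
        blockers.append("fix all errors and rerun the recovery")
    if not outcome.original_site_available:
        blockers.append("the original protected site must be available")
    if outcome.was_disaster_recovery and not outcome.planned_migration_done_since:
        blockers.append("run a planned migration now that both sites are up")
    return blockers


if __name__ == "__main__":
    after_dr = RecoveryOutcome(
        finished_without_errors=True,
        original_site_available=True,
        was_disaster_recovery=True,
    )
    print(reprotect_blockers(after_dr))
```

An empty list corresponds to the case walked through in this post: a clean planned migration with site A still operational.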

 

In the screenshot above, you can see the REPROTECT button is enabled.  Go ahead and click that. At the end of REPROTECT, the Summary tab for this recovery plan will show the Protected Site is “Site B” and the recovery site is “Site A”.

Figure 63: Start reprotect

 

Figure 64: Site B is now the protected site and site A is the recovery site

 

Failback

 

To restore the original configuration of the protected and recovery sites after a recovery, you can perform a sequence of optional procedures known as failback.

Planned migration --> Reprotect --> Planned migration --> Reprotect again

  1. VMs replicate from site A to site B.
  2. Perform a planned migration. The VMs on site A migrate to site B.
  3. Perform a reprotect. Site B, the former recovery site, becomes the protected site. SRM uses the protection information to establish the protection of site B. Site A becomes the recovery site.
  4. To recover the protected VMs on site B to site A, perform a second planned migration.
  5. Perform a second reprotect. Site A becomes the protected site and site B becomes the recovery site.
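To see why failback needs two migrations and two reprotects, it helps to track which site holds which role after each operation. The sketch below is a toy model, not SRM code: `SitePair`, `planned_migration`, and `reprotect` are hypothetical names, and the model assumes a planned migration moves the VMs while only reprotect swaps the site roles.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SitePair:
    protected: str
    recovery: str
    vms_running_at: str


def planned_migration(pair: SitePair) -> SitePair:
    """Move the VMs to the recovery site; the roles stay as they are."""
    if pair.vms_running_at != pair.protected:
        raise ValueError("VMs must be running at the protected site")
    return replace(pair, vms_running_at=pair.recovery)


def reprotect(pair: SitePair) -> SitePair:
    """Swap the roles so the site now running the VMs is protected."""
    if pair.vms_running_at != pair.recovery:
        raise ValueError("reprotect needs a completed recovery first")
    return SitePair(
        protected=pair.recovery,
        recovery=pair.protected,
        vms_running_at=pair.recovery,
    )


if __name__ == "__main__":
    state = SitePair(protected="site A", recovery="site B", vms_running_at="site A")
    for operation in (planned_migration, reprotect, planned_migration, reprotect):
        state = operation(state)
        print(f"{operation.__name__:>18}: {state}")
```

After the fourth operation the pair is back in its starting configuration, with site A protected, site B as the recovery site, and the VMs running at site A.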

Figure 65: Failback

 

History

 

If you wish to see the history for your recovery plans, go to “History”, where you can view and export the history of all operations performed on a recovery plan.

Figure 66: History

 

Conclusion

 

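In this three-part blog series, we have walked you through:

  • Setting up vSphere 7.0 at two sites with two HPE Nimble Storage arrays for disaster recovery with SRM.
  • Setting up SRM 8.3 with vVols.
  • Creating, testing, and running recovery plans for vVols on SRM.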

Over the last five years of vVols adoption, customers have often named DR as one of their top three requirements. With SRM 8.3 and its support for vVols, VMware has completed that story, with HPE Nimble Storage as its Day 0 support partner. Let us know how we can help you in your vVols journey, together.

HPE Nimble Storage
About the Author

mamatadesaiNim

VMware and Nimble Storage QA engineer