Run SRM recovery plans for vVol VMs

mamatadesaiNim · ‎06-25-2020

In part 1 of this blog series, we saw how to set up vSphere 7.0 at two sites, with two HPE Nimble Storage arrays, for disaster recovery with SRM. In part 2, we set up SRM 8.3 with vVols. In this concluding part 3, let’s create, test, and run recovery plans for vVols on SRM. A recovery plan is like an automated run book. It controls every step of the recovery process, including the order in which SRM powers on and powers off VMs, the network addresses that recovered VMs use, and so on. You can run only one recovery plan at a time to recover a particular protection group.

Create

When you create or modify a recovery plan, test it before you try to use it for planned migration or for disaster recovery. Go to SRM --> select a site pair --> Recovery Plans. You should see the demo recovery plan (RP) that was created earlier in this series (Figure 49: Create a new recovery plan). In the right side panel, “Summary” should show “Plan Status” as Ready and “VM Status” as Ready for Recovery for 2 VMs. “Recovery Steps” shows every step that will be taken for recovery in this plan.

Figure 53: Recovery Plan steps

Test

By testing a recovery plan, you ensure that the VMs that the plan protects recover correctly to the recovery site. If you do not test recovery plans, an actual disaster recovery situation might not recover all virtual machines, resulting in data loss. Testing a recovery plan exercises nearly every aspect of a recovery plan, although SRM makes several concessions to avoid disrupting ongoing operations on the protected and recovery sites.

Testing a recovery plan runs all the steps in the plan, except for powering down VMs at the protected site and forcing devices at the recovery site to assume mastership of replicated data. Testing a recovery plan creates a snapshot on the recovery site of all the disk files of the virtual machines in the recovery plan.
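As a rough illustration of how a test differs from a real run, the sketch below models a plan as an ordered list of steps and filters out the two actions that a test skips. The step names and the `Step`, `steps_for_test`, and `steps_for_recovery` helpers are made up for this example; they are not SRM's actual step list or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    skipped_in_test: bool = False  # True for actions a test must not perform


# Hypothetical, simplified plan; a real SRM plan contains more steps.
PLAN = [
    Step("power down VMs at protected site", skipped_in_test=True),
    Step("force recovery-site devices to take mastership of replicated data",
         skipped_in_test=True),
    Step("power on VMs at recovery site"),
]


def steps_for_test(plan):
    """Return the names of the steps that a test run would execute."""
    return [step.name for step in plan if not step.skipped_in_test]


def steps_for_recovery(plan):
    """Return the names of all steps, as a real recovery would run them."""
    return [step.name for step in plan]
```

In this toy model, a test leaves the protected site untouched and only exercises the recovery-site steps, which matches the behavior described above.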

Click TEST as seen on the recovery plan shown above. The Plan status shows “Test in progress” and the Recovery Steps are updated with information on how the test is proceeding.

Figure 54: Test options

Figure 55: Tasks at recovery site while a test is being run

Figure 56: Test in progress

Figure 57: Test complete

If things proceed smoothly, each step finishes with a Success status. Wait for the plan status to change to “Test complete”.

Cleanup

After testing a recovery plan, you must successfully run a cleanup operation before running a failover or another test.

SRM performs several cleanup operations on the recovery site after a test:

Powers off the recovered VMs.
Replaces recovered VMs with placeholders, preserving their identity and configuration information.
Cleans up replicated storage snapshots that the recovered VMs used during the test.

In the screenshot above, you can see the CLEANUP button is enabled. Go ahead and click that.

Figure 58: Start a cleanup after test

Figure 59: Cleanup complete

Plan status switches back to “Ready” after successful cleanup. Site B VMs are also replaced with placeholder VMs again.
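The rule that a cleanup must come between a test and any further test or failover can be pictured as a small state machine. This is only a sketch: “Ready” and “Test complete” are plan statuses mentioned in this post, while “Recovery complete”, the action names, and the `next_status` and `allowed_actions` helpers are assumptions made for illustration. In-progress states are left out.

```python
# (current status, action) -> resulting status.
# Simplified, assumed transition table; not taken from SRM itself.
TRANSITIONS = {
    ("Ready", "TEST"): "Test complete",
    ("Test complete", "CLEANUP"): "Ready",
    ("Ready", "RUN"): "Recovery complete",
}


def allowed_actions(status):
    """List the actions this model permits in the given plan status."""
    return sorted(action for (state, action) in TRANSITIONS if state == status)


def next_status(status, action):
    """Return the status after the action, or raise if it is not permitted."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(
            f"{action} is not allowed while the plan status is {status!r}"
        ) from None
```

In this model, `allowed_actions("Test complete")` contains only `CLEANUP`, so neither a second test nor a recovery can start until the plan is back to “Ready”.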

Run – planned or unplanned migration

Planned Migration: You can run a recovery plan under planned circumstances to migrate VMs from the protected site to the recovery site. SRM synchronizes the VM data on the recovery site with the VMs on the protected site. It attempts to shut down the protected VMs gracefully and performs a final synchronization to prevent data loss, then powers on the VMs on the recovery site. If errors occur during a planned migration, the plan stops so that you can resolve the errors and rerun the plan. You can reprotect the VMs after the recovery.

Unplanned Migration: If the protected site suffers an unforeseen event that might result in data loss, you can also run a recovery plan under unplanned circumstances.

In the screenshot above, you can see the RUN button is enabled. Go ahead and click that.

Figure 60: Start recovery

Figure 61: Recovery progressing smoothly

Figure 62: Recovery complete

Reprotect

After a recovery, the recovery site (site B) becomes the primary site, but the VMs are not protected yet. If the original protected site (site A) is operational, you can reverse the direction of protection to use the original protected site as a new recovery site. SRM provides the reprotect function, which is an automated way to reverse the protection.

After SRM performs a recovery, the VMs start up on the recovery site (site B). By running reprotect when the protected site comes back online, you reverse the direction of replication to protect the recovered virtual machines on the recovery site back to the original protected site (site A).

Reprotect uses the protection information that you established before a recovery to reverse the direction of protection. You can initiate the reprotect process only after recovery finishes without any errors. If the recovery finishes with errors, you must fix all errors and rerun the recovery, repeating this process until no errors occur.

Before you can run reprotect, you must satisfy the preconditions:

Run a planned migration and make sure that all steps of the recovery plan finish successfully. If errors occur during the recovery, resolve the problems that caused the errors and rerun the recovery. When you rerun a recovery, operations that succeeded previously are skipped. For example, successfully recovered VMs are not recovered again and continue running without interruption.
The original protected site must be available. The vCenter Server instances, ESXi Servers, SRM Server instances, and corresponding databases must all be recoverable.
If you performed a disaster recovery operation, you must perform a planned migration when both sites are running again. If errors occur during the attempted planned migration, you must resolve the errors and rerun the planned migration until it succeeds.
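The rerun behavior mentioned in the first precondition, where steps that already succeeded are skipped, can be sketched as below. The `run_plan` function, its parameters, and the idea of tracking completed steps in a set are illustrative assumptions, not a description of how SRM is implemented.

```python
def run_plan(steps, completed, execute):
    """Run steps in order, skipping completed ones; stop at the first failure.

    steps:     ordered list of step names
    completed: set of step names that succeeded in earlier runs (updated in place)
    execute:   callable taking a step name and returning True on success
    """
    for step in steps:
        if step in completed:
            continue  # succeeded previously, so it is not repeated
        if not execute(step):
            return False  # the plan stops so the error can be resolved
        completed.add(step)
    return True
```

Calling `run_plan` a second time with the same `completed` set resumes at the step that failed, which is why VMs that were already recovered keep running without interruption.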

Reprotect is not available under certain circumstances:

Recovery plans cannot finish without errors.
You cannot restore the original site, for example if a physical catastrophe destroys the original site. To unpair and recreate the pairing of protected and recovery sites, both sites must be available. If you cannot restore the original protected site, you must reinstall SRM on the protected and recovery sites.
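The preconditions above amount to a short checklist, and it can help to see them written down as one. The following is a minimal sketch under stated assumptions: the `LastRun` record and the `reprotect_blockers` function are hypothetical names, and the fields are supplied by hand rather than read from SRM.

```python
from dataclasses import dataclass


@dataclass
class LastRun:
    kind: str                      # "planned_migration" or "disaster_recovery"
    error_count: int               # errors reported by the most recent run
    original_site_available: bool  # is the original protected site back up?


def reprotect_blockers(run):
    """Return the reasons reprotect cannot start; an empty list means none."""
    blockers = []
    if run.kind != "planned_migration":
        blockers.append("perform a planned migration once both sites are running")
    if run.error_count > 0:
        blockers.append("resolve the errors and rerun the recovery")
    if not run.original_site_available:
        blockers.append("restore the original protected site")
    return blockers
```

An empty result corresponds to the case where the REPROTECT button is usable; any entry in the list names the work that has to happen first.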

In the screenshot above, you can see the REPROTECT button is enabled. Go ahead and click that. At the end of REPROTECT, the Summary tab for this recovery plan will show the Protected Site is “Site B” and the recovery site is “Site A”.

Figure 63: Start reprotect

Figure 64: Site B is now the protected site and site A is the recovery site

Failback

To restore the original configuration of the protected and recovery sites after a recovery, you can perform a sequence of optional procedures known as failback.

Planned migration --> Reprotect --> Planned migration --> Reprotect again

VMs replicate from site A to site B.
Perform a planned migration. The VMs now run on site B.
Perform a reprotect. Site B, the former recovery site, becomes the protected site. SRM uses the protection information to establish the protection of site B. Site A becomes the recovery site.
To recover the protected VMs on site B to site A, perform a planned migration.
Perform a second reprotect. Site A becomes the protected site and site B becomes the recovery site.

Figure 65: Failback
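To see why the failback sequence ends where it started, the site roles can be traced with a toy model. The `SitePair` class and the three functions below are hypothetical names used only for this illustration; they track which site is protected and where the VMs run, nothing more.

```python
from dataclasses import dataclass


@dataclass
class SitePair:
    protected: str   # site whose VMs are being protected
    recovery: str    # site that receives the replicas
    vms_run_at: str  # site where the VMs are currently powered on


def planned_migration(pair):
    """Move the running VMs from the protected site to the recovery site."""
    if pair.vms_run_at != pair.protected:
        raise ValueError("VMs are not running at the protected site")
    pair.vms_run_at = pair.recovery


def reprotect(pair):
    """Swap site roles so the site now running the VMs becomes protected."""
    if pair.vms_run_at != pair.recovery:
        raise ValueError("reprotect requires a completed recovery")
    pair.protected, pair.recovery = pair.recovery, pair.protected


def failback(pair):
    """Planned migration --> Reprotect --> Planned migration --> Reprotect again."""
    for operation in (planned_migration, reprotect, planned_migration, reprotect):
        operation(pair)
    return pair
```

Starting from `SitePair("A", "B", "A")`, the first two operations leave site B protected with the VMs running there, and the last two restore site A as the protected site with site B as the recovery site.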

History

If you wish to see the history for your recovery plans, go to “History” and you can see and export history for all operations performed on a recovery plan.

Figure 66: History

Conclusion

In this three-part blog series, we have walked you through:

vSphere 7.0 setup for HPE Nimble Storage
Background of VASA3 and configuration of SRM with vVols
Create, test, run recovery plans of SRM with vVols (this post)

In the last 5 years of vVols adoption by customers, we have often heard DR named as one of the top three requirements. With this release of SRM 8.3 with support for vVols, VMware has completed that story, along with HPE Nimble Storage as its Day 0 support partner. Let us know how we can help you in your vVols journey, together.
