Array Performance and Data Protection

Proof of concept DR test

SOLVED
ej7257
Occasional Contributor

We currently have two Nimble arrays (one at production, one at DR site) using protection groups to replicate volumes from the primary array to the DR array every two hours over a gigabit link.

We have three VMware ESXi 5.5 hosts at each location connected to the Nimble via iSCSI.

We would like to perform a small "proof of concept" DR test with five Windows Server 2008 VMs and their volumes to satisfy one of our department's regulatory compliance requirements.

The overall goal is to take down the VMs at the production site, bring them up at the DR site, have the staff verify everything works, then delete the VMs at the DR site and turn the VMs back on at the production site.

What is the best procedure to accomplish this without using third party software?

We are thinking:

  1. Shut down the five Windows servers
  2. Take a manual Nimble snapshot of the five volumes
  3. Take the five volumes offline on the production Nimble
  4. On the DR Nimble, find the snapshots we just took and set them to online
  5. "Rescan all" in VMware to discover newly online datastores on the DR Nimble
  6. Browse datastores, add .VMX files to VMware inventory, and power on
  7. Verify functionality, then power off servers
  8. Take volumes offline on DR Nimble
  9. Set volumes online on production Nimble
  10. Power on servers

Maybe it's not necessary to take the production volumes offline on the Nimble since the servers are shut down.

Would it be better to clone the snapshots and bring the clones online on the DR Nimble?

Am I missing anything? Are we going about this totally wrong? Sorry for my ignorance, this is sadly the first time we've tried a DR test since we implemented the Nimble solution.

8 REPLIES
jwang131
Occasional Visitor
Solution

Re: Proof of concept DR test

Assuming each site has its own vCenter and each volume is a datastore holding one of the Windows Server 2008 VMs, you could consider the following:

Production Site (Site P)

Secondary Site (Site S)

Test Failover - The goal is to test DR for these 5 servers without impacting current operation.

  1. (Site P) If you need the latest server changes, take a manual Nimble snapshot of the 5 volumes with the "replicate" option and wait for replication to complete.

  2. (Site S) For each volume, locate the latest replicated snapshot and create a clone.

  3. (Site S) Associate the appropriate ACL with the 5 newly cloned volumes.

  4. (Site S) Rescan ESXi, browse the datastores, add the VMs to inventory, power them on, and verify functionality.

  5. (Site S) Properly unmount the datastores and detach these 5 volumes from ESXi.

  6. (Site S) To clean up, take these 5 cloned volumes offline and delete them.
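
The Test Failover steps above can be sketched as a small in-memory model. This is only an illustration of the order of operations, not the Nimble API: `FakeArray`, `start_test`, `cleanup_test`, the `-drtest` suffix, and the initiator group name are all hypothetical. The point it shows is that the test works on clones, so the replicated volumes at Site S are never brought online or modified.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    name: str
    taken_at: int  # any increasing timestamp


@dataclass
class Volume:
    name: str
    online: bool = False
    acl: set = field(default_factory=set)
    snapshots: list = field(default_factory=list)


class FakeArray:
    """Hypothetical stand-in for the Site S array (not a vendor API)."""

    def __init__(self):
        self.volumes = {}

    def add_replica(self, name, snapshots):
        # Replicated volumes stay offline at the DR site.
        self.volumes[name] = Volume(name, online=False, snapshots=list(snapshots))

    def clone_latest(self, source, clone_name):
        # Step 2: clone from the most recent replicated snapshot.
        latest = max(self.volumes[source].snapshots, key=lambda s: s.taken_at)
        self.volumes[clone_name] = Volume(clone_name, online=True)
        return latest

    def grant(self, volume, initiator_group):
        # Step 3: attach an ACL so the DR ESXi hosts can see the clone.
        self.volumes[volume].acl.add(initiator_group)

    def offline_and_delete(self, volume):
        # Step 6: take the clone offline, then delete it.
        self.volumes[volume].online = False
        del self.volumes[volume]


def start_test(array, sources, initiator_group):
    """Clone each replica and return {clone name: snapshot it came from}."""
    clones = {}
    for src in sources:
        clone = f"{src}-drtest"
        snap = array.clone_latest(src, clone)
        array.grant(clone, initiator_group)
        clones[clone] = snap.name
    return clones


def cleanup_test(array, clones):
    for clone in clones:
        array.offline_and_delete(clone)
```

Steps 4 and 5 (rescan, register, power on, then unmount and detach) happen on the ESXi side between `start_test` and `cleanup_test` and are deliberately left out of the model.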

Planned Migration - The goal is to migrate these 5 servers to run on the Site S (without replicating back to Site P).

  1. (Site P) Properly shut down these 5 Windows Server 2008 VMs. Take a manual Nimble snapshot of the 5 volumes with the "replicate" option and wait for replication to complete.

  2. (Site P) Properly unmount the datastores and detach these 5 volumes from ESXi.

  3. (Site P) Take these 5 volumes offline. Optionally, set them read-only to avoid accidental writes if the environment is not completely under control.

  4. (Site S) Locate the volume collection(s) for these 5 volumes. Promote all applicable volume collections.

  5. (Site S) Associate the appropriate ACL with the 5 newly promoted volumes. (Steps 4 and 5 can be done in either order.)

  6. (Site S) Rescan ESXi, browse the datastores, add the VMs to inventory, and power them on.
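
In the Planned Migration steps above, everything at Site P has to be finished before the promote at Site S. A minimal sketch of that gate, assuming you record the Site P state by hand or from your own tooling (`SitePState` and `blockers_before_promote` are hypothetical names, not part of any Nimble or VMware tool):

```python
from dataclasses import dataclass


@dataclass
class SitePState:
    vms_powered_off: bool
    final_snapshot_replicated: bool
    datastores_unmounted: bool
    volumes_offline: bool


def blockers_before_promote(state):
    """Return the Site P steps that are still outstanding (empty list = go)."""
    checks = [
        (state.vms_powered_off, "step 1: VMs are still running at Site P"),
        (state.final_snapshot_replicated,
         "step 1: final snapshot has not finished replicating"),
        (state.datastores_unmounted,
         "step 2: datastores are still mounted on the Site P ESXi hosts"),
        (state.volumes_offline, "step 3: volumes are still online at Site P"),
    ]
    return [message for ok, message in checks if not ok]


# Example: everything done except taking the volumes offline.
state = SitePState(True, True, True, volumes_offline=False)
print(blockers_before_promote(state))
```

An empty list means steps 1 through 3 are complete and it is reasonable to move on to step 4.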

Failover - When Site P is completely inaccessible, the goal is to bring up Site S

  1. Same as Planned Migration, but perform only steps 4 through 6, since the Site P steps cannot be carried out.
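
To make the relationship between the two procedures explicit, here is a small sketch that writes the Planned Migration runbook as data and derives the Failover steps from it. The step wording is paraphrased from the list above; the `PLANNED_MIGRATION` and `failover_steps` names are just for illustration.

```python
# (site, action) pairs, in the order given for Planned Migration.
PLANNED_MIGRATION = [
    ("P", "Shut down the VMs, take a replicated snapshot, wait for replication"),
    ("P", "Unmount the datastores and detach the volumes from ESXi"),
    ("P", "Take the volumes offline"),
    ("S", "Promote the volume collections"),
    ("S", "Associate ACLs with the promoted volumes"),
    ("S", "Rescan ESXi, add the VMs to inventory, power on"),
]


def failover_steps(runbook):
    # Site P is unreachable, so only the Site S actions can be performed.
    return [action for site, action in runbook if site == "S"]


for number, action in enumerate(failover_steps(PLANNED_MIGRATION), start=4):
    print(f"step {number} (Site S): {action}")
```

Because step 1 is skipped in a real failover, Site S comes up from the last snapshot that finished replicating, so any changes made at Site P after that point are not present.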

Planned Migration with Nimble handover - The goal is to migrate these 5 servers to run on the Site S, and replicate back to Site P

  1. (Site P) Properly shut down these 5 Windows Server 2008 VMs. Take a manual Nimble snapshot of the 5 volumes with the "replicate" option and wait for replication to complete.

  2. (Site P) Properly unmount the datastores and detach these 5 volumes from ESXi.

  3. (Site P) Locate the volume collection(s) for these 5 volumes. Hand over all applicable volume collections.

  4. (Site S) Associate the appropriate ACL with the 5 newly promoted volumes.

  5. (Site S) Rescan ESXi, browse the datastores, add the VMs to inventory, and power them on.
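
As a rough mental model of the handover in step 3: ownership of the volume collection moves to the partner site and replication then runs in the opposite direction. The toy class below only illustrates that role swap; `VolumeCollection` and its fields are hypothetical and do not correspond to actual array objects or API calls.

```python
from dataclasses import dataclass


@dataclass
class VolumeCollection:
    name: str
    owner: str    # site currently serving I/O
    partner: str  # site currently receiving replicas

    def handover(self):
        # Ownership moves to the partner and replication reverses direction.
        self.owner, self.partner = self.partner, self.owner
        return self


vc = VolumeCollection("dr-test-vc", owner="P", partner="S")
vc.handover()
print(vc.owner, "now replicates to", vc.partner)
```

Under this model, a second handover started from Site S would return ownership to Site P, which is presumably how a failback would be done, though that should be confirmed against the array documentation before relying on it.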

The "Test Failover" procedure is probably what you are looking for. As you pointed out, cloned volumes can be used for a test failover, and this can be done without bringing down the production systems. This is also how VMware SRM performs a test failover with the Nimble SRA. Please let us know if you have any questions and how it goes.

ej7257
Occasional Contributor

Re: Proof of concept DR test

This is incredible, thank you so much!

ej7257
Occasional Contributor

Re: Proof of concept DR test

How exactly does one take a snapshot "with replicate option"?

I don't see that on our Nimble.

rugby0134
Esteemed Contributor

Re: Proof of concept DR test

Under your protection policy or volume collection, you set up a snapshot, then choose a replication partner (which must already be set up) and the number of snapshots to retain on the replication partner.

jwang131
Occasional Visitor

Re: Proof of concept DR test

As Kevin pointed out, the option is under the volume collection. When you click "Take Snapshot Collection", there will be a selection box for replication ("replicate").

cbrasga24
Occasional Advisor

Re: Proof of concept DR test

I would just recommend using the Nimble vSphere plugin to perform the snapshots and clones of the datastores. It prevents ID conflicts and also saves time by performing the rescans across all hosts in the cluster.

marktheblue45
Valued Contributor

Re: Proof of concept DR test

If you are proving to the client that failover to DR will work, the client will in some cases want to test after failing over to DR. To avoid losing data written to the test VMs while they run on the DR array, I'd go for the "Planned Migration with Nimble handover", since doing the test this way ensures any new data will be present after failback.

paul_shane
Occasional Visitor

Re: Proof of concept DR test

This is a very helpful thread. However, I would like to test both failover and failback. I would like to take a test volume containing a VMware file server and essentially follow the "Test Failover" instructions above, but I would also like to simulate how this server would be failed back (or replicated back to production). Keep in mind that this is just a test, and I will keep the production site running with its other production volumes during these tests. I simply want to run a test VMware file server through its paces so I can document the procedure and repeat the test annually. Any thoughts on how to do this?