HPE SimpliVity

Help with disaster recovery

 
SOLVED
Advisor


We are planning to upgrade our 2+1 Omnistack environment from Omnistack 3.7.7 to Omnistack 4.0.0.

This led my boss to ask me about a few disaster recovery scenarios and how we would recover from them.

We have 2 hosts in our Production cluster (HA compliant) and 1 host in our Backup cluster, which is at a remote location and serves only as a receiver of regular backups from the Production cluster and as a host for a few non-critical VMs.

Disaster scenario 1: A firmware upgrade goes wrong and 1 Production host goes down. That should still be fine because we have enough capacity to serve all VMs on one host.

Disaster scenario 2: The Omnistack upgrade goes wrong for some reason, and the entire Simplivity federation becomes unresponsive. I assume the ESXi hosts would still be responsive and keep serving the VMs, so there would be no immediate downtime? My best guess is that we would then have to reinstall Omnistack with the help of HPE.

Disaster scenario 3, however, leaves me with a few questions: if both hosts in the Production cluster go down (maybe there's a fire in the data center), I'm not sure how we would best recover our VMs. There will obviously be downtime, since both our Production hosts, with all the VMs and VCSA, just went up in smoke. First, we would need to buy new hardware.

But would we then need to redeploy Simplivity on the new hosts before we're able to recover any backups? Do we need to redeploy Simplivity, reconnect the new hosts to the existing Backup node federation, redeploy VCSA and the Simplivity plugin, and then we can start restoring backups from VCSA?

Or do we redeploy Simplivity from scratch, and if so, would we lose the backups stored on the Backup node?

How do we best assure the safety of our VM backups in the case of a catastrophic scenario like this, and how do we best approach the problem to get our production environment back?

Is there maybe a guide on a similar disaster scenario to help me understand how an operation like this would be performed?

We regularly back up our ESXi configurations, VCSA configuration and iLO configurations to non-Simplivity storage.

Any suggestions or hints are helpful.

HPE Pro
Solution

Re: Help with disaster recovery

Hi @guan8,

Thank you for using the forum.

Disaster scenario 1: A firmware upgrade goes wrong and 1 Production host goes down. That should still be fine because we have enough capacity to serve all VMs on one host.

This statement is correct. An FW/ESXi upgrade (which can be done via our Upgrade Manager) involves a safe shutdown of the OVC and ESXi. All VMs are vMotioned to the other node and remain powered on throughout.
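Whether "enough capacity to serve all VMs on one host" actually holds can be sanity-checked with a simple N+1 calculation: for each host, assume it is lost and confirm the remaining host can absorb every VM's CPU and memory demand. The Python below is only a hypothetical sketch, not an HPE tool. The `Host`/`VM` classes and all figures are made-up assumptions, and real sizing should also account for the OVC's own reservation and hypervisor overhead.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_ghz: float  # usable CPU capacity (hypothetical figure)
    mem_gb: float   # usable memory, assumed to already exclude the OVC reservation


@dataclass
class VM:
    name: str
    cpu_ghz: float
    mem_gb: float


def survives_single_host_failure(hosts: list[Host], vms: list[VM]) -> bool:
    """Return True if all VMs still fit after losing any one host."""
    need_cpu = sum(v.cpu_ghz for v in vms)
    need_mem = sum(v.mem_gb for v in vms)
    for failed in hosts:
        # Identity check, so two hosts with identical specs are not both dropped.
        remaining = [h for h in hosts if h is not failed]
        if sum(h.cpu_ghz for h in remaining) < need_cpu:
            return False
        if sum(h.mem_gb for h in remaining) < need_mem:
            return False
    return True


hosts = [Host("prod-1", 40.0, 256.0), Host("prod-2", 40.0, 256.0)]
vms = [VM("sql", 12.0, 96.0), VM("web", 6.0, 32.0), VM("vcsa", 4.0, 19.0)]
print(survives_single_host_failure(hosts, vms))  # True: 22 GHz / 147 GB fits on one host
```

In a 2-node cluster the "remaining" list is a single host, so the check reduces to "does everything fit on one box", which is exactly the assumption scenario 1 relies on.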

Disaster scenario 2: The Omnistack upgrade goes wrong for some reason, and the entire Simplivity federation becomes unresponsive. I assume the ESXi hosts would still be responsive and keep serving the VMs, so there would be no immediate downtime? My best guess is that we would then have to reinstall Omnistack with the help of HPE.

It is extremely rare for an Omnistack upgrade to make the system entirely unresponsive and require a reinstall. Our upgrade process works as follows: you upgrade the first node in the cluster, and once that succeeds and HA sync has been achieved, the second node can be upgraded. Only after all nodes have been upgraded successfully can the cluster be committed. These steps ensure that the software has been installed correctly; if it has not, you can roll back to the previous version without issues.
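The upgrade order described here behaves like a gated rolling upgrade with a final commit: one node at a time, wait for HA sync, and commit only once every node has succeeded. A minimal Python sketch of that control flow follows. `upgrade_node`, `ha_in_sync` and `rollback_node` are hypothetical callbacks, not real OmniStack or Upgrade Manager APIs. For brevity the sketch rolls back automatically, whereas in the process described above rollback is an option you choose before committing.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    COMMITTED = "committed"
    ROLLED_BACK = "rolled back"


def rolling_upgrade(
    nodes: list[str],
    upgrade_node: Callable[[str], bool],
    ha_in_sync: Callable[[], bool],
    rollback_node: Callable[[str], None],
) -> tuple[Outcome, list[str]]:
    """Upgrade nodes one at a time; commit only if every node succeeds.

    The three callbacks are hypothetical stand-ins for the real upgrade tooling.
    """
    touched: list[str] = []
    for node in nodes:
        touched.append(node)  # a failed upgrade may still have changed this node
        if not upgrade_node(node) or not ha_in_sync():
            # Stop before touching further nodes and revert, newest first.
            for done in reversed(touched):
                rollback_node(done)
            return Outcome.ROLLED_BACK, touched
    # Reached only when every node upgraded and HA resynced after each step.
    return Outcome.COMMITTED, touched


rolled_back: list[str] = []
result = rolling_upgrade(
    ["prod-1", "prod-2"],
    upgrade_node=lambda n: n != "prod-2",  # simulate a failure on the second node
    ha_in_sync=lambda: True,
    rollback_node=rolled_back.append,
)
print(result[0], rolled_back)  # Outcome.ROLLED_BACK ['prod-2', 'prod-1']
```

The key design point is that nothing is committed until the loop finishes, so a failure on any node leaves the cluster on a version it can still return to.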

Disaster scenario 3 however leaves me with a few questions: If both hosts in the Production cluster go down (maybe there's a fire in the data center).

This is where the third, off-site node comes into play. Provided you are sending backups from the production cluster to the off-site cluster, you will be able to restore the VMs immediately on that third node, subject to enough space being available on it.
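Because a restore on the single off-site node is limited by its free space, it can help to decide in advance which VMs come back first. One rough approach is a greedy, priority-ordered fit against the node's free capacity, sketched below in Python. This is illustrative only: the VM names, sizes, priorities and the 600 GB figure are made up, sizes are treated as logical sizes (ignoring deduplication and compression, so real consumption may differ), and CPU/memory limits on the node are not considered.

```python
def plan_restores(
    backups: list[tuple[str, float, int]], free_gb: float
) -> tuple[list[str], list[str]]:
    """Greedily choose which VM backups to restore on the off-site node.

    backups: (vm_name, size_gb, priority) tuples; a lower priority number
    means more critical. Sizes are assumed logical sizes, which ignores
    deduplication and compression, so real consumption may differ.
    """
    restore, defer = [], []
    remaining = free_gb
    # sorted() is stable, so VMs with equal priority keep their input order.
    for name, size_gb, _priority in sorted(backups, key=lambda b: b[2]):
        if size_gb <= remaining:
            restore.append(name)
            remaining -= size_gb
        else:
            defer.append(name)
    return restore, defer


backups = [("vcsa", 60, 1), ("sql", 400, 1), ("web", 80, 2), ("archive", 900, 3)]
print(plan_restores(backups, free_gb=600))
# (['vcsa', 'sql', 'web'], ['archive'])
```

Note that a large critical VM that does not fit is deferred while smaller, less critical ones may still be restored; whether that trade-off is acceptable is a business decision the sketch does not make for you.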

The two nodes that have been destroyed will need new hardware and a fresh reinstall, and will essentially be brand-new, blank nodes. Data can then be migrated back across the wire to these nodes from your off-site node.

If you need to go more in depth than that, we would be happy to assist further. Let me know.

Regards,

David

I am an HPE SimpliVity Employee


I work for HPE
Advisor

Re: Help with disaster recovery

Thank you for your prompt reply @dhooley 

 

Disaster scenario 1: A firmware upgrade goes wrong and 1 Production host goes down. That should still be fine because we have enough capacity to serve all VMs on one host.

This statement is correct. An FW/ESXi upgrade (which can be done via our Upgrade Manager) involves a safe shutdown of the OVC and ESXi. All VMs are vMotioned to the other node and remain powered on throughout.

Is it also OK to upgrade firmware via iLO? I believe that is how we did it last time. Also, if a firmware upgrade goes bad, how do we roll it back or restore a previous firmware version? Is that possible?

/Gustav

HPE Pro

Re: Help with disaster recovery

Hi @guan8,

Yes, of course you can still upgrade the FW the conventional way, via iLO. Upgrade Manager simply provides an alternative method that lets you do everything from a single application.

If you want to carry this out via iLO, the steps depend on the HW and iLO version, but they are all very similar and easy enough to find with a Google search.

An FW rollback can also be carried out by simply applying the previous FW patch, using the same method you would use to upgrade.

Hope this answers your queries.

Regards,

David

I am an HPE SimpliVity Employee

 


I work for HPE
Advisor

Re: Help with disaster recovery

Awesome, thanks for your help!

/Gustav