Disaster Recovery plan - (detailed)

Super Advisor

I have a two-node Serviceguard cluster for which I need to document a disaster recovery plan. Does anyone have a plan (company-specific info excluded) that they can share?

I have been reviewing the Ignite recovery section, and it is fairly high level. My current Tru64 DR plan is extremely detailed and is a step-by-step plan. What I mainly see in the Ignite recovery doc is: boot from the recovery tape, wait for the recovery to complete, and when the system comes back up, recover the latest copies of files. That is what I want at a high level, but my plan must be documented down to the point of actually pulling the tape out of the DR box and putting it in the tape drive.


I will be going to a DR site with just tapes, and I have never done DR on HP-UX. Must the hardware be identical to what I have here? I believe it must, in order for make_tape_recovery to work. Does an OS have to be loaded by the DR site before I get there?


I need to document it to the nth degree. Any similar documentation that could point me in the right direction would be appreciated.
5 REPLIES
Respected Contributor

Re: Disaster Recovery plan - (detailed)

Mike, I have attached the template for the DR plan that we have used successfully in the past and still use at present.
Honored Contributor

Re: Disaster Recovery plan - (detailed)

Mike,

There were a number of D/R presentations at HPWorld 2003 and 2004. Most of those presentations were preserved on the encompassus.org website. The D/R presentations from the HPTechForum 2005, Orlando, require Encompass membership to view, unless you attended the event.

Chuck Ciesinski
"Show me the $$$$$"
Trusted Contributor

Re: Disaster Recovery plan - (detailed)

Hi Mike
I hope you find some useful comments.

In our company we also have MC/ServiceGuard clusters, with very good redundancy and failover capabilities (as well as UPS/diesel capacity).

Apart from this we also have a contract with HP for DR.

To ensure that they have adequate SW/HW to restore our systems, we use automated system reports that gather system information and mail it to the DR centre once a month. We also send a report whenever we make major changes.
The DR hardware doesn't need to be entirely identical, but it must be sufficient and appropriate. Remember that your environment will keep changing.
We also do a DR rehearsal once a year, where tapes are brought to the DR centre, systems are restored, network connections are made, and a group of test personnel log on to the systems and perform checks.

We have had this arrangement for approx. 10 years, and we once found that the DR centre didn't have the required HW
(one of the issues was incorrect tape drives, as we had upgraded from DLT to LTO).
Our contract requires us to deliver system change information, and as stated, that is automated...
This resulted in a payback, as it was a contract breach.
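
As an illustration of the automated monthly report described above, here is a minimal Python sketch. It is not our actual tooling: the command list (`uname -a`, `model`, `ioscan -fnC tape`, `bdf`), the function names and the report layout are all assumptions chosen for the example, and mailing the result to the DR centre is left out.

```python
import subprocess
from datetime import date

# Assumed inventory commands; adjust to whatever your DR provider needs.
INVENTORY_COMMANDS = {
    "OS release": ["uname", "-a"],
    "Hardware model": ["model"],
    "Tape drives": ["ioscan", "-fnC", "tape"],
    "Filesystems": ["bdf"],
}


def run_command(argv):
    """Run one command and return its output, or a note if it cannot be run."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=120)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"(could not run {' '.join(argv)}: {exc})"
    return result.stdout.strip() or result.stderr.strip()


def build_report(hostname, commands=INVENTORY_COMMANDS, runner=run_command, today=None):
    """Assemble a plain-text inventory report, one section per command."""
    today = today or date.today()
    lines = [
        f"DR inventory report for {hostname}",
        f"Generated: {today.isoformat()}",
        "",
    ]
    for title, argv in commands.items():
        lines.append(f"== {title} ({' '.join(argv)}) ==")
        lines.append(runner(argv))
        lines.append("")
    return "\n".join(lines)
```

Something like this could run from cron once a month and again after major changes. The `runner` parameter only exists so the report can be exercised without the HP-UX commands being present.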

Our backup tape libraries are located in one building, while the machines are in another.
They are connected via fibre.
We bring backup tapes in locked suitcases to an external security company on a daily basis.
We can trigger sending of tapes via courier to the DR centre (it's in another European country).

--- Since these are fairly big systems and we have to use a courier, our recovery time is (less than) 72 hours.

For the organisation to survive this timeframe, we have Emergency Operating Procedures (EOP).
These are based on reports that generate files with crucial business information.
These files are transferred from our datacentre to each local site and stored on local fileservers.
In case of a disaster, these reports can be printed or used by programs to retrieve the information needed for manual business handling.

We have had failovers and other incidents, but never an incident big enough to trigger a DR or the use of EOPs.
Nevertheless, we keep up both the DR contracts and the rehearsals.
We also do internal checks, which we call "armageddon tests" :-), to identify weaknesses.
A random site is selected and its computer room is "isolated" (it could also be the datacentre).
---
Related to documentation:
---
Our documentation starts with the organisation's policy and guidelines.
It is the organisation that specifies the business needs.
We have some overall requirements, and then drill down within each business area/dept. to what is needed.
When a disaster occurs, it is important that the entire organisation knows what to do and when to do it...

For IT, you "simply" have to think through the details of how to recover the systems from Ignite and other backups, as well as re-routing the network traffic...
After you have documented it, test it.
After you have tested it, have someone else test it (you might have overlooked something that another person would struggle with)...

P.S. Comment on DR procedure from David.
I do not like having HW/SW described in detail in the DR procedure. Such documents tend not to be revised as frequently as needed...

Regards
Tor-Arne
I'm trying to become President of the state I'm in...
Super Advisor

Re: Disaster Recovery plan - (detailed)

Thanks all for the replies. I personally appreciated the detailed plan. It is my job and part of my process to maintain documentation as well as hardware updates.

The document was very helpful but I still need information on a few things.

- What happens when the system boots and my filesystems are not there? (Just a few errors?)

- I don't trust SAM to configure my filesystems. When I first set up my system, SAM apparently had problems doing some steps on SAN drives. I would get to a certain point in SAM and then have to get past it manually. Given that, I would rather do it all manually and know for sure that it works.

- Do I give the filesystem sizes and device file information to the DR support personnel so they can set it up for me in advance? (My Tru64 DR box has internal storage, so it is pretty straightforward to set up. My HP-UX box uses SAN storage for all the application data.) Does anyone know of any gotchas I should be aware of?

Exalted Contributor

Re: Disaster Recovery plan - (detailed)

Shalom,

Before the plan, know that you can specify the equipment in a DR center if what you have is modern and still available on the market. HP has performance centers, which we used (at a prior job) for a DR test. Our rationale for going to the center, whose purpose is to assist in selling hardware, was that we would have purchased new servers for our own DR center. Try before you buy.

Here are the details of a good DR plan.

1) Make regular Ignite make_tape_recovery and/or make_net_recovery backups (I like both). Rotate these backups off site regularly. Same plan with data backups.

2) Have a special Ignite server in your shop with updated install images on it for all machines. The images themselves, which can be quite large, can go on shared storage. Make a full backup of this system and make sure its twin in the DR center is configured and updated. It will enable you to quickly decide which servers are going to be rebuilt, because all you need to do with Ignite is boot off the Ignite server.

3) Have detailed step-by-step reinstallation plans for your major software, on paper or easily readable media, to assist whoever does the DR in rebuilding systems (don't assume it will be you; you could be in the disaster).

4) Make a policy on documentation of administration changes, and make sure the change logs are recovered quickly in any DR plan.

5) To do this job, you need Internet access and some kind of email. Account for it in your plan.
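
Point 1 only works if somebody notices when a recovery tape has not been refreshed. A minimal Python sketch of such a check follows. The host names, dates and the 30-day threshold are hypothetical, and how you record the date of each make_tape_recovery run is up to you.

```python
from datetime import date


def stale_backups(last_backup, today, max_age_days=30):
    """Return (host, age_in_days) for backups older than max_age_days, oldest first."""
    stale = []
    for host, taken in last_backup.items():
        age = (today - taken).days
        if age > max_age_days:
            stale.append((host, age))
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical log of the most recent recovery tape per host.
    log = {"node1": date(2005, 3, 1), "node2": date(2005, 1, 10)}
    for host, age in stale_backups(log, today=date(2005, 3, 15)):
        print(f"{host}: recovery tape is {age} days old")
```

The same idea applies to the data backups and to the images on the Ignite server.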

Practice.

Stuff to know.
make_tape_recovery will work on similar hardware. It does not have to be the same. My DR test was done on rp5470 servers with radically different Fibre Channel connections, and the images came off rp5450 servers. There was an extra reboot to adjust for hardware changes, but it was automated.

What must be the same is the tape drives. No compromise on that. Yes, an LTO3 drive will recover an LTO2 tape, and I've tested that scenario.
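
The tape drive rule can be written down as a quick check for the plan. This sketch assumes the general LTO compatibility rule for generations 1 through 7 (a drive reads tapes up to two generations back and writes one generation back); later generations differ, so confirm against the drive vendor's documentation. The function names are made up for the example.

```python
def _check_generations(drive_gen, tape_gen):
    """Reject generations outside the range this simple rule covers."""
    if not (1 <= drive_gen <= 7 and 1 <= tape_gen <= 7):
        raise ValueError("rule only covers LTO generations 1 through 7")


def lto_can_read(drive_gen, tape_gen):
    """True if an LTO drive can read the tape: same generation or up to two back."""
    _check_generations(drive_gen, tape_gen)
    return 0 <= drive_gen - tape_gen <= 2


def lto_can_write(drive_gen, tape_gen):
    """True if an LTO drive can write the tape: same generation or one back."""
    _check_generations(drive_gen, tape_gen)
    return 0 <= drive_gen - tape_gen <= 1
```

For the scenario above, `lto_can_read(3, 2)` is true, while an LTO2 drive at the DR site could not read LTO3 tapes.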

Last advice: If you have time, write that detailed plan.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com