Disaster Recovery/Business Resumption Planning Guidelines
Table of Contents
Introduction
Statutory Authority
Scope
Exemptions
Guidelines
The Recovery Planning Process
Project Planning
Critical Business Requirements
Recovery Strategies
Emergency Response/Problem Escalation
Plan Activation
Recovery Operations
Training
Testing
Plan Maintenance
Related Policy and Standards
Maintenance
Definitions
Introduction
The purpose of disaster recovery/business resumption planning is to assure
continuity of computing and telecommunications operations needed to support
critical agency functions. The business resumption plan should aim at achieving
a systematic and orderly resumption of all agency computing and
telecommunications services. The plan should provide for restoring service as
soon as possible. Those functions that are most critical to achieving the
agency mission must remain in operation during the recovery period.
Statutory Authority
The provisions of RCW 43.105.041 detail the powers and duties of the ISB,
including the authority to develop statewide or interagency information
services and technical policies, standards and procedures.
Scope
These guidelines apply to all executive and judicial branch agencies and
educational institutions, as provided by law, that operate, manage, or use IT
services or equipment to support critical state business functions.
Exemptions
None.
Guidelines
Emergency response/problem escalation procedures prescribe how to respond
to two kinds of situation: disaster events and problems. Disaster
recovery/business resumption plans should specify procedures for both
situations. Emergency procedures direct the response to disaster events.
Escalation procedures direct the response to problems. Both sets of procedures
may result in the declaration of a disaster and activation of the recovery
plan.
The Recovery Planning Process
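There are nine major phases in the recovery planning process: project
planning, critical business requirements, recovery strategies, emergency
response/problem escalation, plan activation, recovery operations, training,
testing, and plan maintenance. The disaster recovery/business resumption
planning process provides the preparation necessary to design and document the
procedures needed to assure continued agency operations following a disaster.
Each agency's process should include the following elements: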
Project Planning
Get preliminary management commitment.
Get agreement from senior management on the need for disaster recovery/business
resumption planning.
Designate a disaster recovery/business resumption manager.
Designate a person to manage the agency's recovery from a disaster. The
designated individual must have sufficient knowledge of information management
and information technology (IT) within the agency in order to work effectively
with IT hardware and software, the data centers, and service providers in
reestablishing information processing and telecommunications services after a
disaster has occurred.
Organize a disaster recovery/business resumption planning team.
Organize a team that will be responsible for the detailed technical
analysis and planning functions needed for a recovery plan.
Identify individuals from management, data processing, telecommunications,
business operating units, and consultants to participate in preparing the
disaster recovery/business resumption plan.
Audit current recovery preparedness.
Determine what security/disaster recovery/business resumption plans are in
place. Identify what planning remains to be done.
Develop the project schedule.
Estimate task durations, identify responsibilities, assign resources, and
document the schedule for plan development.
Specify documentation procedures.
Define recovery program overview.
Identify the scope and aim of the disaster recovery/business resumption plan.
Critical Business Requirements
An agency may carry out hundreds of operations that management and
staff consider important. Key resources may be unavailable during a disaster.
The agency must concentrate its resources on the operations that are most
important for public health, safety, and welfare. The aim of a disaster
recovery/business resumption plan is to reduce potential losses, not to
duplicate a business-as-usual environment.
The following categorization is suggested as a means for classifying computer
application systems used by an agency:
Category/Classification
Natural hazards:
Accidents:
Environmental failure:
Intentional acts:
Recovery Strategies
Off-site storage of back-up material.
15. Select off-site storage locations.
16. Determine off-site storage inventory. Identify specific files, programs,
documentation, vendor contracts, supplies, etc., copies of which should be
stored and maintained off-site. Agencies shall include at least one current
copy of their disaster recovery/business resumption plan in the off-site
storage inventory.
17. Specify off-site inventory procedures. Determine
procedures, schedules, and responsibility for maintaining the contents of the
off-site storage facility.
18. Alternate processing capability.
19. Select recovery facilities.
20. Document overall recovery strategy.
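The off-site inventory requirements above lend themselves to a simple automated check. The following is a minimal sketch in Python, not a prescribed tool. The `InventoryItem` record, the `kind` labels, and the 90-day currency window are illustrative assumptions that do not come from these guidelines. It verifies that the inventory holds at least one current copy of the recovery plan and flags other items that have not been refreshed within the chosen window.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical label for the recovery plan itself. The guidelines name files,
# programs, documentation, vendor contracts, and supplies as other examples.
PLAN_KIND = "recovery_plan"


@dataclass
class InventoryItem:
    name: str
    kind: str            # e.g. "file", "program", "documentation", PLAN_KIND
    last_updated: date   # date the off-site copy was last refreshed


def check_inventory(items, today, max_age_days=90):
    """Return a list of problems found in the off-site inventory.

    max_age_days is an assumed currency window, not a value taken
    from the guidelines.
    """
    problems = []
    cutoff = today - timedelta(days=max_age_days)

    # The guidelines require at least one current copy of the plan off-site.
    plans = [item for item in items if item.kind == PLAN_KIND]
    if not plans:
        problems.append("no copy of the recovery plan is stored off-site")
    elif all(plan.last_updated < cutoff for plan in plans):
        problems.append("no current copy of the recovery plan is stored off-site")

    # Flag every other item that is older than the assumed window.
    for item in items:
        if item.kind != PLAN_KIND and item.last_updated < cutoff:
            problems.append(f"stale item: {item.name}")
    return problems
```

An agency would substitute its own definition of "current" for the assumed window and run a check of this kind as part of the scheduled maintenance of the off-site facility.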
Emergency Response/Problem Escalation
Identify potential threats and develop emergency procedures.
Document the action steps to be taken immediately in responding to damaging
events or threats of damage or disruption. Inform all agency staff of
documented action steps.
21. The purpose of emergency procedures is to:
22. Document the emergency response actions the agency
must take immediately to:
23. Specify problem escalation guidelines.
Plan Activation
Develop first alert procedures.
24. Prepare general guidelines for initial notification
of a potential disaster situation.
25. Develop disaster confirmation procedures.
26. Develop notification procedures.
27. Develop procedures for declaring a disaster, for
setting up a command center, and for informing the recovery teams, customers,
the public, and suppliers.
28. Determine plan activation flow.
29. Outline or chart the steps to follow when a disaster
situation has occurred or potentially may occur.
30. Define recovery team organization.
31. Determine the teams that make up the recovery
organization.
32. Develop team action plans. There may be several
recovery teams, each specializing in some area of technical expertise. Disaster
Recovery/Business Resumption Team procedures for each team should use a format
like the following:
Team Charter or Function: The particular duties and responsibilities of
this team in the event of a disaster.
Team Membership and Organization: The structure of the team, job titles
of team members, reporting responsibilities.
Team Interfaces: Include detailed explanations of all the actions to be
taken by this team prior to a disaster situation so it can function
effectively, with the necessary data, personnel and other resources, if a
disaster occurs. This section should cover relationships with vendors,
customers, ongoing tasks to ensure readiness of the plan, training
requirements, identification of critical resources, data, and personnel.
Action Procedures: This section provides an outline of the tasks to be
carried out. It is written with the assumption that team members know how to do
their jobs and just need a guide to ensure nothing is omitted during the
confusion that normally accompanies a disaster.
Procedures should be flexible enough to be used in varying types and degrees
of contingency situations.
Procedures should be detailed enough to be relied upon when no other
documentation or knowledge is available.
Plan Appendices: The appendices should contain the material and data
that will be used in the event of an actual disaster. Include separate
appendices on notification of personnel, resource requirements, forms and
documentation, and any other subjects the team requires. What to include
depends on whether the team would be able to access the information during a
disaster. If the data might not otherwise be available, include it in the
appendix to the disaster recovery/business resumption plan.
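The team procedure format described above can be captured as a small template so that incomplete team plans are easy to spot. This is a minimal sketch in Python under stated assumptions: the `TeamActionPlan` class, its field names, and the `missing_sections` helper are hypothetical names chosen to mirror the five headings above, not part of these guidelines.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TeamActionPlan:
    # One field per section of the suggested format (names are illustrative).
    charter: str = ""                                        # Team Charter or Function
    membership: str = ""                                     # Team Membership and Organization
    interfaces: str = ""                                     # Team Interfaces
    action_procedures: list = field(default_factory=list)    # Action Procedures
    appendices: dict = field(default_factory=dict)           # Plan Appendices


def missing_sections(plan):
    """Return the names of sections that are still empty, in template order."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]
```

For example, a plan created with only a charter and one action procedure would report `membership`, `interfaces`, and `appendices` as missing, which gives the planning team a simple completeness checklist for each recovery team.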
Training
Design a disaster recovery/business resumption training program.
33. Specify the aim, training activities, schedule, and
an administrator for disaster recovery/business resumption training.
34. Develop specific training activities.
35. Develop an instructional plan for each training
activity.
36. Develop training evaluation tools.
37. Develop techniques aimed at answering the following
questions:
Testing
Testing is the only method to ensure that:
38. Recovery procedures are complete and workable.
39. Materials and computer files are available and can be
used for alternate processing of critical operations and applications.
40. Backup copies of software, documentation, and
work-in-process records are adequate and current.
41. Training of personnel was effective.
Design a recovery plan testing program.
42. Detail:
Specify tests and assign responsibility for overseeing testing. Agencies using
external services shall plan, schedule, and conduct their disaster
recovery/business resumption plan testing in cooperation with service
providers. The cost of establishing the necessary communication link and
running a test at a remote back-up facility is high. A full test involving all
agency applications may well be impractical due to budget considerations.
Agencies should plan to share test time at the service provider's back-up
facility ("hot site").
43. Objectives:
Clearly state the purposes for conducting tests of the recovery plan. These
will include aims such as the following:
44. Policy/Guidelines: Set up the policies and guidelines that will apply to testing of the
recovery plan. These will cover such items as the following:
45. The testing or validation methodology adopted by an
agency will depend on:
46. The test report should include:
47. Distribution list for test reports must include:
48. User notification.
49. Specification of tests. Formulate a test schedule. For each test, specify
the level of the test, the scope or areas to test, and the frequency or target
date of the test.
50. Levels of testing:
51. Develop plans for specific tests.
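The test specification step above (level, scope, and frequency or target date for each test) maps naturally onto a small record with a validation check. The following Python sketch is illustrative only: the `PlannedTest` class, its field names, and the example values in the usage note are assumptions, and the level labels an agency uses should come from its own testing policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PlannedTest:
    name: str
    level: str                              # agency-defined level label
    scope: list                             # areas or applications to test
    frequency_months: Optional[int] = None  # how often the test recurs
    target_date: Optional[date] = None      # or a one-time target date

    def validate(self):
        """Return a list of missing items; an empty list means the entry is complete."""
        errors = []
        if not self.level:
            errors.append("level is required")
        if not self.scope:
            errors.append("scope is required")
        # The guidelines ask for a frequency or a target date; either satisfies the check.
        if self.frequency_months is None and self.target_date is None:
            errors.append("frequency or target date is required")
        return errors
```

A schedule is then just a list of such entries, for instance `PlannedTest("hot site restore", "full", ["payroll"], frequency_months=12)`, and any entry whose `validate()` result is non-empty still needs attention before the schedule is published.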
Plan Maintenance
Assign plan maintenance responsibility.
Establish maintenance procedures and schedules. Provide a schedule for regular,
systematic review of the content of the disaster recovery/business resumption
plan. Define a procedure for making appropriate changes to the plan.
Develop distribution procedures and lists.
52. Provide policies and procedures for distributing the
recovery plan parts and updates.
53. The disaster recovery/business resumption plan may
contain sensitive information about the agency's business, communications, and
computing operations. Policy and procedures for distribution of the plan should
take this into account.
NOTE: DP/90 PLUS, a product of SunGard Recovery Services, is an MS-DOS
software application that provides substantial help in the development and
maintenance of a disaster recovery/business resumption plan. DIS has a
corporate contract for this product. Because of this special contract, DP/90
PLUS is available to any state agency at a discount. Please contact DIS Leasing
& Brokering for order information.
Maintenance
Technological advances and changes in the business requirements of agencies
will necessitate periodic revisions to policies, standards, and guidelines. The
Department of Information Services is responsible for routine maintenance of
these to keep them current. Major policy changes will require the approval of
the ISB.
Definitions
Catastrophic Disaster: A catastrophic disaster is one in which the outage
will probably last more than seven days.
Damage - Damage due to a catastrophic disaster is severe and could
involve total destruction of the agency facility. Replacement of equipment or
significant renovation of the facility may be necessary.
Command Center: The command center is a local area, on or off the
premises, from which to manage the emergency situation. It is a focal point for
coordinating the recovery program, issuing information, and assembling
personnel.
Critical Function: Critical functions are those functions an agency must
perform to survive. Failure to perform them would result in serious or
irreparable harm to the agency. Impact may take the form of increased operating
costs, loss of revenue collection, or inability to provide services to clients.
Disaster: Any unplanned circumstance or event that results in an
inability to support critical business functions within the current
environment.
Disaster Recovery/Business Resumption Plan: A disaster recovery/business
resumption plan is a comprehensive statement of actions to be taken in response
to a disaster. It includes documented, tested procedures that, if followed,
will assure the availability of the critical resources and facilities required
to maintain continuity of operations. Synonyms: Contingency Plan, Disaster
Recovery Plan, Business Continuity Plan.
Major Disaster: A major disaster is one in which the outage will
probably last from two to seven days.
Damage - Damage due to a major disaster is more severe than that due to
a minor disaster. For example, in a major disaster, key business units could be
without telecommunications capability for an extended period, or the computer
room could suffer heavy damage.
Minor Disaster: A minor disaster is one in which the outage will
probably last longer than one shift, but less than two days.
Damage - Damage due to a minor disaster is comparatively light. It may
consist of minor damage to hardware, software, or electrical equipment from
fire, water, chemicals, etc.
Recovery Teams: Recovery teams are manageable units having common
recovery requirements. The recovery teams will very likely parallel an existing
agency departmental organization.
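The outage-duration thresholds in the definitions of minor, major, and catastrophic disasters can be expressed as a short classification function. This Python sketch is an illustration, not part of the guidelines. The function name `classify_outage` and the 8-hour default shift length are assumptions, since the guidelines do not define the length of a shift. Boundary cases follow the wording above: exactly two days and exactly seven days are treated as major.

```python
from typing import Optional

SHIFT_HOURS = 8.0  # assumed shift length; the guidelines do not define it


def classify_outage(expected_hours: float,
                    shift_hours: float = SHIFT_HOURS) -> Optional[str]:
    """Classify a probable outage duration using the definitions above.

    Returns None when the outage is not expected to last longer than one
    shift, which falls below the minor-disaster threshold.
    """
    days = expected_hours / 24
    if days > 7:
        return "catastrophic"   # more than seven days
    if days >= 2:
        return "major"          # from two to seven days
    if expected_hours > shift_hours:
        return "minor"          # longer than one shift, less than two days
    return None
```

Because the definitions speak of how long the outage "will probably last", the input is an estimate, and the classification should be revisited as the estimate changes during disaster confirmation.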