Disaster Recovery/Business Resumption Planning Guidelines
 




Table of Contents
 
Introduction

Statutory Authority

Scope

Exemptions

Guidelines

     The Recovery Planning Process

     Project Planning

     Critical Business Requirements

     Recovery Strategies

     Emergency Response/Problem Escalation

     Plan Activation

     Recovery Operations

     Training

     Testing

     Plan Maintenance

Related Policy and Standards

Maintenance

Definitions

Introduction
The purpose of disaster recovery/business resumption planning is to assure continuity of computing and telecommunications operations needed to support critical agency functions. The business resumption plan should aim at achieving a systematic and orderly resumption of all agency computing and telecommunications services. The plan should provide for restoring service as soon as possible. Those functions that are most critical to achieving the agency mission must remain in operation during the recovery period.

Statutory Authority
The provisions of RCW 43.105.041 detail the powers and duties of the Information Services Board (ISB), including the authority to develop statewide or interagency information services and technical policies, standards, and procedures.

Scope
These guidelines apply to all executive and judicial branch agencies and educational institutions, as provided by law, that operate, manage, or use IT services or equipment to support critical state business functions.

Exemptions
None.

Guidelines
Emergency response/problem escalation procedures prescribe how to respond to two kinds of situation: disaster events and problems.

Disaster recovery/business resumption plans should specify procedures for both situations. Emergency procedures direct the response to disaster events. Escalation procedures direct the response to problems. Both sets of procedures may result in the declaration of a disaster and activation of the recovery plan.

The Recovery Planning Process
There are nine major phases in the recovery planning process:

  1. Project Planning: Define the project scope, organize the project, and identify the resources needed.
  2. Critical Business Requirements: Identify the business functions most important to protect, and the means to protect them. Analyze risks, threats, and vulnerabilities.
  3. Recovery Strategies: Arrange for alternate processing facilities to use during a disaster. Make sure to store copies of computer files, work-in-process, software, and documentation in a safe place.
  4. Emergency Response/Problem Escalation: Specify exactly how to respond to emergencies and how to tell when a "problem" has become a potential "disaster."
  5. Plan Activation: Determine procedures for informing the right people, assessing the impact on operations, and starting the recovery efforts.
  6. Recovery Operations: Develop the specific steps for reducing the risks of an outage and restoring operations should an outage occur.
  7. Training: Make sure everyone understands the recovery plan and can carry it out efficiently.
  8. Testing: Make sure the plan works effectively.
  9. Plan Maintenance: Make changes and additions to keep the plan current.

The disaster recovery/business resumption planning process provides the preparation necessary to design and document the procedures needed to assure continued agency operations following a disaster. Each agency's process should include the following elements:

Project Planning
Get preliminary management commitment.
Get agreement from senior management on the need for disaster recovery/business resumption planning.

Designate a disaster recovery/business resumption manager.
Designate a person to manage the agency's recovery from a disaster. The designated individual must have sufficient knowledge of information management and information technology (IT) within the agency in order to work effectively with IT hardware and software, the data centers, and service providers in reestablishing information processing and telecommunications services after a disaster has occurred.

Organize a disaster recovery/business resumption planning team.
Organize a team that will be responsible for the detailed technical analysis and planning functions needed for a recovery plan.

Identify individuals from management, data processing, telecommunications, business operating units, and consultants to participate in preparing the disaster recovery/business resumption plan.

Audit current recovery preparedness.
Determine what security/disaster recovery/business resumption plans are in place. Identify what planning remains to be done.

Develop the project schedule.
Estimate task durations, identify responsibilities, assign resources, and document the schedule for plan development.

Specify documentation procedures.

Define recovery program overview.

Identify the scope and aim of the disaster recovery/business resumption plan.

Critical Business Requirements
An agency may carry out hundreds of operations that management and staff consider important. Key resources may be unavailable during a disaster. The agency must concentrate its resources on the operations that are most important for public health, safety, and welfare. The aim of a disaster recovery/business resumption plan is to reduce potential losses, not to duplicate a business-as-usual environment.

  1. Perform business impact analysis. Establish an understanding of the business organization and service areas of the agency.
  2. Identify the business functions to be addressed in accomplishing a business impact analysis.


The following categorization is suggested as a means for classifying computer application systems used by an agency:

Category/Classification

Natural hazards:

Accidents:

Environmental failure:

Intentional acts:


Recovery Strategies
Off-site storage of back-up material.

15. Select off-site storage locations.

      • Identify one or more locations off-site for secure storage of copies of data, documentation, and critical supplies.
      • Agencies that purchase computer services from external providers should arrange with the service provider for off-site storage.

16. Determine off-site storage inventory. Identify the specific files, programs, documentation, vendor contracts, supplies, etc., copies of which should be stored and maintained off-site. Agencies shall include at least one current copy of their disaster recovery/business resumption plan in the off-site storage inventory.

17. Specify off-site inventory procedures. Determine procedures, schedules, and responsibility for maintaining the contents of the off-site storage facility.

18. Alternate processing capability.

      • Identify requirements for recovery facilities.
      • Determine hardware processing capacity, phone service, data communications service, furniture, and space needed in an alternate processing facility.

19. Select recovery facilities.

      • Rank potential recovery alternatives and select one or more.
      • Produce recovery site procedures guide(s).
      • Document information needed to use at each recovery facility.

20. Document overall recovery strategy.

      • Document the general strategy the agency will use in the event of a disaster.
      • The recovery strategy is an overview of the recovery process the organization will follow if hit by a disaster. The strategy should address:
        • Recovery requirements for restoration of critical business operations.
        • Any alternate processing facilities employed.
        • Any alternate manual procedures, forms, staffing, and space.
        • Procedures for obtaining resources.
      • Agencies should also develop strategies for addressing each of the following where relevant:
        • Command centers.
        • Alternate business operations.
        • Alternate data processing.
        • Alternate data communications.
        • Alternate voice communications.
        • Recovery resource acquisition.


Emergency Response/Problem Escalation
Identify potential threats and develop emergency procedures.

Document the action steps to be taken immediately in responding to damaging events or threats of damage or disruption. Inform all agency staff of documented action steps.

21. The purpose of emergency procedures is to:

      • Protect people.
      • Protect property.
      • Reduce outage duration or loss of IT services or assets.

22. Document the emergency response actions the agency must take immediately to:

      • Protect the lives and safety of all personnel.
      • Gain immediate emergency help from fire, police, hospitals.
      • Reduce outage duration or loss of IT services or assets.
      • Inform agency staff who are members of a Disaster Recovery/Business Resumption Management Team that a serious loss or interruption in service has occurred.
      • Set up a focal point for coordinating the recovery program, sending out information, and assembling personnel.

23. Specify problem escalation guidelines.

      • State the steps to follow for escalating unresolved problems to disaster status.
      • The purpose of problem escalation procedures is to define the steps and time allotments leading up to the declaration of a disaster.
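
The idea of escalation steps with time allotments can be illustrated with a short sketch. This is a minimal example only. The step names and hour limits in `ESCALATION_STEPS` are hypothetical assumptions, not values taken from these guidelines; each agency would substitute its own steps and allotments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    name: str         # who is handling the problem at this step (hypothetical)
    max_hours: float  # time allotted at this step before escalating (hypothetical)


# Hypothetical escalation ladder; not prescribed by the guidelines.
ESCALATION_STEPS = [
    EscalationStep("Operations staff", 2.0),
    EscalationStep("IT management", 4.0),
    EscalationStep("Recovery management team", 8.0),
]

DECLARE_DISASTER = "Declare disaster"


def escalation_status(hours_unresolved: float) -> str:
    """Return who should be handling a problem unresolved for the given hours.

    Time allotments are cumulative: a problem moves to the next step once it
    has used up the allotment of every step before it. When all allotments
    are exhausted, the result is a disaster declaration.
    """
    if hours_unresolved < 0:
        raise ValueError("hours_unresolved must be non-negative")
    cumulative_limit = 0.0
    for step in ESCALATION_STEPS:
        cumulative_limit += step.max_hours
        if hours_unresolved < cumulative_limit:
            return step.name
    return DECLARE_DISASTER
```

Under these assumed allotments, `escalation_status(3.0)` returns `"IT management"`, and any problem unresolved for 14 hours or more returns `"Declare disaster"`.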


Plan Activation
Develop first alert procedures.

24. Prepare general guidelines for initial notification of a potential disaster situation.

25. Develop disaster confirmation procedures.

      • Develop procedures to manage the initial assessment of a disaster or potential disaster situation.
        • Develop procedures for reporting findings to management.
        • Develop procedures for making initial emergency contacts.
        • Develop procedures for possible command center activation.
      • Develop damage assessment procedures.
        • Develop procedures for damage assessment.
        • Develop procedures for examining the effect of the damage on processing of critical operations.

26. Develop notification procedures.

27. Develop procedures for declaring a disaster, for setting up a command center, and for informing the recovery teams, customers, the public, and suppliers.


Recovery Operations

28. Determine plan activation flow.

29. Outline or chart the steps to follow when a disaster situation has occurred or potentially may occur.

30. Define recovery team organization.

31. Determine the teams that make up the recovery organization.

32. Develop team action plans. There may be several recovery teams, each specializing in some area of technical expertise. Disaster Recovery/Business Resumption Team procedures for each team should use a format like the following:


Team Charter or Function: The particular duties and responsibilities of this team in the event of a disaster.

Team Membership and Organization: The structure of the team, job titles of team members, reporting responsibilities.

Team Interfaces: Include detailed explanations of all the actions to be taken by this team prior to a disaster situation so it can function effectively, with the necessary data, personnel and other resources, if a disaster occurs. This section should cover relationships with vendors, customers, ongoing tasks to ensure readiness of the plan, training requirements, identification of critical resources, data, and personnel.

Action Procedures: This section provides an outline of the tasks to be carried out. It is written on the assumption that team members know how to do their jobs and need only a guide to ensure that nothing is omitted amid the confusion that normally accompanies a disaster.

Procedures should be flexible enough to be used in contingency situations of varying type and degree.

Procedures should be detailed enough to be relied upon when no other documentation or knowledge is available.

Plan Appendices: The appendices should contain the material and data that will be used in the event of an actual disaster. Include separate appendices on notification of personnel, resource requirements, forms and documentation, and any other subjects that are required. The requirement is based upon the ability of the particular team to access the information during a disaster. If the data may not be otherwise available, it should be included in the appendix to the disaster recovery/business resumption plan.

Training
Design a disaster recovery/business resumption training program.

33. Specify the aim, training activities, schedule, and an administrator for disaster recovery/business resumption training.

34. Develop specific training activities.

35. Develop an instructional plan for each training activity.

36. Develop training evaluation tools.

37. Develop techniques aimed at answering the following questions:

      • Are trainees able to perform their recovery responsibilities?
      • How can the agency improve training?
      • How can the agency improve its disaster recovery/business resumption plan?


Testing
Testing is the only method to ensure that:

38. Recovery procedures are complete and workable.

39. Materials and computer files are available and can be used for alternate processing of critical operations and applications.

40. Backup copies of software, documentation, and work-in-process records are adequate and current.

41. Training of personnel was effective.


Design a recovery plan testing program.

42. Detail: Specify tests and assign responsibility for overseeing testing. Agencies using external services shall plan, schedule, and conduct their disaster recovery/business resumption plan testing in cooperation with service providers. The cost of establishing the necessary communication link and running a test at a remote back-up facility is high. A full test involving all agency applications may well be impractical due to budget considerations. Agencies should plan to share test time at the service provider's back-up facility ("hot site").

43. Objectives: Clearly state the purposes for conducting tests of the recovery plan. These will include aims such as the following:

      • Verifying that the disaster recovery/business resumption plan is complete and workable.
      • Identifying needed revisions to the disaster recovery/business resumption plan.
      • Determining the adequacy of disaster recovery/business resumption training.
      • Identifying needed revisions to the training program.

44. Policy/Guidelines: Set up the policies and guidelines that will apply to testing of the recovery plan. These will cover such items as the following:

      • Committing the agency to a minimum level of testing.
      • Basing the frequency of plan testing on the frequency of changes in the business environment. Agencies must conduct at least one test per year.

 

45. The testing or validation methodology adopted by an agency will depend on:

      • Criticality of agency business functions.
      • Cost of executing the test plan.
      • Budget availability.
      • Complexity of information system and components.
      • Reporting requirements.

 

46. The test report should include:

      • Date of test.
      • Objectives of test.
      • Description of test.
      • Results.
      • Recommendations.

 

47. Distribution list for test reports must include:

      • DIS.
      • Service provider if computer services are obtained from a source external to the agency.

 

48. User notification.

      • Define requirements for informing users of planned tests.
      • Before conducting any testing that requires access to client information, inform the owning department. Get permission to test using the client data.

 

49. Specification of tests. Formulate a test schedule. For each test, specify the level of the test, the scope or areas to test, and the frequency or target date of the test.

50. Levels of testing:

      • Level I: Adequacy of off-site storage of files and documentation.
        The purpose of the first level is to evaluate the adequacy of the off-site storage facility and the existing recovery procedures. Primary concentration should be on the off-site files and documentation necessary for efficient system recovery.
      • Level II: System restoration using off-site files and documentation on the in-house computers.
        The purpose of the second level is to evaluate recovery of the ability to operate. Primary concentration should be on off-site files and documentation of the operating system, as well as management control of the recovery process.
      • Level III: System and communications restoration using alternate processing facilities, off-site files and documentation.
        The purpose of the third level is to evaluate recovery capability at an alternate site with a reduced staff.

51. Develop plans for specific tests.

      • Develop test evaluation tools.
      • Develop forms, checklists, and debriefing strategies to check recovery plan tests.
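
The test report contents, the three levels of testing, and the requirement of at least one test per year could be captured in a simple record, sketched below. This is an illustrative assumption rather than a prescribed format. The names `PlanTestLevel`, `PlanTestReport`, and `annual_test_overdue` are hypothetical, the `level` field is an addition to the five listed report items, and "one test per year" is interpreted here as "within the preceding 365 days."

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List


class PlanTestLevel(Enum):
    """The three levels of testing described in the guidelines."""
    LEVEL_I = "Adequacy of off-site storage of files and documentation"
    LEVEL_II = "System restoration on in-house computers using off-site files"
    LEVEL_III = "System and communications restoration at an alternate facility"


@dataclass
class PlanTestReport:
    """One test report: date, objectives, description, results, recommendations."""
    test_date: date
    objectives: List[str]
    description: str
    results: str
    recommendations: List[str]
    level: PlanTestLevel  # assumption: recorded for convenience, not a listed item


def annual_test_overdue(reports: List[PlanTestReport], today: date) -> bool:
    """Return True if no test has been conducted within the preceding 365 days."""
    if not reports:
        return True
    latest = max(report.test_date for report in reports)
    return (today - latest).days > 365
```

For example, an agency whose only report is dated 15 January 2024 would not be overdue on 1 December 2024, but would be overdue on 1 June 2025.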


Plan Maintenance
Assign plan maintenance responsibility.

Establish maintenance procedures and schedules. Provide a schedule for regular, systematic review of the content of the disaster recovery/business resumption plan. Define a procedure for making appropriate changes to the plan.

Develop distribution procedures and lists.

52. Provide policies and procedures for distributing the recovery plan parts and updates.

53. The disaster recovery/business resumption plan may contain sensitive information about the agency's business, communications, and computing operations. Policy and procedures for distribution of the plan should take this into account.

NOTE: DP/90 PLUS, a product of SunGard Recovery Services, is an MS-DOS software application that provides substantial help in the development and maintenance of a disaster recovery/business resumption plan. DIS has a corporate contract for this product. Because of this special contract, DP/90 PLUS is available to any state agency at a discount. Please contact DIS Leasing & Brokering for order information.

Maintenance
Technological advances and changes in the business requirements of agencies will necessitate periodic revisions to policies, standards, and guidelines. The Department of Information Services is responsible for routine maintenance of these to keep them current. Major policy changes will require the approval of the ISB.

Definitions
Catastrophic Disaster: A catastrophic disaster is one in which the outage will probably last more than seven days.

Damage - Damage due to a catastrophic disaster is severe and could involve total destruction of the agency facility. Replacement of equipment or significant renovation of the facility may be necessary.

Command Center: The command center is a local area, on or off premises, from which to manage the emergency situation. It is a focal point for coordinating the recovery program, issuing information, and assembling personnel.

Critical Function: Critical functions are those functions an agency must perform to survive. Failure to perform them would result in serious or irreparable harm to the agency. Impact may take the form of increased operating costs, loss of revenue collection, or inability to provide services to clients.

Disaster: Any unplanned circumstance or event that results in an inability to support critical business functions within the current environment.

Disaster Recovery/Business Resumption Plan: A disaster recovery/business resumption plan is a comprehensive statement of actions to be taken in response to a disaster. It includes documented, tested procedures that, if followed, will assure the availability of the critical resources and facilities required to maintain continuity of operations. Syn.: Contingency Plan, Disaster Recovery Plan, Business Continuity Plan.

Major Disaster: A major disaster is one in which the outage will probably last from two to seven days.

Damage - Damage due to a major disaster is more severe than that due to a minor disaster. For example: in a major disaster, key business units could be without telecommunications capability for an extended period. Or the computer room could suffer heavy damage.

Minor Disaster: A minor disaster is one in which the outage will probably last longer than one shift, but less than two days.

Damage - Damage due to a minor disaster is comparatively light. It may consist of minor damage to hardware, software, or electrical equipment from fire, water, chemicals, etc.

Recovery Teams: Recovery teams are manageable units having common recovery requirements. The recovery teams will very likely parallel an existing agency departmental organization.
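
The outage-duration thresholds in the definitions of minor, major, and catastrophic disasters above can be expressed as a small classification sketch. Two points are assumptions, not part of the definitions: the length of "one shift" (taken here as 8 hours via the hypothetical `SHIFT_HOURS`), and the `UNCLASSIFIED` result for outages of one shift or less, which the definitions do not address.

```python
from enum import Enum

SHIFT_HOURS = 8.0  # assumption: the guidelines do not define the length of a shift


class DisasterClass(Enum):
    UNCLASSIFIED = "one shift or less; below the minor-disaster threshold"
    MINOR = "longer than one shift, but less than two days"
    MAJOR = "from two to seven days"
    CATASTROPHIC = "more than seven days"


def classify_outage(expected_hours: float,
                    shift_hours: float = SHIFT_HOURS) -> DisasterClass:
    """Classify a disaster by its probable outage duration, in hours."""
    if expected_hours < 0:
        raise ValueError("expected_hours must be non-negative")
    if expected_hours > 7 * 24:    # more than seven days
        return DisasterClass.CATASTROPHIC
    if expected_hours >= 2 * 24:   # from two to seven days, inclusive
        return DisasterClass.MAJOR
    if expected_hours > shift_hours:  # longer than one shift, under two days
        return DisasterClass.MINOR
    return DisasterClass.UNCLASSIFIED
```

With the assumed 8-hour shift, a probable outage of 12 hours classifies as minor, 48 hours as major, and anything beyond 168 hours as catastrophic.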