IT Operations Management (ITOM)

When disaster strikes: How IT process automation helps you recover fast

‎07-30-2014 10:32 AM - edited ‎01-27-2015 08:25 PM

What do these things have in common?

  • Power failure
  • IT hardware failure
  • Network failure
  • IT software failure
  • Human error

 

There’s a high probability that at least one of those things will be the cause of a major IT disruption. According to a Forrester Research 2013 survey, one third of respondents declared a disaster in the past five years, and the top five culprits are listed above. Now imagine what it would be like if you were out for 30-plus hours, as 1 in 5 survey respondents were.

 

Don’t be scared, be prepared

 

With the risks so high, IT organizations must implement and continuously test and update disaster recovery (DR) and business continuity (BC) procedures. But testing DR procedures is a time-consuming and resource-intensive task that involves multiple subject-matter experts (SMEs) from across IT. A typical test for a large organization can involve dozens of people on multiple conference calls for up to a full day.

 

It’s little wonder that testing typically happens so infrequently. The Forrester survey revealed that 39 percent of firms conduct a full test (a live or simulated failover of all infrastructure at a site) only once a year. Ideally, DR procedures should be tested every time a major change is implemented on an application.

 

Orchestrate your recovery

 

In many ways, IT process automation and orchestration are a perfect fit for DR procedure testing: they reduce the resources you expend and improve success rates.

 

Consider the characteristics of any disaster recovery or failover exercise:

  • Requires a number of tasks that need to be performed in a very specific sequence
  • Tasks often span a number of different IT domains — server, network, storage, and others
  • Tasks require a number of different SMEs, including network engineers, database administrators, server administrators, and others
  • Success depends on coordination and handoffs between these SMEs

 

IT process automation and orchestration make all of this faster and easier. By creating workflows that tie together diverse tools, processes and domains, you significantly reduce the risk of failure. And because workflows capture and essentially document the process information, you also protect yourself from the risk that key personnel or groups will be unavailable.
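To make the sequencing and handoff problem concrete, here is a minimal Python sketch of how an orchestrator might derive a safe execution order from task dependencies. It is an illustration only, not how HP OO works internally; the task names, the domain labels and the helper functions are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DR tasks mapped to the tasks that must finish first.
# Names and domains are illustrative and are not taken from HP OO.
DEPENDENCIES = {
    "verify_network": set(),
    "check_storage": {"verify_network"},
    "check_servers": {"verify_network"},
    "fail_over_database": {"check_storage", "check_servers"},
    "redirect_dns": {"fail_over_database"},
}

# Which IT domain (and therefore which SME group) owns each task.
DOMAIN = {
    "verify_network": "network",
    "check_storage": "storage",
    "check_servers": "server",
    "fail_over_database": "database",
    "redirect_dns": "network",
}


def execution_order(dependencies):
    """Return one valid order in which every task follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


def handoffs(order, domain):
    """Count how often consecutive tasks belong to different domains."""
    return sum(1 for a, b in zip(order, order[1:]) if domain[a] != domain[b])


if __name__ == "__main__":
    order = execution_order(DEPENDENCIES)
    for task in order:
        print(f"{DOMAIN[task]:>8}: {task}")
    print("handoffs between domains:", handoffs(order, DOMAIN))
```

Every change of domain in that order is a handoff that a manual test has to coordinate over a conference call; in a workflow it is just the next step.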

 

 FREE: The new HP Operations Orchestration Community Edition

                       

 

 

How OO workflows automate disaster recovery of an email system

Let’s look at an example of how HP Operations Orchestration drives efficiency and reduces errors by automating a number of repetitive and tedious tasks. Below is an HP OO workflow for automating the disaster recovery procedure for an email system:

 

 

Figure 1: Implementation of a disaster recovery process using HP OO

 

The HP OO workflow above can be triggered when a change ticket declaring the DRP event is approved. Here are the steps it follows:

 

  1. The DR event is declared (real or test)
  2. Verify that the change requests in service desk systems (such as HP Service Manager) are approved
  3. Verify that the network is operational
  4. Validate the health of the destination systems, including server and storage
  5. Verify that the configuration of the destination system is the same as that of the source system, including database (SQL Server), application servers (Exchange) and Web servers
  6. Clone the destination server, if the source and destination are not the same
  7. Disable monitoring and clustering on the primary systems
  8. Perform failover tasks:
    1. Disconnect users and disable new connections
    2. Open connections into destination systems
    3. Reroute Domain Name System (DNS) records to point to destination systems
    4. Deactivate primary systems
  9. Validate the availability of service for the new system
  10. Update change request ticket in service desk system
  11. Update the configuration management database (CMDB) with current status, and view reports to verify that failover completed successfully
  12. Re-enable monitoring and clustering
  13. Notify users and stakeholders
  14. Declare DR event complete
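The steps above form a strictly ordered workflow in which each step must succeed before the next one runs. As a rough Python sketch of that idea (not HP OO content), the runner below executes a shortened version of the list and stops at the first failure, keeping a log as it goes. The `Step` class, the `run_workflow` function and the stand-in checks are hypothetical; a real workflow would call service desk, network and server tooling instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    name: str
    action: Callable[[Dict], bool]  # returns True when the step succeeds


def run_workflow(steps: List[Step], context: Dict) -> Tuple[bool, List[str]]:
    """Run steps in order; stop at the first failure and return the log."""
    log = []
    for number, step in enumerate(steps, start=1):
        ok = step.action(context)
        log.append(f"{number}. {step.name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False, log
    return True, log


# Stand-in checks that only read or update a dictionary (all hypothetical).
def change_approved(ctx: Dict) -> bool:
    return ctx.get("change_status") == "approved"


def network_up(ctx: Dict) -> bool:
    return bool(ctx.get("network_ok", False))


def destination_healthy(ctx: Dict) -> bool:
    return bool(ctx.get("destination_ok", False))


def fail_over(ctx: Dict) -> bool:
    ctx["active_site"] = "destination"
    return True


def service_available(ctx: Dict) -> bool:
    return ctx.get("active_site") == "destination"


STEPS = [
    Step("Verify change request is approved", change_approved),
    Step("Verify network is operational", network_up),
    Step("Validate destination health", destination_healthy),
    Step("Perform failover", fail_over),
    Step("Validate service availability", service_available),
]

if __name__ == "__main__":
    context = {"change_status": "approved", "network_ok": True, "destination_ok": True}
    succeeded, log = run_workflow(STEPS, context)
    print("\n".join(log))
    print("DR event complete" if succeeded else "DR event halted")
```

Because the runner halts as soon as a check fails, the failover step never executes against an unapproved change or an unhealthy destination, and the log doubles as a record of what was done.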

 

There are also two automated sub-workflows built in at Step 6, for cloning the destination server, and Step 8, for the failover from source to destination:

 

 

Fig. 2: Sub-workflow for cloning destination server

 

 

Fig. 3: Sub-workflow for failover from source to destination
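The decision in Steps 5 and 6 (clone only when source and destination differ) can be pictured as a simple configuration comparison. The Python sketch below is an assumption about how such a check might look, not the logic of the actual HP OO sub-workflow; the configuration keys and values are invented for illustration.

```python
def config_drift(source, destination):
    """Return {key: (source_value, destination_value)} for every mismatch."""
    keys = source.keys() | destination.keys()
    return {
        key: (source.get(key), destination.get(key))
        for key in sorted(keys)
        if source.get(key) != destination.get(key)
    }


def needs_clone(source, destination):
    """Cloning is only required when the two configurations differ."""
    return bool(config_drift(source, destination))


if __name__ == "__main__":
    # Hypothetical configuration snapshots of the two sites.
    source = {"sql_server": "2012 SP1", "exchange": "2013 CU5", "web_tier": "IIS 8"}
    destination = {"sql_server": "2012 SP1", "exchange": "2013 CU3", "web_tier": "IIS 8"}

    for key, (src, dst) in config_drift(source, destination).items():
        print(f"{key}: source={src} destination={dst}")
    print("clone required:", needs_clone(source, destination))
```

Reporting the individual mismatches, rather than a bare yes or no, gives the SMEs something to review when a test run triggers an unexpected clone.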

 

To manually conduct such a complex disaster recovery procedure would clearly require a significant amount of time and resources — and chances are that your organization would not get around to testing its effectiveness as often as it should.

 

Automating and orchestrating a large number of disaster-recovery tasks drives down the costs of performing critical disaster-recovery planning. Furthermore, the procedures are more reliable and ready for an actual recovery event. Institutionalizing disaster-recovery procedures in orchestration workflows also helps to communicate and document the procedure, and reduces how much you must depend on specific individuals or groups.

 

 

Experience HP Operations Orchestration NOW!

The new HP Operations Orchestration Community Edition is a free download of the platform with out-of-the-box content packs for automating incident remediation. Designed for easy self-installation, it lets you begin experiencing the power of IT process automation and IT operations orchestration within two hours.

 

 

 

About the Author

NimishShelat

Nimish Shelat is currently focused on Datacenter Automation and IT Process Automation solutions. Shelat strives to help customers, in both traditional IT and cloud-based IT, transform to a service-centric model. The scope of these solutions spans server, network, database and middleware infrastructure. The solutions are optimized for tasks like provisioning, patching, compliance and remediation, and for processes like Self-healing Incident Remediation, Rapid Service Fulfilment, Change Management and Disaster Recovery. Shelat has 23 years of experience in IT, 20 of them at HP, spanning the networking, printing, storage and enterprise software businesses. Prior to his current role as a Manager of Product Marketing and Technical Marketing, Shelat held positions as Software Sales Specialist, Product Manager, Business Strategist, Project Manager and Programmer Analyst. Shelat has a B.S. in Computer Science and earned his MBA from the University of California, Davis, with a focus on Marketing and Finance.
