From DevOps to AIOps with the power of HPC: Why and how it’s time to make the move

Learn how AIOps uses AI, ML, and DL with the power of HPC to simplify IT operations management while accelerating and automating problem resolution in complex modern IT environments.

Looking for something compelling to read in the new year? If you are interested in DevOps, The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win is a must-read.

This entertaining novel takes you from the very familiar chaos of daily IT operations to the three underlying principles of the DevOps movement: systems thinking, amplifying feedback loops and creating a culture of continuous experimentation and learning. Since the book’s publication in 2013, the DevOps movement has come a long way, but some of the challenges of operating and managing IT environments have been getting worse:

  • The complexity of enterprise IT environments is growing exponentially, with distributed edge, hybrid cloud, and multicloud environments.
  • With cloud-native and microservices architectures, the ever-increasing number of different technological components involved in a business service makes troubleshooting unmanageable with traditional, manual approaches. As the number of moving parts grows, so does the time to diagnose issues.
  • With the adoption of CI/CD pipelines, the speed of change is also increasing exponentially. In 2009, Allspaw and Hammond gave their famous presentation, 10+ Deploys Per Day: Dev and Ops Cooperation at Flickr (footnote 1). Today, companies like Amazon, Google, and Netflix deploy code thousands of times per day.
  • Detection of issues is based on monitoring systems that are often siloed by technology layer (infrastructure, network, application) and/or by location (on-premises data centers, edge, private, and public clouds), each one generating its own noise.

Artificial intelligence to the rescue

Because of the increased complexity and speed of change of IT environments, the amount of data that operations teams need to analyze to perform their work is also growing exponentially. These huge amounts of data are harder to manage, and harder to make sense of. This is where artificial intelligence (AI) comes into play—using machine learning (ML) and deep learning (DL) algorithms to process, analyze and gather insights from data.

AIOps is the term that describes the use of AI to simplify IT operations management and accelerate and automate problem resolution in complex modern IT environments. AIOps is not a replacement for DevOps, but an enhancement.

AIOps is about automating problem resolution and improving performance efficiency. It cuts through noise to identify, troubleshoot, and resolve common issues within IT operations. It brings together data from diverse sources and analyzes it in real time at the source. It analyzes historical as well as current data, linking anomalies and observed patterns to relevant events via ML. Finally, it initiates appropriate automated action, which can yield continuous improvements and fixes. Here’s a diagram that shows the differences between the traditional approach and the AIOps approach:

[Figure: the traditional approach vs. the AIOps approach]

AIOps use cases can help IT organizations with many current challenges, such as:

  • Reducing the noise and prioritizing business-critical issues, by correlating events with the same root cause and eliminating duplicates
  • Reducing mean time to detect (MTTD), by automatically identifying trends that point to impending issues with real-time anomaly detection
  • Reducing mean time to repair (MTTR) with root-cause analysis
  • Predicting workload capacity requirements to optimize resource usage and cost
  • Supporting the speed of application releases and DevOps processes by gaining visibility of change impact with unified dashboards for services monitoring
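
The first use case in the list above, noise reduction through event correlation, can be sketched in a few lines. This is only an illustration, not a description of any particular AIOps product: the `Event` fields, the use of `(service, symptom)` as a stand-in for "same root cause", and the 60-second window are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: float   # seconds since some reference point
    source: str        # monitoring silo that raised the event (hypothetical field)
    service: str       # affected business service (hypothetical field)
    symptom: str       # normalized symptom label (hypothetical field)


@dataclass
class Incident:
    key: tuple
    first_seen: float
    last_seen: float
    sources: set = field(default_factory=set)
    count: int = 0


def correlate(events, window_s=60.0):
    """Collapse events that share a (service, symptom) key into incidents.

    An event joins the most recent incident with the same key if it arrives
    within `window_s` seconds of that incident's last event; otherwise it
    opens a new incident.
    """
    latest = {}      # key -> most recent Incident for that key
    incidents = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.service, ev.symptom)
        inc = latest.get(key)
        if inc is None or ev.timestamp - inc.last_seen > window_s:
            inc = Incident(key=key, first_seen=ev.timestamp, last_seen=ev.timestamp)
            latest[key] = inc
            incidents.append(inc)
        inc.last_seen = ev.timestamp
        inc.sources.add(ev.source)
        inc.count += 1
    return incidents
```

In this sketch, three alerts about the same symptom raised by the network, application, and infrastructure monitors within a minute become one incident with `count == 3`, which is the kind of reduction that lets operators focus on the business-critical issue rather than on each silo's alert. A real system would typically learn the correlation key from topology or historical data instead of hard-coding it.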

Key components for AIOps solutions

The two main components for AIOps solutions are:

  • Data platform: AIOps requires a scalable data platform that allows the ingestion, storage, and analysis of the variety, velocity, and volume of data generated by IT, at the right speed for each use case, without creating silos.
  • Machine learning models and management framework: Different types of supervised, semi-supervised, and unsupervised ML and DL models can be involved in AIOps use cases. For example, trend identification with ML models enables real-time anomaly detection and predictive capabilities. But model building is just one step of the ML model lifecycle, which also includes data preparation, model training, model deployment, model monitoring, and model retraining/redeployment.
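
As a rough illustration of how trend tracking can drive real-time anomaly detection without a fixed threshold, here is a minimal streaming detector. It is a simple statistical stand-in (an exponentially weighted mean and variance), not one of the ML or DL models the white paper covers; the class name and the default values for `alpha`, `z_limit`, and `warmup` are assumptions chosen for the example.

```python
import math


class EwmaAnomalyDetector:
    """Flags samples that sit far from an exponentially weighted running mean."""

    def __init__(self, alpha=0.1, z_limit=4.0, warmup=10):
        self.alpha = alpha        # smoothing factor for the running mean/variance
        self.z_limit = z_limit    # how many standard deviations count as anomalous
        self.warmup = warmup      # samples to observe before flagging anything
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Score x against the current baseline, then fold it into the baseline."""
        self.n += 1
        if self.mean is None:
            self.mean = x
            return False
        diff = x - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (
            self.n > self.warmup and std > 0 and abs(diff) > self.z_limit * std
        )
        if not is_anomaly:
            # Learn only from normal samples so outliers do not skew the baseline.
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return is_anomaly
```

Because the baseline adapts to each metric's own recent behavior, the same detector can be attached to many metrics without hand-tuning a threshold per metric. In terms of the lifecycle above, the running statistics play the role of a continuously retrained model; a production setup would add explicit monitoring and redeployment steps around a real model.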

To dig deeper, download this technical white paper: Artificial Intelligence for IT Operations—A study on high-performance compute.

Case Study: AIOps for HPC systems

Today, HPE is using AI/ML to develop advanced, non-threshold-based real-time analytics that reduce data center downtime through rapid, early, and automated anomaly detection at scale. HPE is also developing predictive capabilities to improve data center energy efficiency and sustainability, with an initial focus on power usage effectiveness (PUE), water usage effectiveness (WUE), carbon usage effectiveness (CUE), and predictive scheduling of cooling for large jobs. The effort encompasses both IT systems and the supporting facility.
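
For readers unfamiliar with the efficiency metrics named above, the sketch below computes them using their commonly cited definitions: each one divides a facility-level quantity by the energy consumed by the IT equipment. The function names and the sample figures in the usage note are made up for illustration and do not come from HPE data.

```python
def _ratio(numerator, it_equipment_kwh):
    """Divide a facility-level quantity by IT equipment energy (kWh)."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return numerator / it_equipment_kwh


def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: total facility energy / IT energy (unitless)."""
    return _ratio(total_facility_kwh, it_equipment_kwh)


def wue(site_water_liters, it_equipment_kwh):
    """Water usage effectiveness: site water use / IT energy (L/kWh)."""
    return _ratio(site_water_liters, it_equipment_kwh)


def cue(co2_kg, it_equipment_kwh):
    """Carbon usage effectiveness: CO2-equivalent emissions / IT energy (kg/kWh)."""
    return _ratio(co2_kg, it_equipment_kwh)
```

For example, a facility that draws 1,500 kWh in total while its IT equipment uses 1,000 kWh has `pue(1500, 1000) == 1.5`, meaning half a kilowatt-hour of overhead (cooling, power distribution, and so on) for every kilowatt-hour of compute. Predictive cooling aims to push that ratio closer to 1.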

To support this AIOps initiative, HPE is also developing a generic high-performance system monitoring framework. This next-generation system monitoring framework for HPC machines, called Kraken Mare, was developed under the HPE/DOE PathForward project (footnote 2). It is designed to collect, move, and store vast amounts of data without any assumption of static data sources in a distributed, highly scalable way, and to provide different access patterns (such as streaming analytics or traditional data analysis using long-term storage) in a fault-tolerant way.

Following a 3-step methodology to speed AIOps design and deployment

Wherever you are on your AIOps journey, the expert AI consultants with HPE Pointnext use a three-step methodology to guide you through your specific use case. The focus is on operations problem-solving and improvements.

  1. Explore—We work with you to understand the outcomes and challenges AI brings. We ground teams on common AI terminology, fostering shared understanding and selecting the best use cases. The goal is to clearly align technology with the business, so the initiative benefits from having the business buy-in early on.
  2. Experience—We identify the data sources that will be required for the use case and create a high-level roadmap for use case implementation. This is followed by a proof-of-value (POV) to demonstrate how the solution would be deployed into a production environment. This POV is tested and the outcome is validated.
  3. Evolve—Now we are ready to work with you to evolve and scale the AI solution. Leveraging HPE’s optimized data center infrastructure that spans from AI edge to cloud coupled with HPE GreenLake pay-per-use consumption models makes this a much easier part of the complete journey.

The delivery phases for your AI solution move from workshop, to POV and design, to implementation and operations. When you engage with HPE AI experts, you can discover ways to apply AI to your specific needs in weeks as opposed to months, so you can more quickly identify how to maximize the value of your data. This insight translates into tangible benefits that include limited downtime, reduced costs through automation, and improved service quality.

Discover more about how data transformation services can help you meet transformation goals—from edge to cloud.

Meet our Compute Expert bloggers

María Ridruejo, Solution Architect, WW AI & Data Practice, HPE Pointnext Services. María is a solution architect in the AI & Data Practice at HPE. She works with customers to identify potential use cases on AI, analytics, and big data solutions, highlight the business benefits that could accrue from their implementation and deployment, and then translate needs and requirements into viable solution architectures.

Vinod Bijlani, AI & IoT Practice Leader, HPE Pointnext Services. Vinod leads the AI and IoT practice for the APAC region at HPE. He is primarily a technologist who is passionate about creating AI solutions that can move humanity and the environment forward. He is a distinguished inventor with 25 patents in AI and ML technologies.

Insights Experts
Hewlett Packard Enterprise

1 10+ Deploys Per Day: Dev and Ops Cooperation at Flickr

2 PathForward is a project under the Exascale Computing Project (ECP) run by the U.S. Department of Energy (DOE) with the goal of accelerating technology development for upcoming exascale-class HPC systems.

About the Author


Our team of HPE and other technology experts shares insights about relevant topics related to artificial intelligence, data analytics, IoT, and telco.
