Around the Storage Block
Ronak_Chokshi

An elevated customer support experience, powered by AI


As the IT stack becomes increasingly complex, so do infrastructure problems. For instance, when a business-critical application unexpectedly slows down, IT admins are left sifting through massive amounts of log data, or calling their IT vendor's support line only to be transferred again and again to ever-higher support tiers. Complex problems take significant time and resources away from higher-priority projects and innovation. Yet enterprise IT teams have long accepted this as the normal way of doing things.

HPE wanted to challenge the status quo and offer our customers a better product experience: one that predicts issues with your infrastructure before you discover them, prevents them from occurring, and, if an incident does occur, makes troubleshooting and resolution faster than ever before.

This vision could only be realized with a new IT support experience and a new product design, powered by AI. Executing on it required a new approach: combining several technologies in a way never done before and applying them to an age-old process.

An AI-driven approach to solving infrastructure problems

Our unique approach starts with collecting telemetry data from our global installed base. Using this data, we accelerate troubleshooting for complex issues the moment we encounter them, without having to pull logs. Once we identify the root cause of a problem, we activate an automation process that prevents the same issue from recurring for every other customer.

Now, let’s unpack this process in more detail.

Our global installed base of storage, servers and integrated products can be thought of as connected infrastructure that constantly sends wellness data to our cloud. We use this data for several purposes. First, we monitor the behavioral patterns of your infrastructure, watching for pitfalls we have already identified. When a new problem arises, we resolve it for the affected customer and then abstract the exact conditions that characterize it. This kicks off an automation process that monitors all of our connected systems for those conditions. If we see another customer following the same trajectory, we immediately recommend the identified solution, preventing the issue from occurring again.
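
To make the "same trajectory" idea concrete, here is a minimal Python sketch, assuming the abstracted problem is stored as a short reference series of one metric (for example, a hypothetical cache-utilization percentage) plus a tolerance. The names `KnownIssue` and `scan_fleet`, the distance measure and the threshold are all illustrative assumptions, not how HPE InfoSight is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class KnownIssue:
    """A problem resolved once, abstracted into a reference trajectory (hypothetical)."""
    name: str
    trajectory: list      # metric values observed in the run-up to the original incident
    tolerance: float      # max mean absolute deviation still treated as "the same trajectory"
    recommendation: str


def mean_abs_deviation(a, b):
    """Average point-by-point distance between two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def scan_fleet(issue, fleet):
    """Return {system_id: recommendation} for every system whose most recent
    window of the metric tracks the known problem trajectory."""
    n = len(issue.trajectory)
    hits = {}
    for system_id, series in fleet.items():
        if len(series) < n:
            continue  # not enough history to compare yet
        if mean_abs_deviation(series[-n:], issue.trajectory) <= issue.tolerance:
            hits[system_id] = issue.recommendation
    return hits


issue = KnownIssue("hypothetical-cache-exhaustion", [60, 70, 80, 90], 5.0, "Apply documented fix")
fleet = {
    "array-a": [10, 61, 71, 79, 92],  # climbing along the known path
    "array-b": [50, 50, 51, 50],      # stable, no match
}
print(scan_fleet(issue, fleet))  # {'array-a': 'Apply documented fix'}
```

Comparing only the most recent window keeps the check cheap enough to run continuously across every connected system; a real implementation would likely match many metrics at once.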

We call this process optimization “See Once, Prevent for All—Always”. This mantra is our guiding principle, underpinning the HPE InfoSight platform, where you can see how we combine data, the power of AI and the convenience of cloud to revolutionize an otherwise cumbersome process.

This approach has delivered industry-leading uptime, with over 99.9999% measured availability¹, lower operational expenditure for our customers, and the prediction and prevention of 86% of issues. It has also helped us lead the IT industry in autonomous support operations.
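
For a sense of scale, an availability percentage converts directly into downtime. This quick Python calculation, assuming an average 365.25-day year, shows that 99.9999% availability corresponds to roughly 31.6 seconds of unavailability per year, so "over 99.9999%" means less than that. The helper name is ours, for illustration only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year length: 31,557,600 seconds


def annual_downtime_seconds(availability_pct):
    """Downtime per year implied by an availability percentage."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * SECONDS_PER_YEAR


print(round(annual_downtime_seconds(99.9999), 1))      # 31.6 seconds per year ("six nines")
print(round(annual_downtime_seconds(99.9) / 3600, 1))  # 8.8 hours per year ("three nines"), for comparison
```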

That’s how and why we can say that HPE InfoSight is the industry’s most advanced AI for infrastructure.

How “See Once, Prevent for All—Always” works

Our guiding mantra has driven the following innovations in HPE InfoSight:

Expansive telemetry data lake. We collect a comprehensive set of data from our products in the field, up to and including telemetry from virtualization software running on them. We intentionally collect much more data than we initially have specific plans for, and, yes, we have designed and instrumented our products from day one to send all of this data to our cloud. Competitors, by contrast, must run separate tools to collect performance logs after an issue has already occurred.
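
One common way to keep data useful before you know what questions you will ask of it is to store raw records as reported and apply a schema only when reading them. The source does not describe how the HPE InfoSight data lake is stored, so this Python sketch is purely illustrative: the record fields and the `query` helper are hypothetical.

```python
import json

# Hypothetical raw records as they might land in a data lake: every field a system
# reported is kept, across layers, not only the fields used by today's analytics.
RAW_RECORDS = [
    '{"system": "array-1", "storage": {"read_latency_ms": 0.4}, "vm": {"count": 42}, "host": {"hypervisor_version": "7.0"}}',
    '{"system": "array-2", "storage": {"read_latency_ms": 3.1}, "vm": {"count": 7}}',
]


def query(records, path):
    """Schema-on-read: pull a nested field out at query time.
    Systems that never reported the field map to None instead of failing."""
    result = {}
    for line in records:
        record = json.loads(line)
        value = record
        for key in path:
            value = value.get(key) if isinstance(value, dict) else None
        result[record["system"]] = value
    return result


print(query(RAW_RECORDS, ["vm", "count"]))                 # {'array-1': 42, 'array-2': 7}
print(query(RAW_RECORDS, ["host", "hypervisor_version"]))  # {'array-1': '7.0', 'array-2': None}
```

Because nothing is discarded at write time, a question nobody anticipated (here, the hypervisor version) can still be answered later across the whole fleet.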

Using the data collected from our entire global installed base, we train machine learning (ML) models to learn how our products typically behave in the field. This has been instrumental in automatically solving complex quantitative problems such as identifying performance bottlenecks across the installed base. AIOps products that keep data on-premises can only train models on the history of the particular customer they run at, so they cannot solve problems with the same degree of accuracy, nor can they ensure broad discovery and remedy of a problem that affects the global installed base.
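
As a deliberately simple illustration of why fleet-wide data helps, the Python sketch below fits one expected latency-versus-load line across many systems and flags those whose latency sits well above what the fleet predicts for their load. The linear fit, the 2-standard-deviation cutoff and the sample numbers are assumptions standing in for the undisclosed ML models; a single system's own history could not provide this cross-customer baseline.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def flag_bottlenecks(fleet, k=2.0):
    """fleet maps system id -> (load in thousands of IOPS, latency in ms).
    Flag systems whose latency exceeds the fleet-wide expectation for their
    load by more than k standard deviations of the residuals."""
    ids = list(fleet)
    xs = [fleet[i][0] for i in ids]
    ys = [fleet[i][1] for i in ids]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [i for i, r in zip(ids, residuals) if r > k * spread]


fleet = {
    "array-a": (10, 1.5), "array-b": (20, 2.5), "array-c": (30, 3.5), "array-d": (40, 4.5),
    "array-e": (50, 5.5), "array-f": (60, 6.5), "array-g": (70, 7.5),
    "array-h": (40, 12.0),  # same load as array-d, far higher latency
}
print(flag_bottlenecks(fleet))  # ['array-h']
```

Ordinary least squares is pulled toward outliers, so a production version would more likely use a robust fit; plain OLS keeps the sketch short.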

Predictive support automation. The general philosophy of HPE Support is that if an issue is root-caused manually for one customer, automation should be written so that the issue never has to be identified manually again for another customer. We write signatures to automate this detection. Our signatures are based on advanced pattern recognition and are designed to capture the issue profile accurately, whether it lies in the host, operating system, virtualization software, configuration data or storage. Using this process, we automatically identify the root cause and provide recommendations for many of the performance issues that occur in the field. It also lets us proactively discover the signals of new problems as they emerge, so that what was reactive for one customer becomes predictive for everyone else. Our support and data science teams constantly assess how well this automation helps us catch and remediate problems, and work closely behind the scenes to keep improving it. This differentiates us from most stand-alone AIOps services, which don't provide technical support alongside their automation.
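
A signature can be thought of as a predicate over a system's telemetry and configuration, paired with the fix to recommend. The Python sketch below runs a small, hypothetical signature library against every connected system. The real signatures rely on more advanced pattern recognition than these simple rules, and the field names and example issues (`host.multipath_policy`, `vm.datastore_free_pct`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Snapshot = Dict[str, object]  # flattened telemetry/config for one system, keyed "layer.field"


@dataclass
class Signature:
    """Hypothetical issue signature: a predicate over a snapshot plus the fix to recommend."""
    issue_id: str
    matches: Callable[[Snapshot], bool]
    recommendation: str


# Illustrative signature library spanning host, storage and virtualization layers.
SIGNATURES = [
    Signature(
        "SIG-001",
        lambda s: s.get("host.multipath_policy") == "fixed" and s.get("storage.path_count", 0) > 1,
        "Switch the host multipath policy to round robin.",
    ),
    Signature(
        "SIG-002",
        lambda s: s.get("vm.datastore_free_pct", 100) < 5,
        "Free or expand datastore capacity.",
    ),
]


def evaluate(fleet: Dict[str, Snapshot]) -> List[Tuple[str, str, str]]:
    """Run every signature against every connected system; return (system, issue, fix) hits."""
    return [
        (system_id, sig.issue_id, sig.recommendation)
        for system_id, snapshot in fleet.items()
        for sig in SIGNATURES
        if sig.matches(snapshot)
    ]


fleet = {
    "array-1": {"host.multipath_policy": "fixed", "storage.path_count": 4, "vm.datastore_free_pct": 40},
    "array-2": {"host.multipath_policy": "round_robin", "storage.path_count": 4, "vm.datastore_free_pct": 3},
}
for hit in evaluate(fleet):
    print(hit)
```

Adding a newly root-caused issue then means appending one `Signature`; every system is checked against it on the next pass.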

[Figure 1: "See Once" in HPE InfoSight]

The most advanced AI for infrastructure, constantly learning. Machine learning and AI are ever-evolving fields, and we can always make our predictions and recommendations better and more accurate for our customers. But doing so practically requires an industrialized process and real customer data. A problem many machine learning projects face is a lack of labeled "ground truth" data. We address this problem in a couple of ways.

Some of our supervised machine learning models are designed to learn the normal behavior found in customer environments. Because the target is normal behavior itself, every observation doubles as a labeled example, so we have as much labeled data as we have data. In those cases, our models provide a baseline for spotting anomalous behavior or finding common correlations.
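
Learning "normal" can be as simple as a per-metric statistical baseline. In this Python sketch, every historical sample is treated as an example of normal behavior, and a new value is flagged when it falls more than three standard deviations from the mean. The z-score baseline is a stand-in chosen for brevity, not the model HPE uses, and the latency numbers are made up.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Learn 'normal' for one metric: every historical sample counts as a normal example."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, z_threshold=3.0):
    """Flag values more than z_threshold standard deviations from the learned mean."""
    mu, sigma = baseline
    return abs(value - mu) > z_threshold * sigma


latency_history_ms = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 1.2]  # made-up samples
baseline = fit_baseline(latency_history_ms)
print(is_anomalous(1.15, baseline))  # False
print(is_anomalous(4.0, baseline))   # True
```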

In other scenarios, we need to be more targeted and build models that identify specific known issues. In such cases, technical support identifies examples of the issue, but these examples alone are not enough for us to develop accurate ML models. To address this, we turn to our installed base to conduct semi-supervised training rounds. We take the examples provided by support, train an ML model to identify them, and then use that model to scan our installed base telemetry for similar scenarios. This surfaces scenarios that "look" similar to the model, which we bring back to the Technical Support SMEs for correct labeling. We then retrain the model and go fishing in the installed base again. This iteration repeats until we are confident the model has captured a sufficiently generalized representation of the issue. The process is many orders of magnitude more efficient than having our SMEs comb through the installed base manually.
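
The train, scan and relabel loop described above resembles active learning with a human in the loop. The Python sketch below uses a toy nearest-centroid scorer in place of a real ML model and a function `sme_label` to stand in for the Technical Support SMEs; the batch size, the stopping rule (stop once a round surfaces no new confirmed cases) and all names are assumptions for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def train(labeled):
    """Toy stand-in for an ML model: score = distance to the centroid of confirmed cases.
    Confirmed negatives are recorded but this toy scorer ignores them."""
    c = centroid([x for x, is_issue in labeled if is_issue])
    return lambda x: math.dist(x, c)  # smaller distance = looks more like the issue


def label_loop(seed_examples, installed_base, sme_label, batch=2, max_rounds=5):
    """Train on labeled cases, scan the installed base for look-alikes, have SMEs
    label the closest candidates, retrain, and repeat until a round surfaces no
    new confirmed cases (or max_rounds is reached)."""
    labeled = [(x, True) for x in seed_examples]
    unlabeled = list(installed_base)
    for _ in range(max_rounds):
        model = train(labeled)
        unlabeled.sort(key=model)
        candidates, unlabeled = unlabeled[:batch], unlabeled[batch:]
        new_labels = [(x, sme_label(x)) for x in candidates]
        labeled.extend(new_labels)
        if not any(is_issue for _, is_issue in new_labels):
            break
    return labeled


seeds = [(0.9, 0.8)]  # example handed over by support (made-up features)
base = [(0.85, 0.9), (0.1, 0.2), (0.95, 0.7), (0.2, 0.1), (0.5, 0.5)]
result = label_loop(seeds, base, sme_label=lambda x: x[0] > 0.7)  # simulated SME judgment
print(sum(1 for _, is_issue in result if is_issue))  # 3 confirmed cases
```

SMEs only ever review the handful of candidates the model ranks highest each round, which is where the efficiency gain over a manual sweep of the installed base comes from.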

This process helps us ensure that once such a complex issue is uncovered and resolved, no other customer will experience it.

How our customers benefit from our mantra

The innovations described in this blog have resulted in a support experience our customers do not find with any of our industry peers. Predictive analytics and support case automation have helped us eliminate the need for traditional Level 1 and Level 2 support tiers. As a result, our customers file 73% fewer trouble tickets related to storage systems.

Our ability to accurately predict and prevent issues is game-changing. It significantly reduces finger-pointing and downtime for our customers. In fact, with HPE InfoSight they spend 85%¹ less time managing infrastructure problems, allowing IT admins to finally spend time on innovation.

Finally, this world-class support experience, powered by AI, translates into a better product experience. A product that gets smarter with time is hardly the norm in enterprise IT. Most products age and their performance degrades over time, making IT reluctant to trust the mission-critical apps running on them. HPE InfoSight has flipped that convention. With industry-leading measured availability, 79% lower operational expenses² and ever-improving AI-powered infrastructure, HPE InfoSight gives our enterprise customers peace of mind they never thought possible.

To learn more about HPE InfoSight, please check out:

Ronak Chokshi
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage

1 Based on customer data collected for HPE Nimble Storage

2 Based on 3rd party research of HPE Nimble Storage

 

About the Author


Ronak leads product marketing for HPE InfoSight, the industry's most advanced AI for infrastructure. He likes to blog about data architectures for analytics, machine learning and AI, and the customer outcomes they deliver.