Around the Storage Block

Human intelligence amplified by AI enables HPE InfoSight to deliver preemptive and proactive support

At Discover Madrid, we learned how the IT staff of a composite company called Earth Machines was proactively notified about a disruptive bug. For reference, the complete presentation is here; the Earth Machines section starts at 33:12. Although Earth Machines is a fictitious company, the details of the scenario are real and were actually experienced in the field. Today we’re going to take a deeper look at the HPE InfoSight support story mentioned in the presentation.

A bug surfaced in one of the libraries that NimbleOS (the HPE Nimble Storage operating system) depends on. If left unaddressed, this bug could have caused unexpected failovers and potential outages. Let's look at a detailed timeline of how HPE InfoSight, working with our expert Level 3 support engineers, operated behind the scenes to help Earth Machines avoid a major business disruption.

Day 1-> A public report of a software bug was released, warning customers and vendors that affected systems would crash and reboot after 208½ days of uptime. Earth Machines IT staff might have heard about it, but they had no way of knowing that their HPE Nimble Storage arrays were affected.

Day 2-> HPE Nimble Storage support and development engineers realized that the library problem would affect a particular version of NimbleOS. Because support engineers are embedded in product development team meetings, triage and communication happened in real time. Support engineers took the lead on customers' behalf and began mining the massive data lake that HPE InfoSight had gathered from the installed base. In parallel, core NimbleOS engineers began working on a fix that would code around the library bug.

HPE Nimble Storage arrays running the affected NimbleOS version were excluded, and that version was pulled from the HPE InfoSight portal, so customers running earlier versions could no longer upgrade to the affected build. Array exclusion is one of the most powerful tools available to HPE support because it prevents customers from installing software with known issues.
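The exclusion and pull mechanism can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical model of the update list: `UpdateCatalog`, `available_updates`, and the version labels `v1`, `v2`, and `v3` are illustrative names, not the HPE InfoSight implementation or API. A pulled build is never offered to anyone, and an excluded array is offered nothing until the exclusion is lifted.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateCatalog:
    """Hypothetical model of the updates an array is offered (illustration only)."""
    releases: list[str]  # published builds, oldest first
    pulled: set[str] = field(default_factory=set)  # builds withdrawn for known issues
    excluded_arrays: set[str] = field(default_factory=set)  # arrays blocked from updating

    def available_updates(self, array_id: str, current: str) -> list[str]:
        # An excluded array sees no updates at all until the exclusion is lifted.
        if array_id in self.excluded_arrays:
            return []
        # Offer only builds newer than the current one that have not been pulled.
        newer = self.releases[self.releases.index(current) + 1:]
        return [v for v in newer if v not in self.pulled]


# Scenario from the timeline: v2 is the affected build and has been pulled.
catalog = UpdateCatalog(releases=["v1", "v2"], pulled={"v2"}, excluded_arrays={"array-b"})
print(catalog.available_updates("array-a", "v1"))  # [] : v2 is never offered
print(catalog.available_updates("array-b", "v2"))  # [] : array-b is excluded

# Later, the fixed build v3 ships and the exclusion on array-b is lifted.
catalog.releases.append("v3")
catalog.excluded_arrays.discard("array-b")
print(catalog.available_updates("array-b", "v2"))  # ['v3']
```

Ordering releases by list position sidesteps version-string parsing, which a real catalog would have to handle.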

When Earth Machines IT staff discussed this issue with peers at a local user group meeting, they were surprised to hear that some customers had never even seen the affected NimbleOS version in their list of available updates on the HPE InfoSight portal. Those customers were running an earlier version and never encountered the problem, because the affected build had already been pulled from HPE InfoSight.

Since 2008, all HPE Nimble Storage arrays have automatically sent millions of data points every day to a cloud database, which now holds over 350 trillion data points and counting. Many tools and workflows use this database, and together they make up the ecosystem that customers know as HPE InfoSight.

This process combines automated data collection, human-led triage and root cause analysis, a living support knowledge base, cutting-edge database queries, and case automation, and it has been running for over 10 years. As a result, the ecosystem, workflows, and visual presentation on the web portal are very mature, and that maturity is the key HPE InfoSight market differentiator.

Two of the collected data points are system uptime and NimbleOS version. Using the back-end data pool in HPE InfoSight, Level 3 support identified Earth Machines as a vulnerable customer. Within a single business day, support engineers sorted the installed base by uptime and categorized each customer's risk. They used case automation to open a P1 case for customers like Earth Machines whose arrays were about to be rebooted by the bug. For storage systems further from the reboot window, they proactively opened lower-severity P3 cases to inform customers of the issue before their arrays reached day 208. Alerts like these are prominently displayed on the Wellness tab of the HPE InfoSight portal.
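The triage step can be sketched in Python as a filter-and-sort over per-array telemetry. This is a hypothetical illustration, not the query Level 3 support actually ran: `ArrayTelemetry`, `triage`, the version labels, and the 14-day `p1_window_days` cutoff for a P1 case are all assumptions, since the story does not say where the P1/P3 line was drawn. Only the 208½-day trigger and the use of uptime and NimbleOS version come from the timeline.

```python
from dataclasses import dataclass

BUG_TRIGGER_DAYS = 208.5  # uptime at which affected systems crash and reboot


@dataclass
class ArrayTelemetry:
    array_id: str
    customer: str
    os_version: str
    uptime_days: float


def triage(fleet: list[ArrayTelemetry], affected_versions: set[str],
           p1_window_days: float = 14.0) -> list[tuple[str, str, float, str]]:
    """Return (array_id, customer, days_left, priority), most urgent first.

    p1_window_days is a hypothetical cutoff: arrays this close to the trigger get P1.
    """
    cases = []
    for a in fleet:
        if a.os_version not in affected_versions:
            continue  # not running the affected build, so not at risk
        days_left = BUG_TRIGGER_DAYS - a.uptime_days
        priority = "P1" if days_left <= p1_window_days else "P3"
        cases.append((a.array_id, a.customer, days_left, priority))
    # Fewest days left first, which is the same as highest uptime first.
    return sorted(cases, key=lambda c: c[2])


fleet = [
    ArrayTelemetry("a1", "Earth Machines", "v2", 205.0),
    ArrayTelemetry("a2", "Example Co", "v2", 120.0),
    ArrayTelemetry("a3", "Sample Inc", "v1", 207.0),
]
for case in triage(fleet, affected_versions={"v2"}):
    print(case)
# ('a1', 'Earth Machines', 3.5, 'P1')
# ('a2', 'Example Co', 88.5, 'P3')
```

An array on an unaffected build (a3 above) is skipped no matter how high its uptime is, which mirrors why both data points were needed to decide who got a case.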


Now let’s get back to the timeline…

Day 3-> Because Earth Machines' arrays were close to the 208½-day uptime mark, a Level 3 support engineer contacted them to help prevent the controller reboots.

The following screenshot shows what Earth Machines would have seen on the HPE InfoSight portal for this URGENT issue:


Here’s a close-up of the alert:


In addition to alerting customers about the impending reboots, the proactive support cases recommended the appropriate permanent resolution. This resolution was saved in the support knowledge base so that it could help all customers in the future. Each time Level 3 support triages and verifies a fix for a customer, the solution is saved in the support database, and every time it is used afterward, it is reverified and updated to stay accurate. This crucial step enables future queries and case automation to find and deliver solutions preemptively for a wide variety of issues. It is one practical example of what many people refer to as AI or machine learning. Known internally as “see once, prevent for all,” this mature model pioneered by HPE Nimble Storage will benefit all HPE products going forward. HPE InfoSight offers the most mature predictive analytics because we have been able to train the model with years of data from support solutions and recommendations like the ones in this story.
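The “see once, prevent for all” loop can be pictured as a knowledge base of verified fixes that is checked against every array's telemetry. The Python below is a deliberately simple, rule-based sketch under that assumption; `KnowledgeBase`, `record_fix`, `scan`, and the `uptime-reboot` issue ID are hypothetical names, and this is not how HPE InfoSight's predictive analytics are actually built. It shows only the core idea: once a fix is verified for one customer, the same signature can flag every other array that matches it.

```python
from dataclasses import dataclass
from typing import Callable

Telemetry = dict[str, object]  # one array's collected data points


@dataclass
class KnownIssue:
    issue_id: str
    matches: Callable[[Telemetry], bool]  # signature: does this array have the issue?
    resolution: str  # fix verified by support
    times_verified: int = 1


class KnowledgeBase:
    def __init__(self) -> None:
        self._issues: dict[str, KnownIssue] = {}

    def record_fix(self, issue_id: str, matches: Callable[[Telemetry], bool],
                   resolution: str) -> None:
        # "See once": store the first verified fix; later verifications update it.
        known = self._issues.get(issue_id)
        if known is None:
            self._issues[issue_id] = KnownIssue(issue_id, matches, resolution)
        else:
            known.matches, known.resolution = matches, resolution
            known.times_verified += 1

    def scan(self, telemetry: Telemetry) -> list[tuple[str, str]]:
        # "Prevent for all": check every known signature against one array;
        # each hit is a candidate for an automated, preemptive support case.
        return [(k.issue_id, k.resolution)
                for k in self._issues.values() if k.matches(telemetry)]


kb = KnowledgeBase()
kb.record_fix("uptime-reboot", lambda t: t["os_version"] == "v2",
              "Upgrade to the NimbleOS build that contains the fix")
print(kb.scan({"os_version": "v2", "uptime_days": 190.0}))  # one match
print(kb.scan({"os_version": "v1", "uptime_days": 190.0}))  # []
```

A hand-written predicate stands in here for whatever signature matching a production system would use; the point is the reuse of a verified fix, not the matching technique.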

Day 4-> In this case, the action item was to update to the latest NimbleOS version, which contained a fix for the bug. To avoid unnecessary issues, Earth Machines' arrays were excluded from all updates until the NimbleOS version containing the fix became available.

From Earth Machines' perspective, HPE InfoSight predicted an otherwise unexpected reboot, proactively opened a support case, and recommended a permanent fix. The combination of automated data collection, expert-level human interaction, up-to-date support knowledge base articles, mature data science queries, and case automation delivers a cutting-edge support experience to all HPE customers. With all of these benefits displayed in one place alongside many other helpful tools, the HPE InfoSight portal ties together everything you need in a responsive, interactive way.





About the Author


Evans specializes in HPE InfoSight and HPE Nimble Storage technologies. He has extensive experience with the Microsoft, VMware, and Citrix product portfolios, both as an Infrastructure Engineer and in Customer Support and QA Engineering.