NetworkExperts

Modern telemetry and the dawn of AIOps-based data center automation: Part 2

This is the second blog of a two-part series. Read the first blog here. The blog is authored by Jim Capobianco, Principal Product Manager at HPE Aruba Networking, with Scott Stevens, Field CTO for AMD Pensando, contributing.

The industry is on the cusp of achieving the long-promised goals of data center automation. Artificial intelligence operations (AIOps) engines are coming to market and are rapidly growing in capabilities, and AI/ML engines are the brains required for automation. Part one of this blog series discussed high-resolution telemetry as the fuel of the AI engines, the limits of traditional telemetry, and the need for a distributed services data center architecture to enforce the mechanisms required by closed loop automation. Part two of the series discusses the new paradigm for telemetry, closed loop automation, and the state of operational management systems.

The new model for telemetry

Automation in network systems necessitates the integration of AI analytics. AI analytics engines in particular require high-resolution input telemetry data to ensure accurate and reliable results. In the context of AI-based systems, incorrect results are often referred to as "hallucinations." Similar to the critical role of accurate data in autonomous driving vehicles, hallucinations in AI-driven network automation systems can lead to severe consequences, including network crashes.

To generate telemetry with sufficient resolution to avoid these hallucinations, telemetry-generating mechanisms must exhibit high performance and be directly instrumented in the data plane, rather than the control plane, of telemetry sourcing devices. In enterprise data centers, network switches are in the optimum location to generate this telemetry in a structurally integrated manner, as opposed to a bolt-on approach.

HPE Aruba Networking CX switches are specifically engineered to gather high-resolution, non-sampled network and security telemetry directly within the data plane. This capability is essential for achieving fourth-generation automation, as illustrated in Figures 1 and 2.
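To make the value of non-sampled telemetry concrete, here is a minimal sketch that compares what a collector sees with and without packet sampling. Everything in it is an assumption made for illustration: the synthetic packet stream, the 1-in-100 rate, and the function names are hypothetical, and the sampler takes every 100th packet deterministically (real samplers are typically randomized) so the result is reproducible. It does not represent how CX switches generate telemetry.

```python
from collections import Counter

SAMPLE_RATE = 100  # assumed 1-in-100 packet sampling, for illustration only


def build_packet_stream(total=1000):
    """Return a list of flow IDs, one entry per packet (synthetic data)."""
    stream = []
    for i in range(total):
        if i % 100 == 50:
            # One single-packet "short" flow in every block of 100 packets.
            stream.append(f"short-{i // 100}")
        else:
            # Everything else belongs to one long-lived bulk flow.
            stream.append("bulk")
    return stream


def flows_seen(stream, sample_rate=1):
    """Count packets per flow, looking only at every sample_rate-th packet."""
    return Counter(stream[::sample_rate])


stream = build_packet_stream()
full = flows_seen(stream)                  # non-sampled view
sampled = flows_seen(stream, SAMPLE_RATE)  # sampled view
missed = sorted(set(full) - set(sampled))

print(f"flows in full view:    {len(full)}")
print(f"flows in sampled view: {len(sampled)}")
print(f"flows never observed:  {len(missed)}")
```

In this contrived stream the sampled view contains only the bulk flow, so the short flows never reach the analytics engine at all. Short-lived flows are often exactly the ones a security or troubleshooting tool needs to see.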

Historically, previous generations of network processor units (NPUs) lacked the performance capabilities necessary to capture the near-real-time data center network and security telemetry required by AI analytics engines. The telemetry generation advancements in NPUs and DPUs, as exemplified by the HPE Aruba Networking data center switches, mark a significant evolution in the ability to support sophisticated AI analytics for enhanced data center network automation.

Figure 1: Closed loop automation

In addition to facilitating automation, fourth-generation data center architectures optimize operations by distributing network services within the network fabric. Locating services (e.g., security, policy, micro-segmentation, NAT, encryption) within the compute rack minimizes traffic “tromboning,” in which east-west traffic detours to a centralized appliance and back.

Figure 2: Automation model modernization

The HPE Aruba Networking CX 10000 top of rack switches are the only switches in the industry that are designed from the ground up to perform high-resolution telemetry and integrate distributed network services directly in the fabric of the data center network, implementing and enforcing the policies of the AIOps systems.

Finding the needle in the haystack: AI/ML

With high-resolution telemetry instrumented from the right location in the data center, AI analytics engines can do what they do best: find the needle in the haystack. They pull anomalies (network, application, security) out of “haystacks” of data and provide closed loop automation and remediation. The hay itself (the accumulated data) serves a “time-machine” function: the ability to look back in time for network and security operational comparisons, modeling, and diagnostics.
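As a rough illustration of the needle-in-the-haystack idea, the sketch below flags telemetry samples that deviate sharply from a historical baseline using a simple z-score. This is a deliberately basic statistical stand-in, not the method any particular AIOps engine uses; the byte counts, the threshold of 3, and the function name are all assumptions chosen for the example. The baseline plays the role of the "time-machine" data: history that new samples are compared against.

```python
import statistics


def find_anomalies(baseline, samples, threshold=3.0):
    """Return (index, value) pairs whose z-score against the baseline exceeds threshold."""
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    anomalies = []
    for index, value in enumerate(samples):
        # Guard against a perfectly flat baseline (zero standard deviation).
        z_score = (value - mean) / stdev if stdev else 0.0
        if abs(z_score) > threshold:
            anomalies.append((index, value))
    return anomalies


# Hypothetical kilobytes-per-interval for one flow: history, then new samples.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
samples = [101, 99, 950, 100]

print(find_anomalies(baseline, samples))
```

The quality of the result depends directly on the resolution of the input: a coarse or sampled baseline widens the spread and hides the kind of spike this check is meant to catch.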

The rapidly changing world of AIOps and data center operational management systems

All HPE Aruba Networking DCN switches produce rich, high-resolution, near-real-time telemetry, metered directly from the data plane NPU, with high performance. The CX 10000 line of products goes a step further by embedding dedicated AMD Pensando 800Gb/s data processing units (DPUs).[1] These DPUs further enhance telemetry capabilities and implement distributed network and security services in the network fabric.

HPE Aruba Networking and AMD Pensando are working closely with leaders in this space to deliver on the promise of DCN automation and simplification to our customers. Tools in the AIOps space are changing rapidly and will do so for many years to come. Typically, these AI-based tools are cloud delivered and can be updated rapidly.

Unlike rapidly changing AIOps tools, network switches typically stay in place for 5 to 7 years—making it critical to have switches that can deliver the telemetry needed for generational enhancements in AIOps that are expected to continue for many years to come.

Let’s explore one example of how the HPE Aruba Networking CX 10000 series switch’s telemetry capabilities integrate into an emerging security and AIOps solution.

Akamai Guardicore utilizes an AI engine to generate security segmentation policies—using the HPE Aruba Networking CX 10000 for closed loop automation of data center segmentation policies.[2]

Figure 3: AI/ML and operational management tools

While most network operators have a good handle on north-south security policies (user-to-app), many operations teams are not as sure about what the app-to-app (east-west) security policies should be, let alone how best to enforce them. As such, it’s critical for operations to be able to select the appropriate form factor to enforce security and have clarity on the policy rules. The HPE Aruba Networking CX 10000/Akamai Guardicore integration solves both these challenges via:

  • Resolution of “form factor.” Adding traditional firewalls for east-west traffic is inefficient and expensive. Deploying agents on every workload is another option, but agents are difficult to manage and add complexity and cost. The efficient solution puts a distributed services switch (DSS) at the top of rack. The HPE Aruba Networking CX 10000 switch is the only switch with this level of telemetry performance for DSS operations, and it sits in the optimum place in the network (at the network-server edge).
  • Clear policy rules. Guardicore is an agent-based, east-west security policy vendor that facilitates app-to-app visibility and security policy enforcement. When working with the CX 10000, Guardicore can operate in an agentless mode. In this mode, the CX 10000 sends its telemetry to the Guardicore cloud; the Guardicore central platform analyzes app flows, makes security rule recommendations, and sends policies to the CX 10000. The CX 10000 is both the source of the telemetry and the single enforcement point for the entire rack, replacing hundreds of agents. This integrated solution provides a perfect east-west security posture. The CX 10000 is the only Guardicore agentless implementation for the enterprise.
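The agentless loop described above (telemetry out, analysis, policy back, enforcement at the switch) can be sketched in a few lines. This is a toy model under stated assumptions, not the Guardicore or CX 10000 API: the `Flow` type, the `recommend_rules` and `enforce` functions, the application names, and the "seen at least twice" heuristic are all hypothetical and exist only to show the shape of the loop.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One observed app-to-app connection (hypothetical telemetry record)."""
    src_app: str
    dst_app: str
    dst_port: int


def recommend_rules(observed_flows, min_count=2):
    """Analysis step: propose an allow rule for each flow seen at least min_count times."""
    counts = Counter(observed_flows)
    return {flow for flow, seen in counts.items() if seen >= min_count}


def enforce(flow, allow_rules):
    """Enforcement step: default-deny anything without a matching allow rule."""
    return "allow" if flow in allow_rules else "deny"


# Step 1: the switch exports flow telemetry (synthetic data here).
telemetry = (
    [Flow("web", "api", 8443)] * 3
    + [Flow("api", "db", 5432)] * 2
    + [Flow("web", "db", 5432)]  # seen once: not enough evidence for a rule
)

# Step 2: the analytics platform turns telemetry into recommended policy.
rules = recommend_rules(telemetry)

# Step 3: the policy is pushed back and enforced at the same point that sourced the data.
print(enforce(Flow("web", "api", 8443), rules))
print(enforce(Flow("web", "db", 5432), rules))
```

The design point the sketch tries to capture is that the telemetry source and the enforcement point are the same device, so the loop closes without installing anything on the workloads themselves.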

Summary

The data that feeds AIOps engines and operational management systems is fundamental to achieving data center network automation, which is rapidly becoming possible. This new level of telemetry must be near-real-time, very high resolution, flow based, and security rich.

This defines the requirements of the new paradigm for telemetry. HPE Aruba Networking’s fourth generation distributed services switches are unique in the industry for providing this level of telemetry functionality.

Sources:

[1] CX10000 Datasheet

[2] HPE Aruba Networking CX10000 Akamai Guardicore security integration
