NetworkExperts

Get rich data center telemetry with DPU-powered switches


Network telemetry is a source of truth for network engineers and security operations teams.  Telemetry takes a variety of forms, including SNMP, device memory and CPU utilization, port status, firewall syslogs, and flow records. Flow records are particularly valuable because they track the source and destination of communications, identify applications, and monitor bandwidth consumption by devices, protocols, and applications.

However, telemetry can be hard to collect, especially in the data center. The typical data center approach is to attach hardware probes to network devices or to install software agents on the servers. While these probes and agents can gather flow records, they tend to be expensive and complicated to deploy, and they provide visibility only where they are deployed, which typically covers just a fraction of overall data center traffic. To get full fidelity, you'd almost have to build a second network, which is cost-prohibitive. What's more, the probes and software agents themselves need to be monitored and maintained, which adds to the to-do lists of busy network engineers.

Given these constraints, many companies rely on the sampled telemetry they can gather from the data center switches. As a result, typical solutions can provide insights based only on a small sample of total network traffic: in some cases, as little as 1 in every 8,000 flows, or 0.0125% of all traffic.
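To make the sampling figure concrete, here is a minimal Python sketch of the arithmetic. The function names are illustrative. The miss-probability helper rests on an assumption that is not from the article: that sampling is applied per packet, with each packet selected independently.

```python
def sampled_share_percent(one_in_n: int) -> float:
    """Percentage of traffic observed when 1 in every `one_in_n` is sampled."""
    return 100.0 / one_in_n


def expected_flows_seen(total_flows: int, one_in_n: int) -> float:
    """Expected number of flows that show up in the sample."""
    return total_flows / one_in_n


def miss_probability(packets_in_flow: int, one_in_n: int) -> float:
    """Chance that a flow is never sampled.

    Assumption (not from the article): each packet is sampled
    independently with probability 1 / one_in_n.
    """
    return (1.0 - 1.0 / one_in_n) ** packets_in_flow


share = sampled_share_percent(8000)            # the 0.0125% quoted above
seen = expected_flows_seen(1_000_000, 8000)    # flows seen out of one million
missed = miss_probability(10, 8000)            # a short, 10-packet flow

print(f"{share}% of traffic sampled")
print(f"{seen:.0f} of 1,000,000 flows expected in the sample")
print(f"{missed:.4%} chance a 10-packet flow is never sampled")
```

Under these assumptions, short-lived flows, which are common in east-west traffic, are almost always invisible at this sampling rate.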

I believe that this limited sampling is not acceptable. It restricts visibility and doesn't provide a full picture of the data center. It also hampers the effectiveness of AIOps tools by providing only partial awareness of what is happening in the network. Using only sampled flows creates a "garbage in, garbage out" scenario that drastically restricts the insights that modern AI/ML tools can provide.

The value of rich telemetry

Rich telemetry indicates the state of the network as well as the health of individual devices in it. It provides insights into performance and is essential for troubleshooting. With access to the right telemetry, network engineers can shorten mean time to resolution (MTTR), or mean time to innocence (MTTI) when the network isn't at fault.

Telemetry is also valuable for security operations. By tracking the east-west movement of traffic through a network fabric, security teams may be able to identify anomalies or patterns that indicate suspicious behavior, be it an intruder mapping out resources or an insider trying to access sensitive systems.
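As a rough illustration of how non-sampled flow records could surface an intruder mapping out resources, the Python sketch below flags sources that contact an unusually large number of distinct destination and port pairs. The `Flow` record shape, the addresses, and the threshold are hypothetical and chosen only for the example; real detection tools use far richer signals.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record (not a vendor schema)."""
    src: str
    dst: str
    dst_port: int


def fan_out_suspects(flows: Iterable[Flow], max_targets: int) -> Dict[str, int]:
    """Return sources that touched more than `max_targets` distinct (dst, port) pairs."""
    targets: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for flow in flows:
        targets[flow.src].add((flow.dst, flow.dst_port))
    return {src: len(seen) for src, seen in targets.items() if len(seen) > max_targets}


sample_flows = [
    Flow("10.0.1.5", "10.0.2.10", 22),
    Flow("10.0.1.5", "10.0.2.11", 22),
    Flow("10.0.1.5", "10.0.2.12", 22),
    Flow("10.0.1.5", "10.0.2.13", 3389),
    Flow("10.0.2.9", "10.0.3.20", 443),
    Flow("10.0.2.9", "10.0.3.20", 443),  # repeat traffic to one service
]

suspects = fan_out_suspects(sample_flows, max_targets=3)
print(suspects)
```

A check like this depends on seeing every flow: with heavy sampling, most of a scanner's short connection attempts would never reach the collector.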

Lastly, telemetry is vital for network automation, including AIOps. AI and ML tools are fueled by telemetry; it is the raw data they analyze to generate context-based insights or take automated actions. Without telemetry, there would be no modern AIOps. Today, feeding non-sampled flows into AI/ML tools creates the conditions for the advanced automation that has been needed for decades in the data center.

DPUs put eyeballs in your switches

So how do you get better telemetry from your data center? A new option is to marry the computing power of data processing units (DPUs) with data center switches. The DPU is an evolution of the SmartNIC; it is a programmable processor designed to offload and accelerate networking, security, and other data center infrastructure services. DPUs can be deployed in servers and switches. By adding DPUs to Top of Rack (ToR) switches, network engineers can collect and export telemetry such as flows and logs via a compute platform that sits directly in the path of traffic to and from the servers hosted in the data center.

HPE Aruba Networking and AMD have partnered to develop the industry's first DPU-enabled switch, the HPE Aruba Networking CX 10000 with AMD Pensando™ switch. The CX 10000 is a 1 RU device that offers 3.6 Tbps of standard line-rate stateless switching and supports 1, 10, and 25 GbE port options to servers with 40/100 GbE uplinks.

According to HPE, the CX 10000 Distributed Services Switch also delivers stateful services at 800 Gbps of throughput in each server rack. With its integrated programmable DPU, it can offer highly scalable east-west network firewall security, full non-sampled telemetry, IPsec encrypt/decrypt, and network address translation services. The form factor of the CX 10000 is designed to distribute these services to the edge of the data center fabric, directly connected to each server; by doing so, service resources automatically scale along with data center workloads. This is the same architecture leveraged by many of the world's largest hyperscalers.

The CX 10000 can export firewall logs as well as industry-standard, non-sampled IPFIX flow records. Network engineers can set the flow export interval based on their requirements, from as granular as every second to longer periods such as one or five minutes.
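For readers who have not worked with IPFIX directly, the Python sketch below parses the fixed 16-byte message header defined in RFC 7011 (version, length, export time, sequence number, observation domain ID). It is a generic illustration, not the CX 10000's export pipeline: the sample datagram is built by hand, and decoding the template and data sets that follow the header is left out.

```python
import struct
from typing import NamedTuple

IPFIX_VERSION = 10                    # RFC 7011 version number (0x000a)
HEADER = struct.Struct("!HHIII")      # network byte order, 16 bytes in total


class IpfixHeader(NamedTuple):
    version: int
    length: int                 # total message length in bytes, header included
    export_time: int            # seconds since the UNIX epoch
    sequence_number: int
    observation_domain_id: int


def parse_ipfix_header(datagram: bytes) -> IpfixHeader:
    """Decode the IPFIX message header at the start of `datagram`."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram is shorter than the 16-byte IPFIX header")
    header = IpfixHeader(*HEADER.unpack_from(datagram))
    if header.version != IPFIX_VERSION:
        raise ValueError(f"not an IPFIX message (version {header.version})")
    return header


# Hand-built example message containing only a header (illustrative values).
sample = HEADER.pack(IPFIX_VERSION, HEADER.size, 1_700_000_000, 42, 1)
parsed = parse_ipfix_header(sample)
print(parsed)
```

A collector would typically read such messages from a UDP or TCP socket and then use the sequence number to detect lost records.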

In the flow

For years, organizations have been bolting telemetry solutions onto the network. By embedding DPUs into the switch, telemetry capabilities are now woven into the network fabric itself. And because these capabilities are offloaded to DPUs, there is no impact on switch performance.

By monitoring flow records and logs, network engineers can quickly spot congestion, retransmission, packet drops, and bandwidth-hogging applications. This can speed up troubleshooting, and even allow network engineers to head off issues before they impact application performance or service levels. Uniquely, since there is now telemetry for all flows in the network, network visibility is now mapped directly to each application instead of the legacy model of examining trunk usage.
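One simple way to picture application-level visibility is a "top talkers" report that sums bytes per application across all flow records. The Python sketch below assumes each record is a dictionary with hypothetical `app` and `bytes` keys; a real flow analyzer would derive the application from IPFIX fields and handle far larger volumes.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_talkers(flows: Iterable[Dict[str, object]], n: int = 3) -> List[Tuple[str, int]]:
    """Sum bytes per application and return the `n` heaviest consumers."""
    totals: Counter = Counter()
    for flow in flows:
        totals[str(flow["app"])] += int(flow["bytes"])
    return totals.most_common(n)


# Illustrative records only; the field names are assumptions.
sample_flows = [
    {"app": "backup", "bytes": 900},
    {"app": "web", "bytes": 400},
    {"app": "backup", "bytes": 600},
    {"app": "dns", "bytes": 20},
]

report = top_talkers(sample_flows, n=2)
for app, total_bytes in report:
    print(f"{app}: {total_bytes} bytes")
```

Because every flow is counted rather than a sample, the totals reflect actual per-application usage instead of an extrapolation from trunk counters.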

Of course, it's one thing to collect telemetry; it also needs to be analyzed. This analysis is best handled by dedicated systems such as flow analyzers, log collectors, and SIEMs. HPE Aruba Networking has developed a set of APIs to provide flow records and logs to a variety of third-party tools that are widely used in network operations centers (NOCs) and security operations centers (SOCs). These integrations include solutions from Splunk, Elastic, Guardicore, and Augtera Networks.

And as more AI and ML-driven systems come to market, the DPU-powered CX 10000 switch will be ready to fuel these tools with the high-fidelity telemetry required for these systems to provide accurate, context-based insights or take automated actions.

I can see clearly now

Network engineers have lacked the ability to gather comprehensive telemetry in data center networks because of complex, cost-prohibitive collection architectures. That changes with the CX 10000, which now makes rich telemetry available for collection and analysis. HPE Aruba Networking and AMD have developed a unique approach that inserts telemetry collection directly into the data center fabric.


About the Author

Scott Stevens

Field CTO, AMD Pensando

With over 25 years of experience in the networking and security industries, Scott leads the Global Systems Engineering team for AMD Pensando. Here he is responsible for driving the DPU embedded solutions of AMD Pensando, both the Aruba CX10000 and the VMware Project Monterey enabled DPUs. In addition, he leads the Technical Business Development team, bringing new innovative integrations with third-party AI/ML vendors to complement and automate the Aruba CX10000 solution.

Previously, he ran the Global Systems Engineering team for Palo Alto Networks, building the team from 100 to over 1,300 customer-facing field engineers as revenue ramped to over $4B annually. Prior to that, he ran the Global SE team for Juniper Networks, spending 14 years building various Systems Engineering and Sales organizations.

Scott holds a master's degree in business from Oklahoma City University and a Bachelor of Science degree in electrical engineering from the Massachusetts Institute of Technology (MIT). He speaks regularly at industry conferences and is viewed as an industry visionary in the area of network and security architectures.
