John_Gray

Data center switching for Gen AI/ML workloads

Artificial intelligence is powering businesses across all industries and verticals and is becoming a general-purpose need akin to a utility. AI-powered solutions are essential for analyzing continuously arriving data. They use task-specific AI models, such as machine learning and natural language processing (NLP) models, to gain insights and power real-time decision making.

Infrastructure requirements for GenAI

Large-scale GenAI models with billions of parameters require vast amounts of GPU-enabled compute to train on terabyte-sized data sets, which are accessed from ultra-low-latency memory and storage systems. The networks that support these massive models are tuned for power-efficient, predictable performance in multi-tenant LLM workload environments.

Every element of the system is highly tuned to support the whole. This includes compute, data processing units (DPUs), I/O, cabling and optics, acceleration software, and the network itself: the switches (for example, InfiniBand or 400/800G Ethernet), the fabric, and its topology.

Prepping your data center network for AI

AI architectures require a dedicated network fabric that combines high performance with low-latency connectivity to minimize job completion times for model training, inference, and tuning. In early HPC and AI training networks, high-speed, low-latency, largely single-vendor InfiniBand fabrics gained popularity for their fast and efficient communication between servers and storage systems.

Today, 100/200/400G+ leaf/spine Ethernet switching provides an open, standards-based alternative. It is gaining significant momentum for networking HPC/AI clusters and is expected to become a popular, lower-cost option for many AI use cases.

“With bandwidth in AI growing, the portion of Ethernet switching attached to AI/ML and accelerated computing will migrate from a niche today to a significant portion of the market by 2027. We are about to see record shipments in 800Gbps based switches and optics as soon as products can reach scale in production to address AI/ML.” — Alan Weckel, founder and technology analyst at 650 Group

Modern AI applications need high-bandwidth, lossless, low-latency, scalable, multi-tenant networks that interconnect hundreds or thousands of GPUs at speeds from 100G to 400G and beyond. Ethernet-based networking fabrics provide the reliability and performance that AI workload clusters of this size require.
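To put "hundreds or thousands of GPUs" in perspective, here is a minimal sketch that sizes a non-blocking two-tier leaf/spine fabric built from identical fixed-port switches. It is an illustration under stated assumptions, not a design tool: the function name `two_tier_clos_capacity` is hypothetical, every switch is assumed to have the same number of equal-speed ports (its radix), each GPU is assumed to use one fabric port, and each leaf splits its ports 1:1 between GPUs and spine uplinks.

```python
def two_tier_clos_capacity(radix: int) -> dict:
    """Size a non-blocking two-tier leaf/spine fabric of identical switches.

    Hypothetical helper. Assumptions: every switch has `radix` equal-speed
    ports, each leaf gives half its ports to GPUs and half to spine uplinks
    (1:1, no oversubscription), and each leaf connects once to every spine.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    down_per_leaf = radix // 2  # leaf ports facing GPU NICs
    spines = radix // 2         # one uplink from each leaf to each spine
    leaves = radix              # each spine port connects exactly one leaf
    return {
        "spines": spines,
        "leaves": leaves,
        "host_ports": leaves * down_per_leaf,  # equals radix**2 // 2
    }


for radix in (32, 64):
    print(radix, two_tier_clos_capacity(radix))
# 32 {'spines': 16, 'leaves': 32, 'host_ports': 512}
# 64 {'spines': 32, 'leaves': 64, 'host_ports': 2048}
```

Under these assumptions, 32-port switches yield 512 GPU-facing ports at full bandwidth; reaching several thousand GPUs at the same speed would call for higher-radix switches, some oversubscription, or a third switching tier.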

HPE Aruba Networking AI-ready data center switching

HPE Aruba Networking can help you get started by designing and building a dedicated AI network fabric. Our HPE Aruba Networking CX 9300 Switch Series is a next-generation 25.6 Tbps, 1U fixed-configuration switch that supports 32 ports of 100/200/400GbE.
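The 25.6 Tbps figure can be reconciled with the port count through simple arithmetic: 32 ports × 400 Gb/s × 2 directions = 25.6 Tb/s. The sketch below assumes, as is common in vendor datasheets, that switching capacity counts both directions of every full-duplex port at its maximum speed; the helper name `switching_capacity_tbps` is illustrative only.

```python
PORTS = 32
PORT_SPEED_GBPS = 400  # highest per-port speed listed for the CX 9300


def switching_capacity_tbps(ports: int, port_speed_gbps: int,
                            full_duplex: bool = True) -> float:
    """Aggregate switching capacity in Tbps (illustrative helper).

    Assumes capacity is quoted by counting both directions of each
    full-duplex port; pass full_duplex=False for the one-way figure.
    """
    directions = 2 if full_duplex else 1
    return ports * port_speed_gbps * directions / 1000


print(switching_capacity_tbps(PORTS, PORT_SPEED_GBPS))         # 25.6
print(switching_capacity_tbps(PORTS, PORT_SPEED_GBPS, False))  # 12.8
```

In other words, the switch can carry 12.8 Tb/s in each direction at once when all 32 ports run at 400GbE.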

The CX 9300 provides the AI/HPC optimization features these workloads require, including low latency, lossless quality of service (QoS), and support for RoCEv2 (RDMA over Converged Ethernet version 2), ECN (Explicit Congestion Notification), and PFC (Priority Flow Control).

Learn more

HPE Aruba intelligent data center switching solutions

Introductory guide to data center switching for Gen AI/ML workloads

Join us at HPE Discover / Atmosphere 2024

There will be dozens of HPE Aruba Networking data center networking sessions, demos, and hands-on labs at the event.

Be sure to register soon.


About the Author

John_Gray

John Gray leads Data Center Marketing at HPE Aruba Networking. He is responsible for helping customers accelerate their digital transformation by simplifying and automating legacy operating models with emerging cloud-native technologies and solutions. John is a subject matter expert in both traditional IT and emerging cloud and software-defined data center deployments including IaaS, virtualization, containers, security, software-defined storage, HCI, DevOps, automation tooling, and IP/Ethernet-based networking fabrics.