Data center switching for Gen AI/ML workloads

John_Gray · ‎05-14-2024

Artificial intelligence is powering businesses across all industries and verticals and is becoming a general-purpose need akin to a utility. AI-powered solutions are essential for analyzing continuously arriving data, using task-specific AI models such as machine learning and natural language processing (NLP) to gain insights and power real-time decision making.

Infrastructure requirements for GenAI

Large-scale GenAI models require vast amounts of GPU-enabled compute to train models with billions of parameters on terabyte-sized data sets, accessed from ultra-low-latency memory and storage systems. Networks that support these massive GenAI models are customized for optimized, power-efficient, predictable performance across multi-tenant LLM workload environments.

Every aspect of the system is highly tuned to support the whole: compute, data processing units (DPUs), I/O, cabling and optics, acceleration software, and the network itself (for example, InfiniBand or 400/800G Ethernet switches), along with its fabric and topologies.

Prepping your data center network for AI

AI architectures require a dedicated network fabric that combines high performance with low-latency connectivity to ensure the fastest job completion times for model training, inferencing, and tuning. In early HPC and AI training networks, proprietary high-speed, low-latency InfiniBand networks gained popularity for their fast and efficient communication between servers and storage systems.

Today, 100/200/400G+ leaf/spine Ethernet switching provides an open, standards-based alternative that is gaining significant momentum for networking HPC/AI clusters and is expected to become a popular, lower-cost option for many AI use cases.

“With bandwidth in AI growing, the portion of Ethernet switching attached to AI/ML and accelerated computing will migrate from a niche today to a significant portion of the market by 2027. We are about to see record shipments in 800Gbps based switches and optics as soon as products can reach scale in production to address AI/ML.” — Alan Weckel, founder and technology analyst at 650 Group

Modern AI applications need high-bandwidth, lossless, low-latency, scalable, multi-tenant networks that interconnect hundreds or thousands of GPUs at speeds from 100G to 400G and beyond. Ethernet-based networking fabrics provide the reliability and performance that AI workload clusters of this size require.
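To get a feel for what leaf/spine scale means in practice, here is a rough sizing sketch in Python. It assumes a two-tier fabric built from identical fixed-port switches, one link from every leaf to every spine, and no port breakout. The function name `two_tier_capacity` and its `oversubscription` parameter are illustrative and not part of any HPE tool.

```python
def two_tier_capacity(ports_per_switch: int, oversubscription: float = 1.0) -> dict:
    """Estimate the size of a two-tier leaf/spine fabric (illustrative only).

    oversubscription = downlink ports / uplink ports on each leaf;
    1.0 means a non-blocking leaf.
    """
    if ports_per_switch < 2 or oversubscription <= 0:
        raise ValueError("need at least 2 ports and a positive ratio")

    # Split each leaf's ports between spine-facing uplinks and GPU-facing
    # downlinks. The uplink count is rounded down, so the effective ratio
    # can end up slightly higher than requested.
    uplinks = max(1, int(ports_per_switch / (1 + oversubscription)))
    downlinks = ports_per_switch - uplinks

    # Assumption: one uplink per spine, so the spine count equals the uplink
    # count, and each spine port connects one leaf, so the number of leaves
    # is capped by the switch port count.
    spines = uplinks
    max_leaves = ports_per_switch

    return {
        "uplinks_per_leaf": uplinks,
        "downlinks_per_leaf": downlinks,
        "spines": spines,
        "max_leaves": max_leaves,
        "max_endpoints": max_leaves * downlinks,
    }


if __name__ == "__main__":
    print(two_tier_capacity(32))       # non-blocking leaves, 32-port switches
    print(two_tier_capacity(32, 3.0))  # 3:1 oversubscribed leaves
```

Under these assumptions, 32-port switches give at most 512 non-blocking endpoints (32 leaves with 16 downlinks each). Real designs also have to reserve ports for storage, management, and links to other fabrics, so treat the result as an upper bound.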

HPE Aruba Networking AI‑ready data center switching

HPE Aruba Networking can help you design and build a dedicated AI network fabric to get you started. Our HPE Aruba Networking CX 9300 Switch Series is a next‑generation 25.6Tbps, 1U fixed configuration switch that supports 32 ports of 100/200/400GbE.
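The 25.6Tbps figure and the 32 x 400GbE port count can be reconciled with a quick calculation. The sketch below assumes the headline capacity counts both directions of every port (full duplex); the function name `switch_capacity_tbps` is made up for this example.

```python
def switch_capacity_tbps(ports: int, port_speed_gbps: int) -> tuple[float, float]:
    """Return (per-direction, full-duplex) aggregate capacity in Tbps."""
    per_direction = ports * port_speed_gbps / 1000  # Gbps -> Tbps
    full_duplex = 2 * per_direction                 # assumption: both directions counted
    return per_direction, full_duplex


if __name__ == "__main__":
    one_way, both_ways = switch_capacity_tbps(ports=32, port_speed_gbps=400)
    print(f"{one_way} Tbps per direction, {both_ways} Tbps full duplex")
```

With all 32 ports running at 400GbE, that works out to 12.8 Tbps in each direction, or 25.6 Tbps when both directions are counted.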

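The CX 9300 provides AI/HPC optimization features, including low-latency, lossless network quality of service (QoS) and the connectivity capabilities that AI/HPC workloads require, such as RDMA over Converged Ethernet version 2 (RoCEv2), Explicit Congestion Notification (ECN), and Priority-based Flow Control (PFC).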
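For readers new to ECN and PFC, the following Python sketch shows the general idea behind each mechanism on a single egress queue. It is a generic textbook-style model, not HPE's implementation: the function names, the threshold parameters, and the numbers in the usage lines are all hypothetical and are not CX 9300 defaults.

```python
def ecn_mark_probability(queue_kb: float, k_min_kb: float,
                         k_max_kb: float, p_max: float) -> float:
    """RED-style ECN marking: probability of marking a packet at this queue depth."""
    if queue_kb <= k_min_kb:
        return 0.0  # queue is shallow, no congestion signal
    if queue_kb >= k_max_kb:
        return 1.0  # queue is deep, mark every packet
    # Between the thresholds, marking probability ramps linearly up to p_max.
    return p_max * (queue_kb - k_min_kb) / (k_max_kb - k_min_kb)


def pfc_should_pause(queue_kb: float, currently_paused: bool,
                     xoff_kb: float, xon_kb: float) -> bool:
    """PFC-style hysteresis: pause the sender at XOFF, resume only below XON."""
    if queue_kb >= xoff_kb:
        return True
    if queue_kb <= xon_kb:
        return False
    return currently_paused  # between thresholds, keep the current state


if __name__ == "__main__":
    # Hypothetical thresholds, chosen only to illustrate the behavior.
    for depth in (50, 250, 500):
        print(depth, ecn_mark_probability(depth, k_min_kb=100, k_max_kb=400, p_max=0.2))
    print(pfc_should_pause(900, currently_paused=False, xoff_kb=800, xon_kb=600))
```

The design intent is that ECN slows senders down early, before buffers fill, while PFC acts as a last-resort backstop that prevents drops on the lossless traffic class that RoCEv2 depends on.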

Learn more

HPE Aruba intelligent data center switching solutions

Introductory guide to data center switching for Gen AI/ML workloads

Join us at HPE Discover / Atmosphere 2024

There will be dozens of HPE Aruba Networking data center networking sessions, demos, and hands-on labs at the event.

Be sure to register soon.
