
HPE delivers several world records in latest MLPerf® Inference benchmarks

HPE achieved top results in MLPerf®1 Inference v5.1 benchmarks across various AI workloads with HPE ProLiant and HPE Cray servers, showcasing our commitment to AI innovation and benchmarking excellence.

MLCommons MLPerf® Inference v5.1 is out, and Hewlett Packard Enterprise demonstrated leadership in AI inferencing with fourteen #1 results across multiple scenarios in the Datacenter and Edge categories, spanning workloads from computer vision (object detection) to LLM text generation and speech recognition.

HPE is committed to benchmarking excellence, including through its membership in MLCommons, an independent engineering consortium that offers an objective way of measuring performance across technology vendors through standardized AI benchmarks.
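For readers curious how these numbers are produced: every MLPerf® Inference submission is driven by MLCommons' open-source LoadGen harness, which issues queries in a prescribed scenario (Server, Offline, SingleStream, or MultiStream) and measures the system under test. Below is a minimal sketch of a harness using the mlperf_loadgen Python bindings; the inference step is a placeholder, not HPE's actual submission code, and the sample counts are illustrative.

```python
# Minimal MLPerf Inference harness sketch using the mlperf_loadgen bindings.
# Real submissions plug an optimized backend behind issue_query(); here the
# inference call is a placeholder and responses carry no payload.
import mlperf_loadgen as lg

def issue_query(query_samples):
    # LoadGen hands us QuerySample objects; run inference on each sample's
    # dataset index, then report completion back to LoadGen.
    responses = []
    for qs in query_samples:
        # ... run the model on dataset item qs.index (placeholder) ...
        responses.append(lg.QuerySampleResponse(qs.id, 0, 0))  # id, data ptr, size
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass  # called by LoadGen once all queries have been issued

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline   # or Server, SingleStream, MultiStream
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_query, flush_queries)
# Query Sample Library: total sample count, how many fit in memory, plus
# load/unload callbacks (no-ops in this sketch).
qsl = lg.ConstructQSL(24576, 24576, lambda s: None, lambda s: None)

lg.StartTest(sut, qsl, settings)  # writes mlperf_log_summary.txt with the metric
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```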

The latest MLCommons results1, 2 are further proof that HPE solutions deliver the performance our customers need to address the demanding requirements of AI workloads—no matter the size of the model or whether the customer is doing AI training, fine-tuning, or inferencing. We offer a robust set of AI inferencing solutions, spanning from compact, versatile systems for any edge environment all the way to at-scale data centers.


Figure 1. #1 HPE results in MLPerf®1 Inference v5.1 benchmarks

 

Superior performance for AI-driven recommendation and speech recognition with HPE ProLiant Compute servers

With eight #1 rankings across various categories, the HPE ProLiant Compute portfolio has once again demonstrated exceptional results, particularly with the HPE ProLiant Compute DL380a Gen12, HPE ProLiant DL385 Gen11, and HPE ProLiant Compute DL384 Gen12 servers. These results reaffirm HPE’s unwavering commitment to AI innovation and delivering groundbreaking performance for modern data workloads.

The HPE ProLiant Compute DL380a Gen12 emerged as the standout performer, achieving seven #1 rankings and reinforcing its position as a benchmark champion. Notably, the server excelled in MLPerf®1 Inference v5.1 Deep Learning Recommendation Model (DLRM) benchmarks, setting the standard for performance in AI-driven recommendation systems. Among its accomplishments, the DL380a secured four #1 spots when comparing servers with Intel® Xeon® processors and NVIDIA GPUs, as shown in this chart:


Figure 2. #1 HPE ProLiant DL380a Gen12 results3 in MLPerf®1 Inference v5.1 benchmarks

In addition, the DL380a achieved two overall #1 spots in the DLRM-v2-99 and DLRM-v2-99.9 benchmarks (Server scenario) with 65,021 and 41,357 queries/second per GPU, respectively4. This builds on the earlier success of the DL380a Gen12 in the MLPerf®1 Inference v5.0 Datacenter DLRM benchmarks, where it performed 57% better than the next-best submission in the DLRM-v2-99 Offline scenario5. In the new v5.1 round, it continued to dominate, outperforming the competition by 29% in the DLRM-v2-99 Server test, showcasing consistent excellence across multiple benchmark iterations.

Building on its exceptional performance in DLRM benchmarks, the DL380a Gen12 further demonstrated its versatility and leadership in large language model (LLM) workloads, which are critical for generative AI applications such as natural language processing and conversational AI. In the MLPerf®1 Inference v5.1 Llama3.1-8B test (Server scenario), the DL380a Gen12 claimed the top spot among systems with eight PCIe-based GPUs, delivering an impressive 46,060.0 tokens/second6 and outperforming Cisco's UCS C845A M8, which posted 38,696.9 tokens/second7, by 19%. Additionally, in the MLPerf®1 Inference v5.0 Llama2 70B benchmarks (Offline scenario), the DL380a Gen12 secured #1 rankings in both the 99 and 99.9 accuracy tests.8 Both the DL380a Gen12 and Dell's PowerEdge XE7745 competed in this benchmark with eight NVIDIA L40S GPUs each, the DL380a delivering 3655.89 tokens/second9 to Dell's 3481.53 tokens/second10. These results highlight the DL380a's ability to excel across diverse AI inference tasks, setting a high standard for performance and reliability in LLM workloads.
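As a quick sanity check on the margins quoted in the last two paragraphs, the relative advantage is simply the ratio of the two throughputs. A minimal sketch using the figures cited above; note the DLRM competitor value is back-calculated from the 29% claim, not a number published in this article:

```python
def relative_margin(ours: float, theirs: float) -> float:
    """Percentage by which `ours` exceeds `theirs`."""
    return (ours / theirs - 1.0) * 100.0

# Llama3.1-8B Server (tokens/second): DL380a Gen12 vs. Cisco UCS C845A M8
print(f"Llama3.1-8B Server: {relative_margin(46_060.0, 38_696.9):.0f}%")  # ~19%

# Llama2-70B Offline (tokens/second): DL380a Gen12 vs. Dell PowerEdge XE7745
print(f"Llama2-70B Offline: {relative_margin(3_655.89, 3_481.53):.0f}%")  # ~5%

# DLRM-v2-99 Server: back out the implied next-best per-GPU throughput
implied_rival = 65_021 / 1.29  # derived from the quoted 29% margin (assumption)
print(f"Implied next-best DLRM result: ~{implied_rival:,.0f} queries/s per GPU")
```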

Making its debut in MLPerf®1 Inference benchmarks, the HPE ProLiant DL385 Gen11 immediately secured a #1 spot, delivering the best per-GPU performance for a PCIe-based system in the new speech-to-text Whisper benchmark, with 3962.78 samples/second per GPU11 when equipped with NVIDIA H200 NVL 141GB GPUs. The DL385 Gen11 can scale GPU capacity without compromising performance, making it a cost-effective choice for emerging AI workloads in modern data centers.

Unique GPU configurations

In addition to the eight #1 results achieved by HPE ProLiant Compute servers, HPE submitted three unique GPU configurations for benchmark tests, as follows:

  • HPE was the only company to submit a server utilizing the NVIDIA RTX PRO 6000 Blackwell Server Edition GPU in this round of MLPerf®1 Inference testing. The HPE ProLiant Compute DL380a Gen12, equipped with this innovative GPU, demonstrates HPE’s commitment to leveraging cutting-edge technology to deliver exceptional AI inference performance.
  • The HPE ProLiant DL384 Gen12, equipped with NVIDIA GH200 NVL2 accelerators, stood out in the DLRM-v2-99 benchmark, achieving 161,030 queries per second12 (Server scenario) and 174,456 samples per second (Offline scenario),12 making HPE the only vendor to submit results with this advanced configuration.
  • In the Edge scenario, the HPE ProLiant ML30 Gen11 displayed strong throughput in RetinaNet benchmark tests, delivering 258.352 samples/second and a MultiStream latency of 29.96 ms13 on a single NVIDIA RTX 4000 Ada 20GB GPU (see the latency sketch after this list). The HPE ProLiant ML30 Gen11, combined with the NVIDIA RTX 4000 Ada 20 GB, gives engineers and designers the ability to work efficiently with inference applications such as object detection, image segmentation, and point painting.

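For context on the Edge metric above: in the MLPerf® MultiStream scenario, each query bundles a fixed number of samples (eight under current rules) and the reported score is the 99th-percentile query latency. Here is a minimal sketch of that tail-latency computation, using made-up per-query timings rather than actual benchmark logs:

```python
import numpy as np

# Hypothetical per-query latencies in milliseconds (stand-ins, not HPE's logs).
rng = np.random.default_rng(0)
query_latencies_ms = rng.normal(loc=27.0, scale=1.2, size=10_000).clip(min=0)

# MultiStream reports the 99th-percentile query latency as the benchmark score.
p99 = np.percentile(query_latencies_ms, 99)
print(f"99th-percentile MultiStream latency: {p99:.2f} ms")
```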
For all the details on these leading results, check out this infographic.

Versatile, scalable HPE Cray XD670 achieves top results in object detection, Q&A, LLM text generation, and speech recognition

Specially optimized for service providers and large AI model builders, HPE Cray XD670, featuring eight NVIDIA H200 SXM Tensor Core GPUs, delivered six #1 results14 in these latest MLPerf®1 Inference v5.1 tests, compared to other systems featuring NVIDIA H200 SXM GPUs, in the following areas:

  • Computer vision—object detection: #1 in RetinaNet (Offline) with 14,996.6 samples per second15, reaffirming this platform’s performance leadership in object detection, following a #1 result in this same model during the prior round of benchmark tests.16
  • Language—LLM—chat Q&A: #1 in the new Llama3.1-8B (Server and Offline), with 64,914.6 and 66,036.8 tokens per second17, respectively. These results are 12% (Server) and 16% (Offline) higher than the nearest competitor’s entry using the same number of NVIDIA H200 SXM GPUs (see the sketch after this list).
  • Language—LLM—text generation (question answering, math, and code generation): #1 in Mixtral-8x7B (Server and Offline), delivering 60,955.1 and 62,108.4 tokens per second, respectively.18
  • Language—Speech to text: #1 in the new Whisper (Offline) with 34,450.7 samples per second.19

  • Language—LLM—chain-of-thought (CoT) reasoning: the only system featuring NVIDIA H200 GPUs to publish DeepSeek-R1 Offline performance, at 8904.83 tokens per second, submitted in the Datacenter-Open category.20
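To make the Llama3.1-8B margins above concrete, the nearest competitor’s throughput can be backed out from the quoted percentages. A small sketch; the rival figures below are derived values, not numbers published in this article:

```python
# HPE Cray XD670 Llama3.1-8B results (tokens/second) and quoted margins.
xd670 = {"Server": 64_914.6, "Offline": 66_036.8}
margin = {"Server": 0.12, "Offline": 0.16}  # 12% and 16% leads from the text

for scenario, tps in xd670.items():
    implied_rival = tps / (1.0 + margin[scenario])  # derived, not published
    print(f"{scenario}: XD670 {tps:,.1f} tok/s vs. implied rival ~{implied_rival:,.0f} tok/s")
```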


Figure 3. #1 HPE Cray XD670 results in MLPerf®1 Inference v5.1 benchmarks

These achievements further reinforce HPE’s position as a leader in delivering high-performance compute solutions tailored for AI-driven workloads. Beyond the technology itself, the HPE AI performance engineering team of experts running the benchmarks has been pivotal to achieving these results. Their deep understanding of system capabilities, architectural and storage nuances, and fine-tuning can also be leveraged by our customers to size AI workloads and tune applications to optimize the performance of their environments.

Empowering AI innovation with HPE

Building on the strong results achieved in the previous MLPerf®1 Inference v5.0 round, where the HPE ProLiant Compute DL380a Gen12 was a new participant, we look forward to submitting MLPerf®1 benchmark results with yet another new entrant in future tests: HPE ProLiant Compute XD685. Designed for service providers and large enterprises building and training their own large AI models, this server supports eight NVIDIA Blackwell or NVIDIA H200 SXM GPUs, and we recently announced support for AMD Instinct MI355X GPUs.

HPE is the essential technology partner for customers to unlock their AI ambitions, and the MLPerf®1 benchmark results continue to underscore our leading foundational technology and commitment to innovation. For more details about the MLPerf®1 Inference v5.1 benchmark results, visit the MLCommons website.

Learn more about HPE ProLiant solutions for AI | Infographic

For specific product details, go to HPE ProLiant Compute DL380a Gen12, HPE ProLiant DL385 Gen11, HPE ProLiant Compute XD685, or HPE Cray XD670, or contact your HPE representative.

Footnotes:

1 The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

2 MLCommons Releases New MLPerf Inference v5.1 Benchmark Results, MLCommons, September 2025

3 MLPerf® Inference: Datacenter v5.1 Closed. DLRM-v2-99 and DLRM-v2-99.9 benchmarks based on systems utilizing Intel Xeon 6787P processors and NVIDIA H200-NVL-141GB GPUs. Submission IDs 5.1-0050 and 5.1-0080

4 MLPerf® Inference: Datacenter v5.1 Closed. DLRM-v2-99 and DLRM-v2-99.9 benchmarks. Submission ID 5.1-0050

5 MLPerf® Inference: Datacenter v5.0 Closed. DLRM-v2-99 Offline benchmark. Submission ID 5.0-0043

6 MLPerf® Inference: Datacenter v5.1 Closed. Llama3.1-8b benchmark. Submission ID 5.0-0051

7 MLPerf® Inference: Datacenter v5.1 Closed. Llama3.1-8b benchmark. Submission ID 5.0-0011

8 MLPerf® Inference: Datacenter v5.0 Closed. Llama2 70B benchmark. Submission ID 5.0-0038

9 MLPerf® Inference: Datacenter v5.0 Closed. Llama2 70B benchmark. Submission ID 5.0-0046

10 MLPerf® Inference: Datacenter v5.0 Closed. Llama2 70B benchmark. Submission ID 5.0-0018

11 MLPerf® Inference: Datacenter v5.1 Closed. Whisper benchmark. Submission ID 5.1-0052

12 MLPerf® Inference: Datacenter v5.1 Closed. DLRM-v2-99 benchmark. Submission ID 5.1-0053

13 MLPerf® Inference: Datacenter v5.1 Closed. RetinaNet benchmark. Submission ID 5.1-0054

14 MLPerf® Inference: Datacenter v5.1 Closed. Benchmark Suite Results, MLCommons.

15 MLPerf® Inference: Datacenter v5.1 Closed. RetinaNet benchmark (Offline). Submission ID 5.1-0049

16 HPE Cray XD670 achieved the top RetinaNet Offline result in MLPerf® Inference: Datacenter v5.0 Closed tests, compared to other systems featuring eight NVIDIA H100 SXM GPUs. April 2025. Submission IDs 5.0-0039, 5.0-0040

17 MLPerf® Inference: Datacenter v5.1 Closed. Llama3.1-8B benchmark (Server and Offline). Submission ID 5.1-0049

18 MLPerf® Inference: Datacenter v5.1 Closed. Mixtral-8x7B benchmark (Server and Offline). Submission ID 5.1-0049

19 MLPerf® Inference: Datacenter v5.1 Closed. Whisper benchmark (Offline). Submission ID 5.1-0049

20 MLPerf® Inference: Datacenter v5.1 Open. DeepSeek-R1 benchmark (Offline). Submission ID 5.1-0375

 

By Diana Cortes

Connect with Diana on LinkedIn: linkedin.com/in/diana-cortes-0261631/

About the Author

HPE_Experts

Our team of Hewlett Packard Enterprise experts helps you learn more about technology topics related to key industries and workloads.