Servers & Systems: The Right Compute

Defying the law of diminishing returns: HPE Superdome Flex delivers high performance at scale

Learn about HPE Superdome Flex, a performance superstar with a unique modular architecture. It is ideal for critical applications, conventional and in-memory databases and in-memory high performance computing workloads.

The law of diminishing returns is a concept in economics that refers to the point at which the level of profits or benefits gained is less than the amount of money or energy invested.

If we think about compute, the “benefits” can be equated to processing speed, and the “amount of energy” can be equated with processing capacity. In theory, if you add twice the processor capacity to a system, you should expect to double your processing speed. But this often does not hold in practice: performance commonly degrades as more processors are added to a system. Enter HPE Superdome Flex, designed to scale up as a single system from four sockets to a whopping 32 sockets of compute, defying that law by maintaining high performance even at the largest configurations. In this post, I will cover the reasons behind the high performance of HPE Superdome Flex and provide examples where the platform leads in a variety of published performance benchmarks.

Unique architecture enables high performance

For a deep dive into the unique modular architecture of Superdome Flex, I recommend you go back to the first post in this series: The unique modular architecture of HPE Superdome Flex: How it works and why it matters. And if you are interested in the reliability, availability and stability of your environment, you can read this second post: The unique set of RAS features in HPE Superdome Flex: How they work and why they matter.

Now, as it relates to performance, here are the main architectural reasons why Superdome Flex can sustain high performance even at the largest configurations:

HPE Superdome Flex ASIC technology

The Superdome Flex extreme scale is achieved via the unique HPE Superdome Flex ASIC chipset, connecting the individual four-socket chassis (as shown in Figure 1) to one another in a point-to-point fashion (as shown in Figure 2). The HPE Superdome Flex ASIC technology load-balances the fabric and optimizes latency and bandwidth, increasing performance and system availability. The ASIC connects the chassis together in a cache-coherent fabric and maintains coherency by tracking cache line state and ownership across all the processor sockets inside a directory cache built into the ASIC itself. This coherency scheme is a critical factor in the ability of HPE Superdome Flex to perform at near-linear scaling from four sockets all the way up to 32 sockets. Typical glueless architecture designs already see limited performance when scaling to as few as four to eight sockets because of broadcast snooping.

 Figure 1.

 Figure 2.
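To make the contrast between a directory and broadcast snooping concrete, here is a minimal toy model in Python. It is only a sketch: the `ToyDirectory` class and its methods are hypothetical names invented for illustration, and the model does not describe how the Superdome Flex ASIC is actually implemented. It simply counts how many sockets must be probed when one socket writes a cache line, first with a directory that records which sockets hold a copy, then with a broadcast to every other socket.

```python
from collections import defaultdict


class ToyDirectory:
    """Toy model (hypothetical): tracks which sockets hold a copy of each cache line."""

    def __init__(self, num_sockets):
        self.num_sockets = num_sockets
        # cache-line address -> set of socket ids holding a copy
        self.sharers = defaultdict(set)

    def read(self, line, socket):
        """Record that `socket` now holds a shared copy of `line`."""
        self.sharers[line].add(socket)

    def write(self, line, socket):
        """Give `socket` exclusive ownership of `line`.

        Returns (directory_probes, broadcast_probes): the number of sockets
        contacted when the directory is consulted, and the number that a
        broadcast-snooping design would contact for the same write.
        """
        targets = self.sharers[line] - {socket}
        directory_probes = len(targets)          # only known holders are probed
        broadcast_probes = self.num_sockets - 1  # every other socket is probed
        self.sharers[line] = {socket}            # writer becomes the sole owner
        return directory_probes, broadcast_probes


directory = ToyDirectory(num_sockets=32)
directory.read(0x40, socket=3)
directory.read(0x40, socket=17)
result = directory.write(0x40, socket=3)
print(result)  # (1, 31)
```

In this toy model the directory probes one socket while a broadcast would probe 31, and the gap widens as the socket count grows, which is the intuition behind the scaling claim above.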

Ultra-low latency and single “hop” data movement

Low latency is a key factor driving the high performance of Superdome Flex. Although data exists in local memory (directly connected to the processor) or remote memory (across chassis), copies of the data can exist in various processor caches throughout the system. Cache coherency keeps the cached copies consistent in the event an operation changes the data. The round-trip latency between a processor and local memory is about 100ns. Latency of a processor accessing data from memory connected to another processor over UPI is ~130ns.

When a processor accesses data residing in memory in another chassis, the request travels between two Flex ASICs (always a single “hop”) for a round-trip latency of under 400ns, no matter if a processor at the top of the rack is accessing data from memory at the bottom.
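As a rough illustration of what these three latency tiers mean for an application, the sketch below computes a weighted average memory latency for a given mix of accesses. The latency constants come from the figures quoted above (with 400ns used as an upper bound for cross-chassis access); the function name and the example access mix are assumptions made up for illustration, not measurements.

```python
# Round-trip latencies quoted in this post, in nanoseconds.
LOCAL_NS = 100          # processor to its own local memory
UPI_NS = 130            # memory attached to another processor, over UPI
CROSS_CHASSIS_NS = 400  # another chassis via two Flex ASICs (upper bound)


def average_latency_ns(local_frac, upi_frac, cross_chassis_frac):
    """Weighted average latency for a mix of access types (fractions sum to 1)."""
    total = local_frac + upi_frac + cross_chassis_frac
    if abs(total - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return (local_frac * LOCAL_NS
            + upi_frac * UPI_NS
            + cross_chassis_frac * CROSS_CHASSIS_NS)


# Hypothetical mix: 70% local, 20% same-chassis over UPI, 10% cross-chassis.
example = average_latency_ns(0.7, 0.2, 0.1)
print(f"average latency: about {example:.0f} ns")
```

Because the cross-chassis figure is the same for any pair of chassis, a model like this needs only one remote tier rather than a per-distance table.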

High bandwidth across subsystems

Achieving breakthrough system performance means maintaining balance between processing power, memory capacity/performance, interconnectivity, and system I/O capabilities. The HPE Superdome Flex delivers outstanding bandwidth in the various subsystems:

  • Processor: The Intel® Xeon® Scalable processors offer new UPI links running at 10.4 GT/s, which provide increased bandwidth and performance over prior QPI technology. In addition, these processors deliver 50% more PCIe 3.0 bandwidth through 48 bifurcated lanes.
  • Memory: The Superdome Flex memory channels are fully independent and as such, all can run simultaneously at DRAM data transfer rates up to 2667 MT/s to provide each four-socket chassis with >360 GB/s of local memory bandwidth. Because of the system's clever design, you can count on that same linear memory bandwidth scaling all the way to 32 sockets, where such memory capacity and performance levels will be needed to keep as many as 896 cores of Intel Xeon Scalable processing power running.
  • Interconnectivity: The HPE Superdome Flex ASIC provides 16 Superdome Flex fabric interconnects via chassis front-end ports, each capable of 13.3 GB/s data rates for maximum fabric bandwidth: more than 210 GB/s of bisection crossbar bandwidth at eight sockets, more than 425 GB/s at 16 sockets, and over 850 GB/s at 32 sockets.
  • I/O: Each HPE Superdome Flex chassis can be equipped with either a 16-slot or 12-slot I/O bulkhead to provide a wide choice of stand-up PCIe 3.0 card options. The 16-slot I/O bulkhead provides nine low-profile x8 and seven low-profile x16 PCIe 3.0 card slots. The I/O bulkhead utilizes all the available 48 PCIe lanes per processor to the maximum degree possible, with as much as 110 GB/s per chassis of I/O bandwidth available. The 12-slot I/O bulkhead provides four full-height x8, four full-height x16, three low-profile x8, and one low-profile x16 PCIe 3.0 card slots. With either I/O bulkhead selection, the I/O design provides direct connections between the processors and the card slots without need for bus repeaters or re-timers that could add latency or reduce bandwidth. That’s why as an HPE Superdome Flex customer, you can rest assured you will get the best per-card performance possible.
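The figures in this list can be cross-checked with a little arithmetic, sketched below in Python. Two points are assumptions rather than statements from this post: first, that the quoted fabric figures are the product of 16 ports and 13.3 GB/s scaled linearly from eight sockets (the numbers line up, but the post does not spell out the derivation); second, that each processor has six memory channels moving 8 bytes per transfer, which gives a theoretical peak to compare against the >360 GB/s quoted above.

```python
# Figures quoted in this post.
FABRIC_PORTS_PER_CHASSIS = 16
GB_PER_S_PER_PORT = 13.3
SOCKETS_PER_CHASSIS = 4
MAX_SOCKETS = 32
MAX_CORES = 896
DRAM_MT_PER_S = 2667

# Assumptions, not stated in this post.
ASSUMED_CHANNELS_PER_SOCKET = 6
ASSUMED_BYTES_PER_TRANSFER = 8


def fabric_bandwidth_gb_s(sockets):
    """Fabric bandwidth, assuming linear scaling from the eight-socket figure."""
    return FABRIC_PORTS_PER_CHASSIS * GB_PER_S_PER_PORT * sockets / 8


def theoretical_chassis_memory_gb_s():
    """Theoretical peak memory bandwidth of one four-socket chassis."""
    channels = SOCKETS_PER_CHASSIS * ASSUMED_CHANNELS_PER_SOCKET
    return channels * DRAM_MT_PER_S * ASSUMED_BYTES_PER_TRANSFER / 1000


cores_per_socket = MAX_CORES // MAX_SOCKETS

for sockets in (8, 16, 32):
    print(f"{sockets} sockets: {fabric_bandwidth_gb_s(sockets):.1f} GB/s fabric")
print(f"cores per socket at 32 sockets: {cores_per_socket}")
print(f"theoretical memory peak per chassis: "
      f"{theoretical_chassis_memory_gb_s():.1f} GB/s")
```

Under these assumptions the fabric numbers come out at 212.8, 425.6 and 851.2 GB/s, matching the "more than 210", "more than 425" and "over 850" figures, and 896 cores corresponds to 28 cores per socket.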

Leading results in published performance benchmarks

Because of its clever design and unique architecture, HPE Superdome Flex has been able to deliver record results in a variety of performance benchmarks:

  • SAP workloads: HPE Superdome Flex Server delivered leadership eight-processor scale-up performance for SAP® OLTP and SAP HANA® OLAP workloads.
    • OLTP: The two-tier SAP® Sales and Distribution (SD) standard application benchmark simulates a traditional transactional SAP ERP workload; in this benchmark, HPE Superdome Flex delivered eight-processor (8P) leadership with the Red Hat Enterprise Linux operating system and SAP ASE database.
    • OLAP: The SAP® Business Warehouse (BW) edition for SAP HANA® standard application benchmark simulates a modern analytics workload. In this benchmark, Superdome Flex delivered:
      • World-record single-DB-node results with 10.4 billion initial records and with 11.7 billion initial records, on an eight-processor system, across all three phases of the benchmark, with SUSE Linux Enterprise operating system and SAP HANA in-memory database.
      • In addition, the result with 10.4 billion initial records is the first result on the Version 2 benchmark with the Meltdown and Spectre security patches, variants #1 and #2.

These results are proof that Superdome Flex can excel in both transactional and modern analytics workloads. If you would like more details on these results, take a look at this technical whitepaper.

  • Shared-memory and parallel processing performance: The SPEC OMP2012 benchmark is designed for measuring performance using applications based on the OpenMP 3.1 standard for shared-memory parallel processing. In this benchmark, the HPE Superdome Flex achieved #1 overall results and #1 scores for 32-socket (32S), 16S, and 8S servers. These top results are on the SPECompG_base2012 and SPECompG_peak2012 metrics. The Superdome Flex servers were configured with Intel Xeon Gold 6154 processors. For more details on these results, see the performance brief.
  • Compute-intensive workload performance: The SPEC CPU2017 benchmark contains SPEC's next-generation, industry-standardized, CPU-intensive suites for measuring and comparing compute-intensive performance, stressing a system's processor, memory subsystem, and compiler. HPE Superdome Flex attained ten leadership results on the SPEC CPU2017 benchmark metrics. The Superdome Flex holds the #1 overall results and also the top two 32S and 16S results on both the SPECrate2017_int_base and SPECrate2017_fp_base metrics. To get more information, check out this performance brief.

Benchmarks confirm powerful reasons to choose Superdome Flex

The unique HPE Superdome Flex architecture delivers the performance you need for the most demanding workloads. Plus, its differentiated RAS features give you the availability and stability you need for your critical applications. This compute powerhouse is ideal for critical applications, conventional and in-memory databases and in-memory high performance computing workloads. Stay tuned for future posts on use cases and customer examples.

Benchmark result fair use information

All results as of June 13, 2018, unless otherwise noted. SAP benchmarks: See for further details. SPEC benchmarks: See

Two-tier SAP SD standard application benchmark: HPE Superdome Flex Server with 8 Intel Xeon Platinum 8180 2.50 GHz processors (8 processors/224 cores/448 threads); 3 TB of memory; Red Hat Enterprise Linux 7.4, Sybase ASE 16.0, and SAP enhancement package 5 for the SAP ERP application 6.0; Certification # 2018002. Results: 99,115 SD benchmark users, 542,370 SAPS; performed December 9, 2017, in Palo Alto, CA, USA.

SAP BW Edition for SAP HANA standard application benchmark at 10.4 billion initial records: HPE Superdome Flex Server with Intel Xeon Platinum 8180 2.50 GHz processors (8 processors/224 cores/448 threads); 6 TB memory; SUSE Linux Enterprise Server 12 SP 2; SAP HANA 1.0 SPS 12 Revision 13; SAP NetWeaver® 7.50 SP04; Certification # 2018012. Results: Runtime of Data Load/Transformation 134,501 seconds, Query Executions per Hour 2,277, Runtime of Complex Query 226 seconds; performed April 8, 2018, in Palo Alto, CA, USA.

SAP BW Edition for SAP HANA standard application benchmark at 11.7 billion initial records: HPE Superdome Flex Server with Intel Xeon Platinum 8180 2.50 GHz processors (8 processors/224 cores/448 threads); 6 TB memory; SUSE Linux Enterprise Server 12 SP 2; SAP HANA 2.0; SAP NetWeaver® 7.50; Certification # 2018022. Results: Runtime of Data Load/Transformation 203,380 seconds, Query Executions per Hour 3,198, Runtime of Complex Query 209 seconds; performed June 21, 2018, in Walldorf, Germany; results as of July 2, 2018.

SPEC OMP2012 benchmark:

SPEC OMP2012 results of the HPE Superdome Flex were obtained using the 18-core Intel Xeon Gold 6154 processor. For SPECompG_base2012, 276 threads were used for the 8S result and 513 threads were used for the 16S and 32S results. For SPECompG_peak2012, various thread counts were used for the different socket configurations with a range of 144 to 288 threads for the 8S result; a range of 256 to 576 threads for the 16S result; and a range of 512 to 576 threads for the 32S results.

© Copyright 2018 Hewlett Packard Enterprise Development LP. The information contained herein is subject to change without notice. The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained herein. Intel and Xeon are trademarks of Intel Corporation in the U.S. and other countries. Red Hat is a trademark of Red Hat, Inc. SUSE is a registered trademark of SUSE LLC in the United States and other countries. Linux is a registered trademark of Linus Torvalds. SPEC and the names SPEC CPU, SPECint, SPECfp, and SPEC OMP are registered trademarks of the Standard Performance Evaluation Corporation (SPEC). The stated results are published as of 01-29-18; see

All rights reserved. SAP, SAP HANA, SAP S/4HANA, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. See for additional trademark information and notices. All other product and service names mentioned herein are the trademarks of their respective owners.

Meet Servers: The Right Compute Blogger Diana Cortes, Marketing Manager, Mission Critical x86 Solutions, HPE.

Diana has spent the past 20 years working with the technologies that power the world’s most demanding environments and is interested in how solutions based on those technologies impact the business. A native of Colombia, Diana holds an MBA from Georgetown University and has held a variety of regional and global roles with HPE in the US, the UK and Sweden.



About the Author


Our team of Hewlett Packard Enterprise server experts helps you to dive deep into relevant infrastructure topics.