Accessible supercomputing: A story of two challenges
Dr. Tom shares how HPE and Fujitsu R&D teams are working together to make supercomputing accessible to all by taking the power of supercomputing and packaging it to be easily deployable and universally consumable. The HPE Apollo 80 server and the Fujitsu A64FX Arm-based processor play key roles in this story.
Throughout my career, I’ve observed and participated in some fascinating mutually exclusive objectives. These include: Developing a mobile computer that’s light enough to carry (but the batteries must last all day). Achieving sales quota (without engaging the sales team). And losing weight (yet constantly eating and avoiding exercise).
We technologists excel at reconciling mutually exclusive objectives. For years, HPE and Cray have overcome barrier after barrier, and are currently building three innovative exascale systems. These innovations will likely make their way into products for the broader market. What follows is a story I observed of an R&D team that broke free of limitations to bring supercomputing to organizations and individuals who have not typically had access to it.
The first challenge
Senior leaders at Fujitsu challenged its R&D team to design a pre-exascale system that would be the fastest supercomputer on earth. Cray and Fujitsu share a long history of innovation in building the first vector supercomputers, with Fujitsu following a few years after the legendary Cray-1. To succeed, the team had to invent a systems architecture that had never been built before. The R&D team conferred, collaborated, and conceived a potent combination: the Scalable Vector Extension (SVE), coupled with on-package High Bandwidth Memory (HBM2). This was complemented with a CPU architecture optimized for Single Instruction, Multiple Data (SIMD) processing, with 1 TB/second of memory bandwidth feeding the CPU and delivering 3.1 TFLOPS.
The results are astounding. The R&D team fashioned the A64FX with directly attached HBM and built the supercomputer Fugaku, the fastest supercomputer on earth, using over 150,000 of these purpose-built HPC CPUs. It sports these credentials:
- #1 Top500 Peak Flops/Linpack, November 2020
- #1 HPCG at ISC, June 2020
- #1 HPL-AI at ISC, June 2020
- #1 GRAPH500 BFS, November 2020
And it has earned accolades such as this one:
“This system has a nearly magical combination of programmability, performance, and efficiency, with the potential to transform computational research in many areas of science, engineering, and industry.” - Robert Harrison, PhD, Professor of Applied Mathematics and Statistics and Director of the IACS
The second challenge
Making supercomputing accessible was the second challenge the R&D team faced. By that, I mean taking the power of supercomputing and packaging it to be easily deployable and universally consumable. This challenge was effectively a mutually exclusive engineering objective. Challenge accepted. The number-one benchmark rankings collected throughout 2020 made it evident that the R&D team had already accomplished the “supercomputing” part of the challenge.
The “accessibility” part meant working within a tighter set of constraints than usual: no large supercomputer infrastructure, no added expanded memory, and no accelerator units. All of these are prohibitive in cost, space, and energy.
The key to the A64FX’s high application performance is its use of High Bandwidth Memory (HBM2) in place of traditional SDRAM DIMMs. High-performance CPUs consume data at incredibly high rates, and traditional memory architectures can’t match these rates. Recognizing this, the HPE Apollo 80 with A64FX addresses this data starvation challenge by delivering data to the CPU via HBM2 directly attached to the A64FX CPU. Instead of purchasing large amounts of memory (DIMMs) to increase the delivery of data to the CPU (memory bandwidth), buyers invest in balanced compute capacity and high bandwidth. With the HPE Apollo 80, applications use the smaller, faster memory in a way similar to GPU acceleration, but without demanding programming complexities. Existing applications may be easily ported and their performance tuned. This makes the outstanding floating-point performance of the A64FX readily accessible to many applications.
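To make the data starvation argument concrete, here is a minimal roofline-style sketch. It is an illustration under stated assumptions rather than a measured result: the A64FX figures are the 1 TB/second and 3.1 TFLOPS quoted earlier in this article, while the DIMM-based node and its 0.2 TB/second bandwidth are hypothetical numbers chosen only for contrast. The function names are my own.

```python
# Roofline sketch: the attainable FLOP rate is capped either by peak compute
# or by memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved).

A64FX_PEAK_TFLOPS = 3.1    # figure quoted in this article
A64FX_BANDWIDTH_TBS = 1.0  # figure quoted in this article (HBM2)

# Hypothetical DIMM-based node, for contrast only (assumed numbers).
DIMM_PEAK_TFLOPS = 3.1
DIMM_BANDWIDTH_TBS = 0.2


def attainable_tflops(peak_tflops, bandwidth_tbs, flops_per_byte):
    """Return the roofline bound in TFLOPS for a given arithmetic intensity."""
    return min(peak_tflops, bandwidth_tbs * flops_per_byte)


def ridge_point(peak_tflops, bandwidth_tbs):
    """Arithmetic intensity (FLOPs/byte) above which a code is compute bound."""
    return peak_tflops / bandwidth_tbs


if __name__ == "__main__":
    for intensity in (0.1, 0.25, 1.0, 4.0, 16.0):
        hbm = attainable_tflops(A64FX_PEAK_TFLOPS, A64FX_BANDWIDTH_TBS, intensity)
        dimm = attainable_tflops(DIMM_PEAK_TFLOPS, DIMM_BANDWIDTH_TBS, intensity)
        print(f"{intensity:6.2f} FLOPs/byte: HBM2 {hbm:.2f} TFLOPS, DIMM {dimm:.2f} TFLOPS")
```

In this simplified model, a code that performs few floating-point operations per byte is limited by bandwidth on both machines, so the higher-bandwidth memory raises its ceiling directly. Only codes with high arithmetic intensity reach the peak FLOP rate.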
Performance benefits shared across the manufacturing industry
Many products, particularly those that move, interact with or are acted upon by fluids such as air, water, and lubricants. In manufacturing product design, accurate computer simulations obviate some of the need for real-world experimentation on prototypes or products. To ensure this accuracy, computational fluid dynamics (CFD) simulations must solve complex mathematics (the Navier-Stokes equations). This is achieved through brute-force computational iterations, which are very time consuming. In such memory-bandwidth-intensive applications, the Apollo 80 with the A64FX provides speedups of two to three times over today’s fastest two-CPU x86 systems.
During the IEEE 2020 EAHPC Workshop, my colleague Adrian Jackson of the University of Edinburgh showed benchmarking results indicating that the A64FX delivers leading performance in memory-bandwidth-dominated codes in general, and CFD codes in particular. (Watch the video on investigating applications on the A64FX.) Similar results are being reported elsewhere, including Isambard 2 at the University of Bristol and Ookami at Stony Brook University.
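To give a feel for what such brute-force iteration looks like, here is a minimal sketch of a Jacobi sweep on a small 2D grid. It is a deliberately simplified stand-in: it solves Laplace's equation, not the Navier-Stokes equations, and the grid size, boundary values, and function names are assumptions made for illustration. Production CFD codes are far more elaborate, but many share this pattern of repeatedly sweeping over a grid and updating each point from its neighbors.

```python
def jacobi_step(grid):
    """One Jacobi sweep: each interior point becomes the mean of its four neighbors."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary values are copied unchanged
    max_change = 0.0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
            max_change = max(max_change, abs(new[i][j] - grid[i][j]))
    return new, max_change


def solve(n=16, tol=1e-6, max_iters=10_000):
    """Iterate until the largest per-point change drops below tol."""
    grid = [[0.0] * n for _ in range(n)]
    for j in range(n):
        grid[0][j] = 1.0  # assumed boundary condition: top edge held at 1.0
    for iteration in range(1, max_iters + 1):
        grid, change = jacobi_step(grid)
        if change < tol:
            return grid, iteration
    return grid, max_iters


if __name__ == "__main__":
    result, iterations = solve()
    print(f"converged after {iterations} sweeps; center value {result[8][8]:.4f}")
```

Each update performs only a handful of floating-point operations per value loaded, and the whole grid is swept again on every iteration. At realistic grid sizes the data no longer fits in cache, which is why this class of code tends to be limited by memory bandwidth more than by peak FLOPS.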
I’ve toured factory floors and worked with manufacturers around the world, in Japan, India, China, Europe, and across the United States. The producers of automobiles, aircraft, power tools, turbines, telephony, and donuts all have a common business goal: to get more product out the door, faster, without sacrificing quality and safety.
Reducing design time shortens the end-to-end manufacturing process, which in turn reduces time to revenue. Combining new computing capabilities with critical workloads opens the door to the next achievements in design productivity.
I always tell my teams, “A good team follows the trends; a great team sets the trends.” HPE and Fujitsu are a great team, setting the trend for making supercomputing accessible. This enables our customers to set trends in their own marketplaces with their own customers.
What will you accomplish with your supercomputer?
Dr. Tom Bradicich
Hewlett Packard Fellow, Distinguished Technologist, Global Head of Edge and IoT Labs & CoE
Hewlett Packard Enterprise
Dr. Tom Bradicich serves as a Hewlett Packard Fellow, Distinguished Technologist, and Global Head of Edge and IoT Labs & CoE at HPE. His team develops and commercializes advanced as-a-Service (SaaS/IaaS) software, focusing on cloud-managed remote infrastructure, edge-as-a-service, and converged IT/Operational Technologies (OT). He founded and directs the HPE Channel-to-Edge Institute partner program, and leads company-wide strategies and venture/M&A assessments.