Achieve GPU-level performance with high bandwidth memory for CPUs

AdvEXperts · 04-12-2022

Want to work smarter, faster, and more efficiently? Arm-based HPC systems from HPE can help. Learn how HPE and Fujitsu are pairing world-leading CPUs with HBM and SVE to deliver greater application performance.

Today’s companies are searching for ways to enhance their technology infrastructure without the cost or complexity of adopting GPUs. HPC systems built on industry-leading CPUs deliver high processing power and speed without requiring substantial software re-engineering. The combination of CPUs and high bandwidth memory (HBM) has the potential to transform the performance of memory-bound code, enabling companies to accelerate innovation and focus their technology resources on maximizing their bottom lines. CPU-based HPC systems with HBM are a powerful foundation for companies to work smarter, faster, and more efficiently, equipping them to develop next-step capabilities within existing cycle times and capture actionable insights in real time.

Utilizing HBM for next-generation compute

HPE is defining a new era of compute, offering robust HPC systems with the most advanced Arm® processors on the planet. We supply companies with the high bandwidth memory, agility, and resiliency they need without having to refactor existing codes for GPUs. HPE Apollo 80, the latest Arm-based system from HPE, is designed specifically for memory-bound codes, powered by the Fujitsu A64FX CPU to deliver extreme performance for a broad range of HPC applications.

The Fujitsu A64FX brings new technologies to CPUs for the first time, including the Scalable Vector Extension (SVE) and direct-attached HBM, to run memory bandwidth-dependent workloads with unmatched agility. Because SVE code does not have to assume a fixed vector length, code written for the HPE Apollo 80 can take advantage of future Arm-based processors with wider SVE vector units. This new breed of system delivers world-leading performance and is far more straightforward to adopt than GPUs, because existing codes do not have to be refactored, so companies can transform quickly to take on their biggest scientific, technical, and business-related problems.
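The forward-compatibility point rests on SVE's vector-length-agnostic style: a loop asks the hardware how many elements fit in a vector and masks off the tail, instead of hard-coding a width. The sketch below only models that idea in plain Python. The function names `whilelt` and `vla_scale` are hypothetical (the first is loosely modeled on SVE's WHILELT instruction), and the 1024-bit width is an illustrative assumption, not a statement about any specific future processor.

```python
def whilelt(start, n, lanes):
    """Return a predicate: lane k is active while start + k < n."""
    return [start + k < n for k in range(lanes)]


def vla_scale(x, factor, vector_bits):
    """Scale x by factor without hard-coding the vector length in the loop."""
    lanes = vector_bits // 64              # 64-bit elements per vector
    out = [0.0] * len(x)
    i = 0
    while i < len(x):
        pred = whilelt(i, len(x), lanes)   # final iteration masks off the tail
        for k, active in enumerate(pred):
            if active:                     # inactive lanes are left untouched
                out[i + k] = x[i + k] * factor
        i += lanes                         # advance by whatever the hardware offers
    return out


data = [float(v) for v in range(11)]       # 11 elements: not a multiple of any lane count
narrow = vla_scale(data, 2.0, 512)         # 512-bit vectors, as on the A64FX
wide = vla_scale(data, 2.0, 1024)          # hypothetical wider implementation
assert narrow == wide                      # same source, same result
```

The loop body never mentions 512 or 1024; only the `vector_bits` argument changes, which is the property that lets one binary run across different SVE vector lengths.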

Companies operating on HPE Arm-based systems can get complex codes up and running with ease. A64FX CPUs already match the best x86 CPUs available on the market and deliver roughly 2x to 4x the memory bandwidth of x86 systems (1TB/s versus 256 to 512GB/s), a substantial performance advantage for memory bandwidth-bound applications.
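To make the bandwidth comparison concrete, the following sketch works through the arithmetic for a purely bandwidth-bound kernel. It is an illustration under stated assumptions, not a benchmark: 1TB/s is read as 1024GB/s so that the ratio against 256GB/s matches the quoted 4x, the triad kernel and the array length are hypothetical, and the times are lower bounds from data movement alone (caches, latency, and compute are ignored).

```python
GB = 1e9  # bytes per gigabyte (decimal)


def bandwidth_ratio(fast_gb_s, slow_gb_s):
    """How many times more bandwidth the faster memory system provides."""
    return fast_gb_s / slow_gb_s


def min_transfer_time(bytes_moved, bandwidth_gb_s):
    """Lower bound on run time (seconds) if the kernel is limited by memory traffic."""
    return bytes_moved / (bandwidth_gb_s * GB)


# Hypothetical triad a[i] = b[i] + s * c[i] over doubles:
# two loads and one store of 8 bytes per element.
n = 100_000_000
bytes_moved = 3 * 8 * n

systems = {
    "A64FX with HBM": 1024,   # assumption: 1TB/s taken as 1024GB/s
    "x86, high end": 512,
    "x86, low end": 256,
}

for name, gb_s in systems.items():
    t_ms = min_transfer_time(bytes_moved, gb_s) * 1e3
    ratio = bandwidth_ratio(systems["A64FX with HBM"], gb_s)
    print(f"{name}: {gb_s} GB/s, at least {t_ms:.2f} ms, A64FX advantage {ratio:.1f}x")
```

For a kernel like this, the bandwidth ratio is also the best-case speedup, which is why the advantage ranges from about 2x to 4x depending on the x86 system being compared.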

To help companies get started, these systems are supported by the HPE Cray Programming Environment (PE). HPE Cray PE is a complete software development suite designed to take the frustration out of software development, shorten the development cycle, and help applications run better. The suite was developed by engineers with hundreds of years of combined HPC experience. While HPE Cray PE is best known for its core components (compilers, MPI, debuggers, and libraries), it also offers performance analysis and optimization tools: a comprehensive collection of tools and experiments designed to suit different developers, helping them profile applications to find bottlenecks and parallelize code to drive efficiencies in critical applications.

The fully integrated software suite is designed to improve the developer experience by delivering a complete system view, offering intuitive behavior and enhanced application performance with the least amount of effort. These capabilities are key enablers for companies that develop their own HPC code and port it to the new SVE architecture. Because porting existing applications requires minimal recoding and few changes to existing programming models, companies can easily make the transition to new hardware architectures and configurations.

For systems with the A64FX, HPE Cray PE can be used for auto-vectorization and software pipelining to increase performance. To promote the use of vector instructions, SVE provides predicate registers that specify, for each element of a vector instruction, whether or not the operation is executed. This enables the vectorization of complex loops that include IF statements, so that they too can run at vector speed.
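The effect of a predicate register on a loop containing an IF statement can be modeled as follows. This is a conceptual sketch in plain Python, not compiler output or SVE intrinsics: `scalar_loop` and `predicated_loop` are hypothetical names, and the default of 8 lanes assumes 64-bit elements in the A64FX's 512-bit vectors.

```python
def scalar_loop(a, b):
    """Original loop: the IF statement forces a branch on every element."""
    out = list(a)
    for i in range(len(a)):
        if b[i] > 0.0:
            out[i] = a[i] / b[i]
    return out


def predicated_loop(a, b, lanes=8):
    """Same loop processed one vector at a time under a per-lane predicate."""
    out = list(a)
    for start in range(0, len(a), lanes):
        idx = range(start, min(start + lanes, len(a)))
        # A vector compare fills the predicate: one true/false flag per lane.
        pred = [b[i] > 0.0 for i in idx]
        # The divide is issued once for the whole vector; only active lanes
        # produce a result, so lanes with b[i] <= 0 keep their old value.
        for i, active in zip(idx, pred):
            if active:
                out[i] = a[i] / b[i]
    return out


a = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0, 81.0, 100.0]
b = [2.0, 0.0, -3.0, 4.0, 5.0, 0.0, 7.0, -8.0, 9.0, 10.0]
assert predicated_loop(a, b) == scalar_loop(a, b)
```

The branch inside the loop body has become data (the predicate), which is what allows a compiler to emit straight-line vector code for a loop that would otherwise stay scalar.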

Software pipelining (SWP) is an important optimization for increasing instruction-level parallelism. SWP rearranges the instructions in a loop so that one iteration overlaps the next. The arrangement takes into account performance factors such as the number of computing units, the latency of individual instructions, and the number of registers in order to optimize the instructions in the loop for faster execution. Because SWP requires many registers and memory accesses, the compiler uses loop fission to split large loops and reduce the resources each one requires, so that SWP can be applied effectively.
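The reordering that software pipelining performs, and the loop fission that supports it, can be sketched at the source level. A real compiler does this on machine instructions using latency and register information; the Python below only models the ordering. All function names and the three-stage split (load, compute, store) are illustrative assumptions.

```python
def plain_loop(src):
    """Each iteration runs load, compute, store back to back."""
    dst = [0.0] * len(src)
    for i in range(len(src)):
        loaded = src[i]                  # stage 1: load
        computed = loaded * 2.0 + 1.0    # stage 2: compute
        dst[i] = computed                # stage 3: store
    return dst


def pipelined_loop(src):
    """Each step stores iteration i-2, computes i-1, and loads i."""
    n = len(src)
    dst = [0.0] * n
    loaded = computed = None
    # The first two steps act as the prologue (pipeline filling up) and the
    # last two as the epilogue (pipeline draining).
    for step in range(n + 2):
        if step >= 2:
            dst[step - 2] = computed         # store for iteration step-2
        if 1 <= step <= n:
            computed = loaded * 2.0 + 1.0    # compute for iteration step-1
        if step < n:
            loaded = src[step]               # load for iteration step
    return dst


def fused_loop(a, b):
    """One loop producing two results: more live values per iteration."""
    x, y = [0.0] * len(a), [0.0] * len(a)
    for i in range(len(a)):
        x[i] = a[i] + b[i]
        y[i] = a[i] * b[i]
    return x, y


def fissioned_loops(a, b):
    """Loop fission: two smaller loops, each needing fewer registers."""
    x = [a[i] + b[i] for i in range(len(a))]
    y = [a[i] * b[i] for i in range(len(a))]
    return x, y


values = [float(v) for v in range(6)]
assert pipelined_loop(values) == plain_loop(values)
assert fissioned_loops(values, values) == fused_loop(values, values)
```

In the steady state of `pipelined_loop`, the three stages inside one step belong to three different iterations and do not depend on each other, which is what lets hardware execute them in parallel.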

HPE Apollo 80 with the Fujitsu A64FX processor offers a unique opportunity to easily take advantage of the performance of HBM coupled with SVE to accelerate HPC applications, maximize ROI, and ensure success today and in the future.

Contact us to get started.

Meet Advantage EX blogger Roger Rintala, Sr. Product Marketing Manager, HPC & AI

A member of HPE’s High Performance Computing Product Marketing team, Roger has led marketing programs for HPC/AI products, HPE HPC Software, and Arm in HPC. Roger’s background includes HPC hardware and software marketing, industry marketing for engineering and science, alliance management and product marketing from supercomputing to departmental HPC. In his spare time Roger is an enthusiastic coach who loves sharing his lifelong passions, welcoming and developing cyclists and skiers of all ages.

Advantage EX Experts
Hewlett Packard Enterprise

twitter.com/hpe_hpc
linkedin.com/showcase/hpe-ai/
hpe.com/info/hpc
