Servers & Systems: The Right Compute
ComputeExperts

Achieve exascale performance with HPE Cray Operating System

Not all operating systems are made equal. Discover the HPE Cray Operating System — a suite of high-performance software designed to help you get the best performance for your supercomputing applications.


By Leslie Tung, Director, HPC Systems Software

Whether it’s handheld or housed in a large data center, a computer requires operating system software. The operating system manages the hardware and file systems and provides the baseline capabilities for running end-user workloads.

For high performance computing, Linux is the dominant operating system. HPE offers customers several standard open-source distributions of Linux. But that isn’t all. Applying the knowledge derived from a long history of Cray supercomputing, we offer HPE Cray Operating System — a suite of high-performance software designed to get the best performance from supercomputers with HPE Slingshot.

[Figure: Elements of HPE Cray OS]

Who better to tell us more about this unique software than the HPC architects who developed it and deployed it to power the first exascale system on the planet?

In the following video, my colleague Des Albert interviews Andy Warner, chief systems architect for HPC, and Larry Kaplan, chief software architect for HPC, on what makes HPE Cray OS special. They take a deep dive into some of the unique features of the software and answer a very important question: what benefits would mainstream HPC customers gain from deploying the HPE Cray Operating System?

Hasn’t the open-source community done enough?

Why does HPE offer an operating system specialized for HPC? In many ways, HPC has been setting the stage for years for how Linux behaves at scale. As Larry Kaplan explains in the interview, “the community has provided great functionality, but there have been some challenges at scale that require some additional work.” That is why HPE Cray OS offers a set of very specific enhancements that provide an efficient basis on which to run HPC applications.

We can split the enhancements into two categories: features improving workload performance and functionalities aimed at operational efficiency of the systems.

Improving workload performance

HPE Cray Operating System played an important role in achieving the first exascale result on an HPE Cray EX system — Oak Ridge National Lab’s Frontier supercomputer.

"In fact, we saw double-digit improvements when we first started to exploit some of the specific features," says Andy Warner.

Features such as CPU assignment were used to great effect. CPU assignment constrains OS operations to specific Linux hardware threads, which eliminates OS noise from the remaining CPUs and gives applications dedicated CPUs for better workload performance.
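
To make the idea concrete, here is a minimal sketch of the application side of CPU partitioning using the standard Linux affinity calls exposed by Python. It is not the HPE Cray OS mechanism, which constrains OS operations at the system level and is not described in this post. The helper names `split_cpus` and `pin_current_process` and the 8-thread node layout are assumptions made for illustration.

```python
import os


def split_cpus(all_cpus, reserved_for_os):
    """Split CPU ids into (os_cpus, app_cpus).

    The lowest-numbered `reserved_for_os` CPUs are set aside for OS
    housekeeping; the rest are left for the application.
    """
    ordered = sorted(all_cpus)
    if not 0 <= reserved_for_os < len(ordered):
        raise ValueError("must leave at least one CPU for the application")
    return set(ordered[:reserved_for_os]), set(ordered[reserved_for_os:])


def pin_current_process(cpus):
    """Restrict the calling process to `cpus` and return the new affinity set.

    Relies on os.sched_setaffinity, which is only available on some
    platforms (notably Linux). Returns None where it is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return os.sched_getaffinity(0)


# Illustrative 8-thread node: keep CPUs 0-1 for the OS, 2-7 for the job.
os_cpus, app_cpus = split_cpus(range(8), reserved_for_os=2)
print("OS CPUs:", sorted(os_cpus))
print("Application CPUs:", sorted(app_cpus))
```

An application launcher could call `pin_current_process(app_cpus)` before starting compute threads. Keeping OS activity off those CPUs is the part that needs operating system support.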

Other functionalities have been developed, and several features from the community have been enhanced for ease of use. For example, XPMEM, which allows processes in an application to communicate without incurring system call overhead for the transfers, is available in the community but has been made more robust in our software suite.

Another important feature that can dramatically improve the performance of HPC applications is additional Huge Page sizes, which provide page sizes beyond the standard Linux huge page sizes of 2 MB and 1 GB. Enabling the larger page sizes lowers latency by avoiding misses in the address translation caches of the network hardware, which have limited capacity. This is important for applications with large memory footprints and sparse reference patterns.

[Figure: Additional Huge Pages]
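
The effect of page size on address translation can be worked through with simple arithmetic. The sketch below counts how many translation entries are needed to cover a memory footprint and how much memory a fixed-size translation cache can cover. The 64 GiB footprint and the 1,024-entry cache are illustrative assumptions, not figures for any HPE hardware, and the page sizes shown are the common x86-64 Linux sizes rather than the list HPE Cray OS supports.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def entries_needed(footprint_bytes, page_bytes):
    """Number of page translations needed to map the whole footprint."""
    return -(-footprint_bytes // page_bytes)  # ceiling division


def cache_reach(entries, page_bytes):
    """Bytes of memory covered by a translation cache with `entries` slots."""
    return entries * page_bytes


footprint = 64 * GIB      # assumed application footprint
cache_entries = 1024      # assumed translation cache capacity

for label, size in [("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)]:
    needed = entries_needed(footprint, size)
    reach_mib = cache_reach(cache_entries, size) // MIB
    print(f"{label:>6} pages: {needed:>10} entries needed, "
          f"cache covers {reach_mib} MiB")
```

When the number of entries needed far exceeds the cache capacity and accesses are sparse, most references miss the cache. Larger pages shrink the entry count, which is why they help applications with large footprints.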

Going beyond standard Linux capabilities

The HPE Cray Operating System goes beyond application performance, addressing operational issues such as stability and debugging.

Managing memory and monitoring out-of-memory (OOM) events is an important task for diskless supercomputer systems. HPE Cray Operating System includes a tool that monitors and alerts on OOM events. The tool also helps administrators choose which processes to terminate in an out-of-memory situation so they can keep the system stable.
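
As an illustration of what choosing a process to terminate can involve, the sketch below ranks candidate processes by memory use, adjusted by a per-process preference value. It is loosely modeled on the standard Linux `oom_score_adj` setting (range -1000 to 1000, where -1000 exempts a process) and is a simplification, not the HPE Cray OS tool, whose interface is not described in this post. The `Proc` record, the `choose_victim` helper, and the sample processes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    name: str
    rss_kib: int             # resident memory in KiB
    oom_score_adj: int = 0   # -1000 (never kill) .. 1000 (kill first)


def choose_victim(procs, total_kib):
    """Return the process with the highest adjusted memory score, or None.

    Score = resident memory plus a share of total memory proportional
    to oom_score_adj. Processes with oom_score_adj == -1000 are exempt.
    """
    best, best_score = None, None
    for p in procs:
        if p.oom_score_adj == -1000:
            continue  # protected, e.g. a system daemon
        score = p.rss_kib + p.oom_score_adj * total_kib // 1000
        if best is None or score > best_score:
            best, best_score = p, score
    return best


node_memory_kib = 64_000_000  # assumed node memory
candidates = [
    Proc(101, "node-daemon", 500_000, oom_score_adj=-1000),
    Proc(202, "solver", 40_000_000),
    Proc(303, "logger", 50_000),
]
victim = choose_victim(candidates, node_memory_kib)
print("Would terminate:", victim.name if victim else "nothing")
```

The design point is that the largest consumer is not always the right choice: protecting critical daemons and weighting expendable jobs lets the administrator's policy, rather than raw memory use alone, decide.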

Power management is emerging as a key capability for running applications at scale. While system management software manages power at the hardware level today, HPE Cray OS offers fine-grained monitoring of the power used by running applications, including direct monitoring of the power and energy consumed by each CPU or GPU socket. Users can then decide whether to contain or exploit available power resources so that their system runs efficiently and delivers on their larger goals.
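
Per-socket energy monitoring typically comes down to sampling a cumulative energy counter twice and dividing by the elapsed time. The sketch below shows that calculation, assuming a counter in microjoules that may wrap around, similar in spirit to the `energy_uj` values of the Linux powercap interface. The function name, parameters, and sample readings are illustrative assumptions; the interface HPE Cray OS exposes is not described in this post.

```python
def average_power_watts(start_uj, end_uj, seconds, wrap_uj=None):
    """Average power between two readings of a cumulative energy counter.

    start_uj, end_uj: counter readings in microjoules.
    seconds: elapsed time between the readings.
    wrap_uj: counter range, used if the counter wrapped once in between.
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        if wrap_uj is None:
            raise ValueError("counter went backwards and no range was given")
        delta_uj += wrap_uj  # assume exactly one wraparound
    return delta_uj / 1_000_000 / seconds  # microjoules -> joules -> watts


# Hypothetical readings for two sockets, taken 2 seconds apart.
samples = {
    "cpu0": (1_000_000, 501_000_000),
    "gpu0": (7_500_000, 1_207_500_000),
}
for socket, (before, after) in samples.items():
    watts = average_power_watts(before, after, seconds=2.0)
    print(f"{socket}: {watts:.1f} W average")
```

Sampling around an application phase gives the energy that phase consumed, which is the kind of figure a user needs when deciding whether to cap power or spend the available headroom.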

This blog is not an exhaustive list of the features HPE Cray Operating System offers. Download the infographic "Achieving Exascale Performance with HPE Cray Operating System" to learn more about the main functionalities and benefits of the software.

What happens next

What plans do our software architects have to future-proof our operating system offering and make it attractive for our mainstream customers?

"We want to maintain our differentiation, but we also recognize that with the larger market that HPE addresses, not everyone wants to run what essentially appears to be a proprietary operating system," says Kaplan. "They are constrained to running standard distros. But we want to be able to bring some of the functionalities of Cray Operating System to those customers."

This means we’re investigating how best to offer functionalities of the HPE Cray Operating System on standard Linux distros using the kernel module mechanism. We also continue to work closely with the open-source community and plan to release some of our network-based software as open source to encourage further development and mainstream adoption.

And of course, there are capabilities such as containerized workflows. "There are many opportunities for hybrid environments that allow us to help our customers to exploit these capabilities in a way that best feeds to the operational model they want to deploy," says Warner.

This means that our users could, for example, have HPE Cray Operating System as a base OS for stability, debuggability, and performance with containerized workloads using a standard distro of their choice running above it.

Not everyone requires an exascale system, but we want to ensure exascale technologies — and the accelerated discovery and innovation they enable — are available to all.

To learn more about HPE HPC software solutions:

Visit hpe.com/info/hpc-software

View the HPE Cray Operating System infographic


Meet Leslie Tung, Director, HPC Systems Software

Leslie leads the HPC Software Product Management team at HPE responsible for managing a portfolio of HPE-engineered and third-party software for the HPE Cray EX supercomputers and HPE Cray XD HPC systems. The software portfolio includes system management and operating system software enabling system resiliency and DevOps software tuned for HPC and AI workloads. Connect with Leslie on LinkedIn. 


Server Experts
Hewlett Packard Enterprise

twitter.com/HPE_HPC
linkedin.com/showcase/hpe-servers-and-systems/
hpe.com/servers

About the Author

ComputeExperts

Our team of Hewlett Packard Enterprise server experts helps you to dive deep into relevant infrastructure topics.