Servers: The Right Compute

How HP and Red Hat built a better Linux

msemadeni

Blog Contributors:

 

Scott J Norton, Master Technologist leading the Linux kernel performance effort at HP    

Tom L Vaden, Distinguished Technologist working on Linux kernel strategy for HP servers            

Vinod Chegu, Master Technologist leading the Linux KVM virtualization efforts for HP

 

 

Our efforts began several years ago with Project Odyssey. The goal was simple: Make Linux more like UNIX.

 

With the introduction of Intel “Ivy Bridge” processors, existing 4- and 8-socket systems have seen a 50% increase in total core counts over previous processor generations. In preparation for this, HP has spent the past year and a half working on scaling and performance optimizations for the Linux kernel. During this period, the HP Linux Kernel Performance team has been meeting and collaborating with the Red Hat Performance team on a regular basis.

 

The greatest area of concern when increasing the total system core count by 50% is the effect of cache-line contention. Such a large increase in core count puts additional pressure on kernel synchronization primitives such as spinlocks and mutexes, and that pressure shows up as increased cache-line contention. Increased cache-line contention results in longer spinlock and mutex acquisition times, which in turn slows down all other kernel operations. Cache-line contention between processors is even worse: contention between one core in each of four processors is 75% worse than contention among four cores within the same processor.

 

Normally in situations like this, with a 50% increase in core counts, OS developers would look at breaking up existing locks into finer-grained locks to reduce lock contention and therefore reduce cache-line contention. HP has taken a different approach: optimizing the Linux synchronization primitives themselves for large-scale NUMA systems to minimize cache-line contention.

 

HP has submitted over 125 performance-related patches to the upstream Open Source Linux kernel. The majority of these patches focus on reducing the number of atomic instructions in the Linux kernel synchronization primitives to avoid cache-line contention. Mutexes in particular have received major attention: fewer atomic instructions, the introduction of queued spinning, slow-path optimizations, unlocking a mutex without acquiring a wait lock, and other related changes. These changes to the mutex synchronization primitives have shown a greater than 2x performance improvement with various workloads on systems with 8 or more processors.
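
To give a feel for why queued spinning matters as core counts grow, here is a rough cost model in Python (the kernel itself is written in C; Python is used here only for illustration). It is a sketch under stated assumptions, not a measurement and not HP's implementation: every CPU contends for the same lock at once, a release of a shared lock word forces each remaining waiter to re-fetch the cache line once, and a queued lock hands off to exactly one successor that spins on its own private location. The function names and the CPU counts of 80 and 120 are hypothetical, chosen only to represent a 50% increase.

```python
def shared_word_transfers(cpus: int) -> int:
    """Cache-line re-fetches when every waiter spins on one shared lock word.

    Assumption: each release invalidates the line in every remaining
    waiter's cache, so each of them re-fetches it once.
    """
    transfers = 0
    waiters = cpus - 1          # one CPU holds the lock, the rest wait
    while waiters > 0:
        transfers += waiters    # the release disturbs every waiter
        waiters -= 1            # one waiter wins and becomes the holder
    return transfers


def queued_transfers(cpus: int) -> int:
    """Cache-line transfers when each waiter spins on its own queue node.

    Assumption: a release writes only to the successor's private flag,
    so each handoff costs a single transfer.
    """
    return max(cpus - 1, 0)


if __name__ == "__main__":
    # Hypothetical CPU counts illustrating a 50% increase.
    for cpus in (80, 120):
        print(cpus, shared_word_transfers(cpus), queued_transfers(cpus))
```

Under this simplified model, going from 80 to 120 CPUs more than doubles the traffic for the shared-word lock (3160 to 7140 transfers) while the queued lock grows in step with the CPU count (79 to 119). Real hardware behaves less tidily, but the quadratic-versus-linear shape is the motivation for queued spinning.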

 

Additional changes have been applied to R/W semaphores, and a new design for lock-less updates of the dcache reference count has been introduced, in both cases to minimize cache-line contention and optimize performance for large-scale NUMA systems. In the future, additional changes will be introduced with queued spinlocks and queued R/W locks.
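
The post does not describe the lock-less reference count design in detail. The Python sketch below illustrates one general way such an update can work, assuming the lock bit and the count share a single word that is updated with compare-and-swap whenever the lock is not held. The `LockRef` class and its methods are hypothetical names for illustration, and the `threading.Lock` inside `_cas` only emulates the atomicity that hardware would provide; it is not part of the technique.

```python
import threading


class LockRef:
    """Toy model of a lock bit and a reference count sharing one word."""

    def __init__(self) -> None:
        self._word = (False, 0)      # (locked, count), treated as one unit
        self._hw = threading.Lock()  # stand-in for hardware atomicity only

    def _cas(self, old, new) -> bool:
        """Emulated compare-and-swap: store `new` only if the word is `old`."""
        with self._hw:
            if self._word == old:
                self._word = new
                return True
            return False

    @property
    def count(self) -> int:
        return self._word[1]

    def lock(self) -> None:
        """Spin until the lock bit flips from clear to set."""
        while True:
            locked, count = self._word
            if not locked and self._cas((False, count), (True, count)):
                return

    def unlock(self) -> None:
        """Clear the lock bit; only the current holder may call this."""
        with self._hw:
            self._word = (False, self._word[1])

    def get(self, max_retries: int = 100) -> str:
        """Take a reference and report which path was used."""
        for _ in range(max_retries):
            old = self._word
            locked, count = old
            if locked:
                break                # lock is held: leave the count alone
            if self._cas(old, (False, count + 1)):
                return "fast"        # one atomic update, lock never taken
        # Fallback: take the lock, then update the count under it.
        self.lock()
        with self._hw:
            self._word = (True, self._word[1] + 1)
        self.unlock()
        return "slow"
```

In the common case nobody holds the lock, so `get()` finishes with a single compare-and-swap instead of a lock acquisition, an increment, and a release. That is the sense in which the update is lock-less: the lock still exists for operations that need it, but a plain reference count change no longer has to take it.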

 

Along with these kernel synchronization primitive optimizations, HP has also focused on large-system scaling of the System V Semaphore, Shared Memory and Message Queue implementations in Linux. Additional efforts on the scheduler’s idle balancer have resulted in improvements of 50% with certain workloads.

 

All of these performance and scalability changes will have a positive effect on 2-processor and 4-processor systems, but they really shine on systems with 8 or more processors.

 

HP has been collaborating with the Red Hat performance team throughout the development of these performance features. Both HP and Red Hat are proud to announce that all of these features have been incorporated in the Red Hat Enterprise Linux 7 release to provide an optimized kernel for large-scale NUMA systems.

 

Additionally, in the area of performance, Automatic NUMA balancing, introduced in Red Hat Enterprise Linux 7, helps improve out-of-box performance for different workloads. It does this by attempting to automatically detect tasks and the memory they use and move them closer to each other, thereby avoiding expensive remote NUMA node accesses. In most use cases (i.e., systems with up to 4 sockets), performance falls within a few percent of what can be achieved via optimal manual binding. HP has been working closely with Red Hat to help evaluate and influence this new feature in Red Hat Enterprise Linux 7. HP and Red Hat have ongoing collaborative work in this area to help make this feature perform better on larger systems with complex NUMA topologies. One such example is the Automatic NUMA Balancing presentation from the Red Hat Summit in April 2014.
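
For readers who want to see whether Automatic NUMA balancing is active on a given machine, the following Python sketch reads the `kernel.numa_balancing` sysctl through `/proc/sys/kernel/numa_balancing`. The helper name `numa_balancing_enabled` is hypothetical, and the sketch assumes a kernel that exposes this file; on kernels built without the feature the file is absent and the function returns `None`.

```python
from pathlib import Path
from typing import Optional

# Sysctl file exposed by kernels that include Automatic NUMA balancing.
NUMA_BALANCING_SYSCTL = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_enabled(path: Path = NUMA_BALANCING_SYSCTL) -> Optional[bool]:
    """Return True or False for the current setting, or None if unknown."""
    try:
        value = int(path.read_text().strip())
    except (OSError, ValueError):
        return None          # file missing, unreadable, or not a number
    return value != 0        # 0 means disabled; non-zero means enabled


if __name__ == "__main__":
    state = numa_balancing_enabled()
    if state is None:
        print("Automatic NUMA balancing: not available on this kernel")
    else:
        print("Automatic NUMA balancing:", "enabled" if state else "disabled")
```

Checking this setting first is useful when comparing against manual binding, since a workload pinned by hand and one left to the balancer are otherwise easy to confuse in benchmark results.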

 

In addition to the performance and scaling features, HP has also collaborated with the upstream Open Source community and Red Hat engineering to provide a number of enterprise-focused changes to Red Hat Enterprise Linux 7. Some of the most important examples of this collaboration are: fully supporting UEFI environments, enabling hot-plug for virtual environment sizing, and PCI error handling.

 

UEFI is likely to be the predominant server platform firmware going forward. HP has leveraged long-term UEFI expertise to help complete the UEFI environment, ensuring that Secure Boot works seamlessly. When it comes to resources, virtual environments need to be dynamic, available, and flexible. HP has been instrumental in enabling hot-plug capabilities for CPUs and memory. Enterprise-class customers require more robust error handling in the I/O space, and HP has contributed code to improve Linux PCI error recording and recovery.

 

While we are proud of our efforts to date, we are also eager to see what else we can do in the near future with Red Hat Enterprise Linux 7.


Comments

Very interesting post. Do we actually have a white paper describing these enhancements & HP's efforts that we can share with customers?

 

It's nice to know that every patch goes upstream and that joint HP and RH research can benefit everyone, from i7 desktop users to ProLiant server admins. One big pitfall for enterprise users, however, is that features like THP and NUMA get disabled because of performance issues with Oracle and their advisories about turning them off.

Anyway, I'm very happy with ProLiant performance under RHEL6 and hoping it will get even better, even without manual tweaks like disabling balanced performance mode. HP should also take more care about system firmware compatibility with the Linux kernel: there are a few obvious ACPI bugs on the DL380p Gen8 and a nasty DMAR one. I don't use them for virtualization, so no biggie for me, but I will report them regardless as soon as I can.

 

All I can say is keep up the good work. Thrilled to see time, effort and resources dedicated to supporting HP hardware under Linux.

stealthfire

Red Hat Enterprise Linux 7 has Containers and Docker under technology preview. How are HP and RH working together to support the number of containers Project Odyssey could support?

msemadeni

Thank you for your interest and sorry for the delayed response. Containers are under investigation now, but there is nothing to share publicly yet, so stay tuned! Are you thinking of this from the perspective of scale-up (DL980) or scale-out (DL380)?
