Servers: The Right Compute

Project Odyssey, HP's journey of 1,000 miles with Linux

msemadeni

Blog contributed by Tom L Vaden, Distinguished Technologist working on Linux kernel strategy for HP servers

 

 

Chinese philosopher Lao Tzu said, “A journey of a thousand miles begins with a single step.”

 

A few years ago HP began its own journey of a thousand miles with Project Odyssey, an effort to add mission-critical features to Linux. Tomorrow at SUSECon 2014, Tom Vaden will deliver a session providing an update on our progress. Since many of you will not be able to attend, the following is an overview of his presentation.

 

Over time the Linux kernel has extended its reach to cover additional varieties of platforms, including changes that enhance support for larger-scale platforms. The part of the Linux kernel community that is interested in larger-scale servers is relatively small. Thus, changes to enable and improve performance for those servers must first do no harm to the one- and two-socket mainstream. Likewise, any significant complexity introduced by a change for those servers needs a compelling reason.

 

Large configurations of servers like the HP DL580 Gen8 and the much anticipated DragonHawk system can range from 60 to 240 cores (120 to 480 threads) with 1 to 12 terabytes of memory in a non-uniform memory architecture (NUMA). Working with our distro partners at SUSE and the upstream community over the past several months, members of Hewlett-Packard's Operating Environments lab have been improving Linux kernel support for such large-scale servers. These improvements have indeed been compelling and fall broadly into two categories: performance and RAS (reliability, availability, serviceability). Kernel developers at Hewlett-Packard have submitted over 225 patches to the upstream Linux kernel for better enablement and performance of larger-scale servers.

 

Many of the performance-related patches have focused on reducing the number of atomic instructions in the Linux kernel synchronization primitives to avoid cache-line contention. Mutexes in particular have received major attention, with the reduction of atomic instructions, the introduction of queued spinning, slow-path optimizations, unlocking a mutex without acquiring a wait lock, and other related changes. These changes to the mutex synchronization primitives have shown a greater than 2x performance improvement with various workloads on systems with 8 or more processors.
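
To make the queued-spinning idea more concrete, here is a minimal user-space sketch of an MCS-style queue lock written with C11 atomics. It is not the kernel's mutex code; the names mcs_lock, mcs_node, mcs_acquire and mcs_release are hypothetical and chosen only for illustration. The point it shows is that each waiter touches the shared lock word once (a single atomic exchange) and then spins on a flag in its own node, so waiters do not keep bouncing the same cache line between sockets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiter; a waiter spins only on its own `locked` flag. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

/* The lock itself is just a pointer to the tail of the waiter queue. */
struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_lock_init(struct mcs_lock *lock)
{
    atomic_init(&lock->tail, NULL);
}

static bool mcs_is_locked(struct mcs_lock *lock)
{
    return atomic_load_explicit(&lock->tail, memory_order_acquire) != NULL;
}

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *self)
{
    atomic_store_explicit(&self->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&self->locked, true, memory_order_relaxed);

    /* A single atomic exchange on the shared word joins the queue. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&lock->tail, self, memory_order_acq_rel);
    if (prev == NULL)
        return; /* queue was empty: the lock is ours */

    /* Link behind the previous waiter, then spin on our own node only. */
    atomic_store_explicit(&prev->next, self, memory_order_release);
    while (atomic_load_explicit(&self->locked, memory_order_acquire))
        ; /* local spinning, no traffic on the shared lock word */
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *self)
{
    assert(mcs_is_locked(lock)); /* releasing a free lock is a bug */

    struct mcs_node *next =
        atomic_load_explicit(&self->next, memory_order_acquire);
    if (next == NULL) {
        /* No visible successor: try to reset the queue to empty. */
        struct mcs_node *expected = self;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A successor is mid-enqueue; wait until it has linked itself. */
        while ((next = atomic_load_explicit(&self->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Hand the lock over by clearing the successor's private flag. */
    atomic_store_explicit(&next->locked, false, memory_order_release);
}
```

Each thread passes its own mcs_node (typically on its stack) to mcs_acquire and the same node to mcs_release. The trade-off is a slightly more involved release path in exchange for waiters that spin on memory only they read.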

 

Additional changes have been applied to R/W semaphores, along with a new design for lock-less updates of the dcache reference count, to minimize cache-line contention and optimize performance for large-scale NUMA systems. In the future, additional changes will be introduced with queued spinlocks and queued R/W locks.
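
One way a lock-less reference count update can work is to pack a lock flag and the counter into a single 64-bit word and bump the counter with a compare-and-swap whenever the lock flag is clear. The sketch below illustrates that general technique in user-space C11; it is an assumption-laden illustration rather than the kernel's dcache code, and the names packed_ref, packed_ref_get_lockless, packed_ref_trylock and packed_ref_unlock are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 is a lock flag; the remaining 63 bits hold the reference count. */
#define REF_LOCKED ((uint64_t)1)
#define REF_ONE    ((uint64_t)2)

struct packed_ref {
    _Atomic uint64_t word;
};

static void packed_ref_init(struct packed_ref *r, uint64_t count)
{
    atomic_init(&r->word, count * REF_ONE);
}

static uint64_t packed_ref_count(struct packed_ref *r)
{
    return atomic_load_explicit(&r->word, memory_order_acquire) >> 1;
}

/*
 * Take a reference without taking the lock. Returns false if the lock
 * flag is set, in which case the caller must fall back to a slow path.
 */
static bool packed_ref_get_lockless(struct packed_ref *r)
{
    uint64_t old = atomic_load_explicit(&r->word, memory_order_relaxed);
    while (!(old & REF_LOCKED)) {
        if (atomic_compare_exchange_weak_explicit(
                &r->word, &old, old + REF_ONE,
                memory_order_acq_rel, memory_order_relaxed))
            return true;
        /* A failed CAS reloaded `old`; the loop re-checks the lock flag. */
    }
    return false;
}

/* Set the lock flag if it is clear; the count is left untouched. */
static bool packed_ref_trylock(struct packed_ref *r)
{
    uint64_t old = atomic_load_explicit(&r->word, memory_order_relaxed);
    while (!(old & REF_LOCKED)) {
        if (atomic_compare_exchange_weak_explicit(
                &r->word, &old, old | REF_LOCKED,
                memory_order_acquire, memory_order_relaxed))
            return true;
    }
    return false;
}

static void packed_ref_unlock(struct packed_ref *r)
{
    uint64_t old = atomic_fetch_and_explicit(&r->word, ~REF_LOCKED,
                                             memory_order_release);
    assert(old & REF_LOCKED); /* unlocking an unlocked word is a bug */
    (void)old;
}
```

In the common uncontended case a reference is taken with one successful compare-and-swap instead of a lock, an increment and an unlock. Only callers that find the lock flag set pay for the slower, locked path.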

 

Along with these kernel synchronization primitive optimizations, Hewlett-Packard engineers have also focused on large-system scaling of the System V Semaphore, Shared Memory and Message Queue implementations inside Linux. Additional efforts in the area of the scheduler's idle balancer have resulted in improvements of 50% with certain workloads.

 

Also in the area of performance, automatic NUMA balancing, which can be found in RHEL7 and SLES12, helps improve out-of-box performance for different workloads. It does this by attempting to automatically detect tasks and the memory they use and move them closer to each other, thereby avoiding expensive remote NUMA node accesses. In most use cases (i.e., systems with up to 4 sockets) performance falls within a few percent of what can be achieved via optimal manual binding. Developers at Hewlett-Packard have been working to evaluate and influence this new feature.
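
The placement decision can be pictured with a toy model: given a count of a task's memory accesses per NUMA node, prefer the node that serves most of them, and measure how many accesses would remain remote. The sketch below is only that toy model. The functions preferred_node and remote_percent and the sample numbers are hypothetical, and the real kernel logic weighs far more than a single counter per node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the node with the most recorded accesses; ties keep the lower node. */
static size_t preferred_node(const uint64_t *accesses, size_t nr_nodes)
{
    assert(nr_nodes > 0);
    size_t best = 0;
    for (size_t n = 1; n < nr_nodes; n++) {
        if (accesses[n] > accesses[best])
            best = n;
    }
    return best;
}

/* Percentage of accesses that are remote when the task runs on `node`. */
static unsigned remote_percent(const uint64_t *accesses, size_t nr_nodes,
                               size_t node)
{
    assert(node < nr_nodes);
    uint64_t total = 0;
    for (size_t n = 0; n < nr_nodes; n++)
        total += accesses[n];
    if (total == 0)
        return 0; /* no samples yet, nothing is known to be remote */
    return (unsigned)((total - accesses[node]) * 100 / total);
}
```

For a task with access counts {100, 700, 150, 50} across four nodes, this model prefers node 1. Running there leaves 30% of accesses remote, compared with 90% if the task stayed on node 0.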

 

All of these performance and scalability changes also have a positive effect on 2-processor and 4-processor systems, but they are really compelling on systems with 8 or more processors.

 

Engineers at Hewlett-Packard have also collaborated with the upstream Open Source community and distro partner engineering to provide a number of enterprise-focused RAS changes to the Linux kernel. Examples of this collaboration are improvements in: crashdump function and performance, virtualization and advanced error handling.

 

Crashdump improvements included the use of mmap for vmcore handling to increase performance/scaling, the use of advanced compression techniques in makedumpfile, enabling parallelism in all phases of the kdump process and responding to performance issues in kdump's space-saving cyclic processing. Full use of some of these facilities (e.g. parallelism) will be done in the future.

 

Some of the error handling collaborations focused on advancements in platform Firmware First cooperation, enabling advanced error analysis and enabling Xeon-EX advanced MCA recovery. Large-scale server customers require more robust error handling in the I/O space, so Hewlett-Packard developers have also participated in improvements to PCI Express Advanced Error Reporting (AER) and the implementation of PCIe Live Error Recovery (LER) for some previously unrecoverable PCI Express errors in bare-metal and virtualized environments.

 

Virtualization development was done to support KVM performance and scaling for large-scale servers, especially in the area of migration, with dirty-page bitmap optimizations, RDMA-based live guest migration and the isolation of migration resources. Additionally, work was done on virtual resource optimizations (especially for large systems) in the form of hot-plug memory and CPU support. Advanced error handling for VM error isolation was also developed. Other KVM-based performance improvements involved NUMA awareness and direct I/O assignment in large machine configurations.
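
During live migration, a dirty-page bitmap records which guest pages were written since the last pass so that only those pages are sent again. The post does not detail the specific optimizations, so the following is only a generic sketch of why the bitmap layout matters on large-memory guests: scanning a word at a time lets 64 clean pages be skipped with a single comparison. The helpers mark_dirty and collect_dirty are hypothetical names, not KVM or QEMU APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(uint64_t) * CHAR_BIT)

/* Record that `page` was written since the last collection pass. */
static void mark_dirty(uint64_t *bitmap, size_t page)
{
    bitmap[page / BITS_PER_WORD] |= (uint64_t)1 << (page % BITS_PER_WORD);
}

/*
 * Copy up to `max_out` dirty page numbers into `out`, clearing each bit
 * that is reported. Returns the number of page numbers written to `out`.
 */
static size_t collect_dirty(uint64_t *bitmap, size_t nr_words,
                            size_t *out, size_t max_out)
{
    assert(bitmap != NULL && out != NULL);
    size_t found = 0;
    for (size_t w = 0; w < nr_words && found < max_out; w++) {
        uint64_t word = bitmap[w];
        if (word == 0)
            continue; /* 64 clean pages skipped with one comparison */
        for (size_t b = 0; b < BITS_PER_WORD && found < max_out; b++) {
            uint64_t mask = (uint64_t)1 << b;
            if (word & mask) {
                out[found++] = w * BITS_PER_WORD + b;
                bitmap[w] &= ~mask;
            }
        }
    }
    return found;
}
```

With a four-word bitmap covering 256 pages, marking pages 3, 64 and 200 and then calling collect_dirty returns those three page numbers in ascending order and leaves the bitmap clean for the next pass.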

 

An honorable mention in the large-scale platform work goes to the effort to fully support UEFI environments. Hewlett-Packard has leveraged its long-term UEFI expertise to help complete the UEFI environment, ensuring that features like Secure Boot work seamlessly. Work has also been done to help complete UEFI support for kexec, which skips the hardware reset and directly boots a new kernel - a function that is important for Hewlett-Packard's larger server customers.

 

To date, the partnership of Hewlett-Packard and SUSE engineering has provided significant function and compelling performance for Hewlett-Packard server customers, and we are looking forward to what else we can do in the future.
