
Heterogeneous Serverless Computing Retrospective


Guest blog by Dejan Milojicic, HPE Fellow and VP in the Systems Architecture Lab at HPE Labs

Four years ago, just as the COVID-19 pandemic started to lift, seven technical leaders from HPE Labs came together in a Palo Alto backyard to discuss technology trends. The group observed an increasingly heterogeneous computing landscape and, in response, formed a program named Heterogeneous Serverless Computing (HSC) to address this opportunity.

Together, we adopted a serverless programming model to match hardware heterogeneity with a corresponding software abstraction. Our hypothesis was that matching the fine granularity of accelerators with that of serverless would enable us to utilize hardware better. Fine granularity is reflected in the short time to execute (service lifetime) and the size of deployed code (kernels running on accelerators vs. small functions-as-a-service).

The program was originally defined as, “Workflow-optimized Heterogeneous Serverless Computing (HSC) Architecture inclusive of (compatible with) the public Cloud, for a set of workloads broader than what the public Cloud can support.” We applied the HSC program to both HPE Private Cloud and HPE products, such as storage and interconnects.   

Research was launched at all levels of the stack, from workflows and programming languages to runtimes, schedulers, system software, and hardware. We explored tools for performance prediction and evaluation, design space exploration, applied AI, and digital twins. At the start, neither GenAI nor Agentic AI existed; over the years, we adopted them as they appeared, primarily as workloads for which we optimize infrastructure.

Throughout the program, we focused on the following components (see Figure 1 below, red text):

  • We started with HPC and AI workflows, which we optimized for execution on heterogeneous infrastructure. One optimization technique used programming languages and runtimes to improve performance and energy efficiency.
  • We then used schedulers to map applications to appropriate hardware to improve throughput or latency, and consolidated hardware to minimize energy consumption and recover stranded power.
  • Next, we managed accelerators to enable direct (peer-to-peer) communication among GPUs, FPGAs, SmartNICs, CPUs, and storage. We partitioned accelerators to increase their efficiency, especially for inference, and offloaded some operators or functions to SmartNICs.
  • We developed tools for performance evaluation and performance prediction, which helped us with improved scheduling and optimal (right-)configuration. We also conducted design space exploration to guide our product managers in determining which features to enable in our products for customer workloads.
  • Across the stack, we applied AI, for example to predict storage traffic, recognize optimal configurations, and predict failures. We also built federated digital twins, which we used for anomaly detection in power delivery.
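The scheduler's mapping step described above can be sketched in a few lines: pick the accelerator (or partition) that best satisfies a latency or energy goal. This is a minimal illustration, not HSC code; the device names, cost table, and `pick_device` helper are all hypothetical.

```python
# Minimal sketch of heterogeneity-aware scheduling: choose the device
# that minimizes the metric the caller cares about. All numbers are
# illustrative, not measured.

# Hypothetical per-device cost estimates for one function invocation:
# (predicted latency in ms, predicted energy in joules)
COSTS = {
    "cpu":      (120.0, 6.0),
    "gpu":      (15.0, 18.0),
    "fpga":     (40.0, 3.5),
    "smartnic": (80.0, 2.0),
}

def pick_device(goal: str) -> str:
    """Return the device name minimizing the chosen metric."""
    idx = 0 if goal == "latency" else 1
    return min(COSTS, key=lambda d: COSTS[d][idx])

print(pick_device("latency"))  # gpu has the lowest predicted latency
print(pick_device("energy"))   # smartnic has the lowest predicted energy
```

A real scheduler would refresh these cost estimates from monitoring data and account for queueing and data movement, but the selection step reduces to the same argmin.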

Targeted Users  

With HSC, we targeted the following users (see Figure 1 below, top): 

  • End users, especially those running HPC and AI workloads, were provided seamless scalability and fluid delivery of new applications.
  • Developers were empowered with enhanced productivity.  
  • Operators and providers of infrastructure were enabled with increased efficiency to profitably run workflows and applications.

Figure 1: Heterogeneous Serverless Computing architecture in action.

 

 

Results

Four years later, we've been able to leverage heterogeneity in a variety of ways, while serverless remains a promising but unfulfilled area of research. Applying serverless computing to Agentic AI is especially promising, but it is not yet a proven opportunity. The team achieved the most success in the following areas:

  • Performance evaluation: Heterogeneity makes performance noisy, so we wrote an open-source tool (SHARP) that makes it easy to measure, reproduce, and even optimize and predict performance variability.  
  • Energy efficiency: Energy is a difficult metric to monitor in programming frameworks. Most applications run in virtualized environments, and the underlying infrastructure is often shared among multiple users or applications. We therefore built custom models that leverage monitoring data from existing HPE technologies like iLO, along with other open-source software, to accurately predict the energy consumption of incoming requests on a given serverless platform. This enabled more energy-efficient scheduling for serverless platforms.
  • Heterogeneity-awareness and -hiding: We leveraged the benefits and overcame the complexities of heterogeneous systems. Our schedulers can select the optimal accelerator (or a partition of one) for execution, subject to performance and/or efficiency goals. Heterogeneity is exposed to users and developers so that they can benefit from it, and hidden when lower levels of the stack can make better use of the heterogeneity information.
  • Large-scale design space exploration: Infrastructure for large-scale performance measurement is not widely available, so we built a hybrid performance simulator. It combines conventional network simulation with statistical performance models for specific functions of AI workloads, enabling accurate performance simulation and infrastructure sizing for large-scale AI workloads.
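The variability problem that motivated the performance-evaluation work can be made concrete with a short sketch. This is not the SHARP tool's API, just a generic illustration of the idea: time a workload repeatedly and report the coefficient of variation, the run-to-run noise that heterogeneous, shared hardware introduces.

```python
import statistics
import time

def measure(fn, runs: int = 20) -> dict:
    """Time fn repeatedly and summarize run-to-run variability."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    # cv (coefficient of variation) is the interesting number on noisy,
    # heterogeneous infrastructure: stdev relative to the mean.
    return {"mean_s": mean, "stdev_s": stdev, "cv": stdev / mean}

# Toy workload standing in for a serverless function body.
stats = measure(lambda: sum(range(100_000)))
print(f"mean={stats['mean_s']:.6f}s cv={stats['cv']:.2%}")
```

A tool like SHARP goes much further (reproducibility, prediction, optimization of variability), but this summary statistic is the starting point.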

Over the course of the program, the team grew from zero to 10 members at its peak, plus approximately as many interns and postdocs. Two dozen papers were published in conferences and journals, and close to 100 U.S. and international patent applications were filed, nine of which have been granted or allowed so far. More importantly, we contributed to several of HPE's business units and products, as described next.

Impact  

  • Right-configuring and right-sizing: We developed working prototypes that use machine-learning and clustering algorithms to compute the most relevant configurations, helping reduce portfolio size and suggesting the most appropriate configuration for a given customer.
  • Server consolidation: We built models that predict incoming service requests, enabling us to compute the precise number of servers required, provision only those, and shut down the rest. This lets customers reduce their energy consumption and carbon impact.
  • HeteroBench contribution to SPEC: We developed a suite of benchmarks for running programs on heterogeneous accelerators using diverse programming models, such as Python Numba, C++, OpenMP, CUDA, and Vitis HLS. This work was selected as one of the best papers at ICPE, and its artifacts were accepted for accelerated adoption in SPEC benchmarks. Through HeteroBench, we are enabling the broader HPC and AI communities to evaluate their programs across heterogeneous hardware and programming environments.
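The server-consolidation arithmetic described above reduces to a simple calculation once a demand forecast exists: provision just enough servers for the predicted load and shut down the rest. The sketch below is illustrative only; the capacity figures, headroom factor, and `servers_needed` helper are hypothetical, not HPE product logic.

```python
import math

def servers_needed(predicted_rps: float, per_server_rps: float,
                   headroom: float = 0.2) -> int:
    """Servers required for predicted load plus a safety headroom."""
    demand = predicted_rps * (1.0 + headroom)
    return max(1, math.ceil(demand / per_server_rps))

# Hypothetical hourly request-rate forecast (requests per second),
# served by a fleet of 12 servers handling 100 rps each.
forecast = [120, 450, 900, 300]
fleet = 12
for rps in forecast:
    active = servers_needed(rps, per_server_rps=100)
    print(f"{rps:>4} rps -> keep {active} on, shut down {fleet - active}")
```

The energy and carbon savings come from the shut-down remainder: at 120 rps only 2 of 12 servers stay on under these assumed numbers.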

Looking Ahead 

The HSC program has made significant strides over the past four years. Key achievements include the development of open-source tools for performance evaluation, energy-efficient scheduling, and large-scale design space exploration, as well as the application of AI to optimize workflows and enhance system intelligence. These efforts have not only contributed to technical innovation but also had a direct impact on HPE's products and services. While serverless computing holds immense potential, especially in Agentic AI, the field remains an area of active research, with challenges yet to be fully addressed. As the program continues to evolve, future developments will focus on refining these innovations and exploring new opportunities.

Stay tuned for the second installment of this article, where we’ll dive deeper into the HSC program’s applications.  

 
