Tech Insights
TechExperts

Performance-enhanced deep learning models for the edge

To ensure your AI solution meets accuracy and performance requirements, you have to consider multiple factors when deploying production AI models. Read what HPE's Kenneth Leach and Deci's Sefi Kligler have to say about how HPE is working with partners like Deci to provide the expertise, people, and technology to achieve a performance-enhanced solution and accelerate business outcomes with AI. 

What matters when deploying AI at the edge?

We’ve all heard the stories: AI projects get stuck at the “last mile” and never make it into production, and the reasons why are not always discussed. Although many factors contribute to the success or failure of an AI solution, deep learning model accuracy is broadly recognized as an important success criterion. However, accuracy is not the only important factor. Once trained for the intended application, deep learning models often have compute requirements that are incompatible with other requirements of the production solution, such as cost-effectiveness and power efficiency. For example, many video-based computer vision models require fast inference results on high temporal resolution data.

Though trained to the desired accuracy, deep learning models developed offline in a data center may not be able to deliver insights fast enough on the smaller, low-power systems available at the edge. In this blog, I’ll discuss the AI inference metrics that are often critical to transforming an AI project into a successful, performance-enhanced AI solution. I’ll also show how HPE, Deci, and Intel partnered to solve a real production problem.

Reducing latency, resources, and complexity

Latency and resource requirements as well as deployment complexity can hinder putting a trained AI model into production, even one that delivers the required accuracy for the problem.

Let’s look at an example using object detection. Object detection models recognize and classify objects in images that match a trained set of categories, for example, people, cats, dogs, laptops, bottles, computer components, and traffic signs. State-of-the-art (SOTA) models such as EfficientDet, YOLOv5, RetinaNet, DETR, and SSD can achieve high levels of accuracy, which often contributes to a successful AI solution. However, poor runtime performance, such as high latency or insufficient throughput, can prevent successful deployment.
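Before deployment, it's worth measuring whether a model actually meets its latency and throughput targets on the intended hardware. A minimal sketch of such a check is below; `run_inference` is a placeholder for your real model call (e.g. a YOLO-family detector), and the 30 FPS target is an illustrative assumption, not a figure from this case study.

```python
import time

def run_inference(frame):
    # Placeholder for a real model forward pass (e.g. an object detector).
    time.sleep(0.005)  # simulate ~5 ms of compute per frame
    return []          # placeholder list of detections

def measure(n_frames=50, required_fps=30.0):
    # Time n_frames inference calls and compare the achieved frame rate
    # against the deployment requirement.
    start = time.perf_counter()
    for _ in range(n_frames):
        run_inference(frame=None)
    elapsed = time.perf_counter() - start
    latency_ms = 1000.0 * elapsed / n_frames
    fps = n_frames / elapsed
    return latency_ms, fps, fps >= required_fps

latency_ms, fps, meets_target = measure()
print(f"avg latency {latency_ms:.1f} ms, {fps:.1f} FPS, meets 30 FPS target: {meets_target}")
```

Running this kind of check on the actual target hardware, rather than the training cluster, is what surfaces the gap between a model that is accurate and one that is deployable.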

HPE and Deci recently worked with a computer vision solution provider struggling to meet exactly these performance requirements. The goal was a production object detection solution based on YOLOv5 to identify objects on city streets, but the model proved incapable of meeting the required video frame rate when deployed to the target inference hardware. In this case, the customer was deploying the solution on edge-specific hardware: ruggedized HPE Edgeline servers with power-efficient Intel® Xeon® processors.

While many consider the YOLOv5 model SOTA in terms of accuracy and latency, it was built to run on GPUs. HPE partnered with Deci to optimize the customer’s model using Deci’s Automated Neural Architecture Construction technology (AutoNAC™), ultimately creating a performance-enhanced AI solution that exceeded all production requirements. AutoNAC™ is designed to solve exactly this problem and enhances deep learning models’ performance on a range of HPE platforms.

Here's how it works: the challenge was to improve the model’s performance on Intel® Xeon® Scalable Processors, which Deci accomplished while adhering to strict memory and accuracy constraints. One way to increase throughput is to decrease inference latency, the time it takes the model to produce a result for a given image. Deci’s AutoNAC™ engine delivers algorithmic-level model optimizations for any target hardware, based on a proprietary neural architecture search (NAS) engine. As input, AutoNAC™ needs a baseline model, the dataset used to train that model, and access to the target inference hardware platform so it can monitor model performance. It then identifies and removes bottlenecks in the model architecture and redesigns a hardware-optimized neural network with higher accuracy, higher throughput, lower latency, a smaller model size, or a smaller memory footprint than the original.
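AutoNAC™ itself is proprietary, but the general shape of a hardware-aware architecture search can be sketched in a few lines: generate candidate architectures, measure each candidate's latency on the target hardware, and keep the most accurate candidate that fits the latency budget. Everything in this sketch is illustrative — the candidate encoding, the latency and accuracy stand-ins, and the 100 ms budget are assumptions, not Deci's actual algorithm.

```python
import random

random.seed(0)

def measure_latency_ms(candidate):
    # Stand-in for timing real inference calls on the target device
    # (e.g. an HPE Edgeline server); here, a toy cost model.
    return candidate["depth"] * 8 + candidate["width"] * 0.5

def estimate_accuracy(candidate):
    # Stand-in for fine-tuning and evaluating the candidate on the
    # customer's dataset; here, a toy score that grows with capacity.
    return 0.50 + 0.01 * candidate["depth"] + 0.001 * candidate["width"]

def search(latency_budget_ms, n_candidates=50):
    best = None
    for _ in range(n_candidates):
        cand = {"depth": random.randint(4, 20), "width": random.randint(16, 128)}
        if measure_latency_ms(cand) > latency_budget_ms:
            continue  # violates the hardware constraint; discard
        acc = estimate_accuracy(cand)
        if best is None or acc > best[0]:
            best = (acc, cand)
    return best  # (accuracy, candidate) or None if nothing fits the budget

acc, cand = search(latency_budget_ms=100)
print(f"best candidate {cand} with estimated accuracy {acc:.3f}")
```

The key idea the sketch captures is that latency is measured on the actual deployment hardware inside the search loop, so the winning architecture is optimized for that device rather than for the training cluster's GPUs.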

The table below, AutoNAC Optimized Performance Results, shows initial test results for the unoptimized TensorFlow-based YOLOv5 model: 900 ms latency on HPE Edgeline and a mean average precision (mAP) of 0.63. The AutoNAC™ solution optimized the model’s runtime performance, reducing latency roughly 12-fold to 70 ms. Using Deci's SuperGradients open-source training library, the optimized model’s accuracy (mAP) also improved by 25% over the original. Once optimization was complete, the model was integrated into the image processing container with just a few lines of code using Infery, Deci's runtime engine. Final testing verified that the optimized model, deployed on the target hardware, met the frames-per-second requirement without optimizing any other part of the inference pipeline. Now that is enhanced performance!

AutoNAC Optimized Performance Results

  Metric                    Baseline YOLOv5    AutoNAC™-optimized
  Latency (HPE Edgeline)    900 ms             70 ms
  Accuracy (mAP)            0.63               +25% vs. baseline
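The reported gains are easy to sanity-check from the figures quoted above (900 ms to 70 ms, and a 25% relative mAP improvement over a 0.63 baseline):

```python
# Quick arithmetic check on the results quoted in the text.
baseline_ms, optimized_ms = 900.0, 70.0
speedup = baseline_ms / optimized_ms     # ~12.9x, i.e. roughly 12-fold
baseline_fps = 1000.0 / baseline_ms      # ~1.1 frames/sec before optimization
optimized_fps = 1000.0 / optimized_ms    # ~14.3 frames/sec after optimization

baseline_map = 0.63
optimized_map = baseline_map * 1.25      # 25% relative improvement -> ~0.79

print(f"speedup: {speedup:.1f}x, throughput: {baseline_fps:.1f} -> {optimized_fps:.1f} FPS")
print(f"mAP: {baseline_map:.2f} -> {optimized_map:.2f}")
```

In other words, the same model went from roughly one frame per second to video-rate territory, while also becoming more accurate.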

Considerations for model deployment

Understanding accuracy and performance requirements throughout the AI solution development cycle is critical to delivering a timely, performance-enhanced solution. Accuracy, runtime performance (including latency and throughput), and the deployment environment are all important factors for a successful AI solution, and all of them should be evaluated when deploying models.

For example, no matter how accurate a model is, if it doesn’t meet inference performance requirements, then it might not succeed in production. Likewise, no matter how quickly a model can generate inference results, if it is not accurate enough, then it will not meet the success criteria of the overall solution. Why do these factors matter when deploying AI into production environments and what tools can be used to enhance performance?

  1. Model design, accuracy, and performance

Deep learning models are growing larger and more complex, driven in part by the need to reach higher accuracy and enable more advanced use cases. However, as model sizes increase, it becomes more challenging to deploy these advanced models onto edge devices. An efficient model design process that considers the inference environment and deployment hardware early on can yield a much smaller model that better meets the resource constraints of edge devices while still achieving the desired accuracy and performance. Once a model is deployed into production, accuracy can change over time, so understanding how these factors affect accuracy will help your solution continue to meet requirements throughout its lifecycle. Deci's SuperGradients open-source training library can be used to maintain accuracy over the lifetime of a solution.

  2. Deployment hardware and resource requirements

In many cases, AI models are trained and tested on large high performance computing (HPC) clusters with significant compute capacity and accelerated processors, such as GPUs, that might not be available in the edge deployment environment. Edge environments, for example, may require less compute-intensive processors due to constraints on space, cost, power, or cooling. Choosing hardware designed for performance in edge environments is key to successful deployment. HPE Edgeline and HPE ProLiant Gen10 Plus platforms provide an open, standards-based, high-performance, low-latency system for the most demanding use cases, powered by third-generation Intel® Xeon® Scalable Processors.

  3. Time to decision

Ensuring that a production model meets latency requirements is also critical to a successful AI solution: a late prediction may be as useless as an incorrect one. To enable real-time decisions close to where data is generated, edge deployments often have very low latency requirements. To successfully put your model into production, you may have to improve its inference latency through optimization and retraining. Deci’s AutoNAC™ engine delivers such optimizations for any target hardware.

  4. Deployment environment

AI model deployment can be complex and hard to manage over time due to version control, deployment automation, and continuous integration/continuous deployment (CI/CD) cycles. Effective visualization, monitoring, and CI/CD capabilities are a requirement for MLOps pipelines: they let organizations catch problems before they become production issues, maintain confidence in the model, correct for model drift, and deliver updated models seamlessly over time. HPE Ezmeral ML Ops provides pre-packaged tools to operationalize AI workflows at every stage of the AI lifecycle, from pilot to production, giving you DevOps-like speed and agility.

Ready to move forward?

Multiple factors must be considered when deploying production AI models to ensure the solution meets accuracy and performance requirements. HPE provides the expertise, people, technology, and partners to achieve a performance-enhanced solution and accelerate business outcomes with AI.

Find more information in this solution brief.

Questions? Please contact HPE at AIAdvance@hpe.com


Meet our Tech Insights Experts bloggers

Kenneth Leach, AI Technologist & Solution Architect, HPE. Kenneth has worked within HPE server and systems engineering teams since 2006, specializing in scalable systems, HPC, edge computing, IoT, and AI solutions. He has created numerous solutions in emerging technologies during his time at HPE. He holds a B.A. in Computer Science from The University of Texas at Austin.

Sefi Kligler, VP of AI, Deci. Sefi holds an M.Sc. in Math and Computer Science from the Weizmann Institute of Science, where his thesis focused on super-resolution and image degradation estimation using deep learning. He has also worked in the field of multiple view geometry, developing a helmet for aviation.

 

Insights Experts
Hewlett Packard Enterprise

twitter.com/HPE_AI
linkedin.com/showcase/hpe-ai/
hpe.com/us/en/solutions/artificial-intelligence.html

 

 

About the Author

TechExperts

Our team of HPE and other technology experts shares insights about relevant topics related to artificial intelligence, data analytics, IoT, and telco.