Advancing Life & Work
What ML Engineers Should Know About Hardware

  • Monica Livingston, Sr Director, AI and Accelerated Compute Solutions

The talent gap in AI has been a popular and broadly discussed topic in the past several years. As AI-based applications make inroads into virtually every industry, we need more data scientists, ML engineers, and AI developers. Much of the AI upskilling to date has focused on software, but as more and more AI projects fail to reach deployment because of cost overruns, the industry is taking a harder look at how to get the most out of existing hardware infrastructure when deploying AI models. Intel and HPE can help bridge this critical gap in the industry.

Why are we talking about hardware?

Hardware matters because it can significantly increase the cost of a project. Because velocity and time to solution are critical, there is often little time to optimize an application for the underlying hardware. In many cases this creates significant technical debt: we deploy more hardware than is needed and end up with underutilized resources. This is also one of the top reasons why POCs don't make it to production. When we spec out the cost of unoptimized hardware, especially at scale, we exceed our budget and reach the point where the benefits of the AI solution no longer justify the cost. Intel® builds AI functionality across our hardware portfolio, from DL Boost capability in our Xeon® Scalable processors to discrete GPUs. HPE offers a portfolio of infrastructure products built on our chips to make it easier and faster to deploy AI.

How can software engineers use existing hardware to go faster?

My team spends a lot of time with customers on infrastructure optimization and model optimization. Our methodology is straightforward: (1) run a trained model on existing infrastructure alongside the usual set of workloads (in the cloud, benchmark across a set of instances); (2) from this initial data, assess whether the performance is good enough; and (3) if performance or latency requirements are not met, determine whether more memory, more I/O bandwidth, or more compute is needed. This way we never recommend hardware that isn't needed or may sit underutilized.
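Step (1) of the methodology above can be sketched as a simple latency benchmark. This is an illustrative sketch, not Intel or HPE tooling: the `predict` callable and the toy batch below are stand-ins for a real trained model and real input data.

```python
import statistics
import time

def benchmark(predict, batch, warmup=10, iters=100):
    """Time a model's predict callable on the current hardware.

    Returns median and tail latency in milliseconds plus rough
    samples-per-second throughput for the given batch.
    """
    for _ in range(warmup):                 # warm caches before timing
        predict(batch)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        predict(batch)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[int(0.99 * (len(samples) - 1))],
        "throughput_per_s": 1000.0 * len(batch) / statistics.mean(samples),
    }

# Stand-in "model": sums each row; replace with your real inference call.
batch = [[float(i)] * 8 for i in range(32)]
stats = benchmark(lambda b: [sum(row) for row in b], batch)
print(stats)
```

Comparing the p50/p99 numbers from a run like this against the application's latency budget is what drives step (3): only if the budget is missed do you look at adding memory, I/O bandwidth, or compute.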

Cost savings are also achieved by optimizing software. Infrastructure is complex, and software is an extremely powerful lever when trading off cost, performance, latency, and bandwidth. We've seen performance improvements of 10x to 100x on the same hardware just by optimizing the software.
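The scale of such gains is easy to demonstrate with a toy example. The sketch below is only an illustration of the principle (a better algorithm and data structure on unchanged hardware), not an Intel-specific optimization: swapping a linear scan for a hash lookup changes the runtime by orders of magnitude with zero new hardware.

```python
import time

data = list(range(5_000))
queries = list(range(10_000))

def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# Unoptimized: a linear scan of the list for every query -> O(n*m).
hits_slow, t_slow = timed(lambda: sum(1 for q in queries if q in data))

# Optimized: build a hash set once, then O(1) lookups -- same hardware.
lookup = set(data)
hits_fast, t_fast = timed(lambda: sum(1 for q in queries if q in lookup))

assert hits_slow == hits_fast   # same answer, very different cost
print(f"speedup: {t_slow / t_fast:.0f}x")
```

Real model-level wins come from the analogous moves at a larger scale: vectorized kernels, fused operators, quantization, and libraries tuned to the instruction set actually present on the machine.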

Intel offers tools that optimize all layers of the system software infrastructure, from operating systems to applications, including libraries, industry frameworks, and tools. Our goal is to simplify and accelerate the development of end-to-end solutions across the entire data analytics and AI pipeline, from ingest to insights.

What steps can IT take toward AI-readiness?

AI is a disruptive technology and will require process changes. Many companies are establishing AI Centers of Excellence: centralized hubs for sharing best-known methods, standardizing tools, and developing AIaaS offerings. IT plays a critical role in selecting among the myriad development tools, testing them, and providing the most cost-efficient infrastructure at each stage of the AI development and deployment process. Understanding how these AI tools and workloads interface with hardware is critical to selecting the appropriate infrastructure. For example, data science workloads are often designed for single-node processing and require systems that are highly interactive and can handle massive data sets while minimizing latency. Open standards are important for cross-platform interoperability, since the end-to-end AI development process spans different platforms.
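Right-sizing a single-node data science system starts with knowing what the node actually has. A minimal sketch (standard library only; the `node_profile` helper is hypothetical, and the memory query is Unix-specific):

```python
import os

def node_profile():
    """Rough single-node sizing info for data science workloads.

    An illustrative sketch, not a production inventory tool: reports
    logical CPU count everywhere and physical RAM where the Unix
    sysconf interface is available.
    """
    profile = {"logical_cpus": os.cpu_count() or 1}
    if hasattr(os, "sysconf") and "SC_PHYS_PAGES" in os.sysconf_names:
        profile["ram_gib"] = round(
            os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30, 1
        )
    return profile

print(node_profile())
```

Comparing a profile like this against a workload's working-set size is the quickest way to tell whether a "slow" notebook is compute-bound or simply swapping.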

How can Intel and HPE help?

Intel and HPE are working on a number of products to help improve performance, decrease time-to-solution and reduce the overall cost of an AI deployment for our customers:

  • Intel optimizes all layers of the system software infrastructure, including libraries, compilers, industry frameworks, middleware, and tools for machine learning, deep learning, and analytics.
  • Intel® oneAPI delivers a unified software development environment across CPU and accelerator architectures.
  • The Intel® oneAPI AI Analytics Toolkit gives data scientists optimized Python tools and familiar frameworks to accelerate end-to-end data science and analytics pipelines.
  • HPE GreenLake delivers AI, ML, and analytics outcomes faster with an edge-to-cloud platform.
  • Intel and HPE both work with third-party ISVs to optimize their applications for a number of different infrastructure options, providing flexibility, performance, and cost savings.
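Many of these optimized stacks are picked up through environment configuration rather than code changes. As one small, hedged example: `OMP_NUM_THREADS` and `KMP_BLOCKTIME` are standard OpenMP / Intel OpenMP runtime variables that oneDNN-enabled builds of common frameworks honor; the specific values below are illustrative, not a recommendation for any particular workload.

```python
import os

# Size the OpenMP thread pool to the visible cores, and shorten the
# spin-wait time, before any framework that reads these variables is
# imported. setdefault leaves deliberate operator overrides in place.
cores = os.cpu_count() or 1
os.environ.setdefault("OMP_NUM_THREADS", str(cores))
os.environ.setdefault("KMP_BLOCKTIME", "1")  # shorter spin suits latency-bound serving

print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```

Settings like these are exactly the kind of knob worth benchmarking (as in the methodology above) before concluding that more hardware is needed.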


About the Author

HPE-Editor

Editor-in-chief for the HPE Advancing Life & Work blog.