New insights and recommendations: Scaling AI and ML in hybrid cloud
Gain real-world insights on the challenges to expect and the essential steps for successfully building and managing AI and ML workloads in the hybrid cloud. In a conversation hosted by Pat Moorhead of Moor Insights & Strategy, HPE GreenLake’s Alaric Thomas and NVIDIA’s Srikanth Vijayaraghavan share what organizations are experiencing and how HPE and NVIDIA can help.
It’s not bold or surprising to say that the future belongs to hybrid cloud. In fact, you could go a step further and say that hybrid cloud is the de facto standard for many enterprises today. The interesting truth behind these statements is that some enterprises haven’t kept pace and still run certain workloads – particularly AI workloads – only in the public cloud. But that’s about to change, as artificial intelligence (AI) and machine learning (ML) find their place in the hybrid cloud and extend to the edge.
This topic deserves exploring. And that’s exactly the crux of the discussion during a recent video podcast hosted by Pat Moorhead of Moor Insights & Strategy, as he talked with Alaric Thomas of HPE GreenLake and Srikanth Vijayaraghavan of NVIDIA. You can read the highlights and key takeaways here, based on what Alaric and Srikanth are hearing from organizations and data scientists – and on their own experience. Then, if you want to dig deeper, you can catch the entire podcast on demand at the end of this blog.
3 issues affecting how AI and ML are evolving
- Costs always matter. In particular, organizations are keenly interested in how costs can be more transparent when it comes to AI and ML services. Here’s what often happens: You pick up an AI/ML service that seems well priced, with a clear idea of how you’ll consume the services. But along the way, you discover multiple other services that have to be bundled in to truly meet your end-to-end use case. This kind of surprise is something you clearly want to avoid.
- Data lineage and data governance are key throughout the process. Today, data scientists have access to great tooling that spans model building, investigation, training, deployment, and testing. But at the end of the data science workflow, questions arise: How accurate is the model? Are your predictions now drifting from what’s actually happening? How did your model come up with those predictions, and how did you get those numbers? Organizations are definitely seeking ways to answer these questions (see the drift-monitoring sketch after this list).
- Resource management is essential. You need to make the best use of the resources available to you on data science platforms. Doing so pulls you in two directions. One: You have more data science workloads to run than you have GPU resources. So how do you split up a GPU to support multiple workloads at any one time? Two: You have a model to train, and the model or its training data won’t fit in the memory of one GPU. So how do you parallelize GPUs and the memory associated with them to train a model to higher levels of accuracy? Organizations are actively looking for solutions here (see the multi-GPU training sketch after this list).
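To make the drift question above concrete, here is a minimal sketch of one way to monitor prediction drift, assuming you kept a reference sample of model scores from validation time. The function name, sample sizes, and the 0.01 threshold are illustrative choices, not something from the podcast.

```python
# A minimal drift-monitoring sketch: compare the distribution of recent
# production predictions against a validation-time reference sample.
import numpy as np
from scipy.stats import ks_2samp

def drift_check(reference_preds: np.ndarray,
                recent_preds: np.ndarray,
                alpha: float = 0.01) -> bool:
    """Return True if recent predictions differ significantly from reference.

    Uses a two-sample Kolmogorov-Smirnov test: a small p-value means the two
    prediction distributions are unlikely to come from the same source.
    """
    statistic, p_value = ks_2samp(reference_preds, recent_preds)
    return p_value < alpha

# Example: simulate a model whose output scores have shifted in production.
rng = np.random.default_rng(seed=0)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)   # validation-time scores
production = rng.normal(loc=0.4, scale=1.0, size=5000)  # live scores, shifted

if drift_check(reference, production):
    print("Drift detected: investigation or retraining may be needed.")
```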
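For the second direction, spreading training across several GPUs, one common pattern is data-parallel training. Below is a minimal sketch using PyTorch DistributedDataParallel, meant to be launched with `torchrun --nproc_per_node=<num_gpus>`; the linear model and random batches are placeholders. Note that data parallelism addresses the case where the training data exceeds one GPU; for models too large for a single GPU’s memory, sharded approaches such as PyTorch FSDP extend the same pattern, and splitting one physical GPU across workloads is typically handled at the platform level (for example, NVIDIA Multi-Instance GPU partitioning) rather than in training code.

```python
# A minimal data-parallel training sketch with DistributedDataParallel.
# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")       # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])   # replicas sync gradients
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(100):                          # placeholder training loop
        x = torch.randn(32, 128, device=local_rank)
        y = torch.randint(0, 10, (32,), device=local_rank)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()           # DDP all-reduces gradients
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```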
5 areas to address when building the optimal AI/ML infrastructure
- Managing the infrastructure. Historically, AI workloads have been thought of as the supercomputer sitting somewhere in the corner of the data center, crunching numbers. Yet today, AI is pervasive in enterprise workloads, which sets up the expectation that it should be mainstream, running on the same servers used for any other enterprise workload. So the question arises: How can you manage this as one unified entity and not have specialized infrastructure sitting off to the side? This quickly makes ITOps a priority.
- Starting the AI journey with confidence. The reality is it’s not always easy to get going with AI. You can search on your own for tips and information. But it’s also good for companies with expertise to share what they know, offering insight on what is needed to get started right away – so you can go more quickly from experimentation to production.
- Focusing on the hardware infrastructure. To run AI and ML, organizations need a high-performance infrastructure, ideally one that is fine-tuned to your exact needs and therefore eliminates the guesswork.
- Extending DevOps methodology into MLOps. Bringing specific aspects of the DevOps model into what’s now referred to as MLOps creates the right layer to sit on top of the hardware and management orchestration layers and bring everything together (see the experiment-tracking sketch after this list).
- Using containers to execute the AI workflow. Having the right container strategy in place completes the layers: the hardware and management orchestration layers, a layer to manage the clusters, plus the right AI application libraries sitting on top of that. This all adds up to an optimal infrastructure.
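As one concrete illustration of the MLOps point above, here is a minimal sketch of experiment tracking, a practice carried over from DevOps, using the open-source MLflow API. The model, parameters, and dataset are placeholders; treat this as a sketch of the pattern, not a prescribed toolchain.

```python
# A minimal experiment-tracking sketch: record the configuration, outcome,
# and model artifact of each training run so results stay reproducible.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    params = {"C": 0.5, "max_iter": 200}
    mlflow.log_params(params)                      # record the configuration

    model = LogisticRegression(**params).fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    mlflow.log_metric("test_accuracy", accuracy)   # record the outcome

    mlflow.sklearn.log_model(model, "model")       # version the artifact
```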
Be committed to moving ahead with AI and ML – knowing that complexity is part of the journey
Wherever you look in the data science world, you’ll see complexity in the AI and ML workflows and pipelines, coupled with the reality that everything is constantly changing. The velocity of change is impressive and oftentimes overwhelming. For some tools, a new release comes out every three to four months. Every time a change rolls through, it can have ramifications throughout the complete technology stack – from storage and compute all the way up to the components you thought you were happily running on the previous version.
Consider this illustration of how quickly things become more complex: Say you’re ready to deploy models at the edge. There’s no silver-bullet tool ready and waiting for you to pick up and get going. So you might start from a trained model, put it through a test cycle, then deploy it at the edge and manage the lifecycle there. Or you may choose to do additional training at the edge, where again there are no ready-and-waiting technologies to help you. Or maybe you have deployment requirements that call for a custom build. The work needed just to put the architecture in place is significant, requiring research into the various components and making sure they all work together in a consistent stack. And then you still have to address the GPUs correctly. Once you have the design done and running, you have to keep it continuously up to date.
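To ground one step of that edge scenario, here is a minimal sketch of a common hand-off point: exporting a trained PyTorch model to ONNX so an edge runtime (such as ONNX Runtime) can execute it. The tiny network and file name are stand-ins for illustration.

```python
# A minimal export sketch: produce a self-contained ONNX graph that can be
# shipped to and version-managed at the edge.
import torch

class TinyClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(16, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 3),
        )

    def forward(self, x):
        return self.net(x)

model = TinyClassifier().eval()
example_input = torch.randn(1, 16)  # the input shape the edge runtime expects

torch.onnx.export(model, example_input, "tiny_classifier.onnx",
                  input_names=["features"], output_names=["logits"])
```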
So every step of the way, you’re dealing with tasks that are definitely complex. But the good news is you can simplify with a pre-integrated, tested, and fully managed stack that can be consumed from edge to cloud.
That’s why planning is so important – and where HPE GreenLake and NVIDIA can help
AI and ML capabilities from NVIDIA integrated with HPE GreenLake bring all the right elements together to help you explore, experiment, scale, and evolve the right solution for your data science workloads. Watch the video podcast here to catch the complete discussion on AI, ML, and the future of edge and hybrid cloud. Learn more about the innovative technologies and solutions from HPE and NVIDIA – and what they can do for you.
Join HPE at NVIDIA GTC
In our March 23 session at GTC, we’ll examine the challenges enterprises face as they explore AI – with a focus on refining your solutions and deploying them at scale through a platform that connects multiple edges to clouds, such as HPE GreenLake. Register for free for GTC today and we’ll see you there!
Insights Experts
Hewlett Packard Enterprise
twitter.com/HPE_AI
linkedin.com/showcase/hpe-ai/
hpe.com/us/en/solutions/artificial-intelligence.html