Optimizing Spark for Cost Savings on HPE Ezmeral

Co-authored by Manuel Hoffmann at Pepperdata and Ka Wai Leung at HPE

HPE Ezmeral Runtime Enterprise provides all the tools needed to build, modernize, deploy, monitor, and manage a wide range of AI and analytics workloads, unleashing data's full potential. The solution is powerful, secure, and flexible, and it has been widely adopted to drive digital transformation and analytics. HPE also offers Spark Operator 3 as a value-added component on HPE Ezmeral, based on an enhanced, downstream version of Apache Spark. HPE combines the power and versatility of Apache Spark with the robust, enterprise-grade HPE Ezmeral Runtime Enterprise to support running analytics at scale against large data sources.

But what do you do once adoption scales to the point where dozens or hundreds of data scientists and data analysts are running massive numbers of Spark applications?

According to a recent survey, a third of enterprises report exceeding their big data IT budgets by 40% or more. Combined with an economic climate in which everyone is being asked to do more with less, there is tremendous pressure to increase productivity and lower costs. One way to address this challenge is to ensure that applications are not over-provisioned and consume only the resources they actually need. But understanding and predicting an application's resource usage is more of an art than a science, often requiring trial and error. And if you run hundreds, thousands, or sometimes hundreds of thousands of applications daily, those inefficiencies add up quickly. The answer is autonomous optimization.

HPE is partnering with Pepperdata to bring detailed observability to Spark and to deliver near real-time, autonomous optimization of cluster container resources. This is accomplished without Spark developers having to change a single line of code.

Developers tend to request excess resources when submitting Spark jobs. While understandable, this habit typically leads to seriously under-utilized Spark clusters: across its customer base, Pepperdata found that only 29.8% of the resources allocated to Spark applications are actually used. Accurately estimating Spark resource consumption without automation is not an easy task, and getting it wrong means idle resources and a costly cloud bill.
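To make the pattern concrete, here is a minimal sketch of a PySpark job whose resource requests far exceed its observed needs. The configuration keys are standard Spark properties; the application name and all of the numbers are purely hypothetical.

```python
from pyspark.sql import SparkSession

# Hypothetical over-provisioned job: the requested memory and cores far
# exceed what profiling shows the job actually uses.
spark = (
    SparkSession.builder
    .appName("nightly-etl")                    # hypothetical job name
    .config("spark.executor.instances", "50")
    .config("spark.executor.memory", "16g")    # observed peak usage: ~4 GB
    .config("spark.executor.cores", "4")       # observed busy cores: ~1
    .getOrCreate()
)
```

In this hypothetical case, the job holds 50 × 16 GB of memory for its entire lifetime while actually needing roughly a quarter of it; the unused remainder simply sits idle.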

Pepperdata Platform Spotlight helps reduce idle capacity by showing both the allocated resources and the used resources for a given Spark job execution.

[Screenshot: Pepperdata Platform Spotlight showing allocated versus used resources for a Spark run]

This example pictured above demonstrates that very few of the allocated resources are being used.
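Tooling aside, you can get a rough, per-application view of this gap from Spark's own monitoring REST API. Below is a minimal sketch, assuming the driver UI is reachable on its default port 4040; Platform Spotlight performs this kind of allocated-versus-used correlation continuously and across the entire cluster.

```python
import requests

# Minimal sketch: gauge the allocated-versus-used gap with Spark's built-in
# monitoring REST API. Assumes the driver UI is on the default port 4040;
# 'memoryUsed' and 'maxMemory' report executor storage memory.
driver_ui = "http://localhost:4040"

app_id = requests.get(f"{driver_ui}/api/v1/applications").json()[0]["id"]
executors = requests.get(
    f"{driver_ui}/api/v1/applications/{app_id}/executors"
).json()

for ex in executors:
    if ex["id"] == "driver":  # skip the driver's own entry
        continue
    used_pct = 100.0 * ex["memoryUsed"] / ex["maxMemory"] if ex["maxMemory"] else 0.0
    print(f"executor {ex['id']}: {used_pct:.1f}% of storage memory in use")
```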

Understanding Spark applications and the cluster resources they use is a step in the right direction. Pepperdata's Capacity Optimizer takes it one step further: it minimizes container resource waste by autonomously tuning Spark containers in the background, completely transparently to developers. This frees them to focus on what they were hired to do: developing applications that support business goals.

Capacity Optimizer pairs with the HPE Ezmeral Kubernetes scheduler, telling it how much more load each host in the cluster can handle. Ordinarily, the scheduler looks only at resource allocation parameters to determine how much capacity remains on a host and whether it can take on more executors. But if applications are using less than 30% of their allocated resources, for example, the host can typically handle many more of them.

In addition to allocated resources, Capacity Optimizer also considers used resources and other metrics to make intelligent decisions about a host's true capacity. If it finds that a host is full in terms of allocated resources but not in terms of used resources, it tells the HPE Ezmeral scheduler to place additional Spark executors there.
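Conceptually, the shift can be illustrated with a simplified sketch. This is an illustration of the idea only, not Pepperdata's actual algorithm; the 15% safety headroom is a hypothetical parameter.

```python
# Simplified, illustrative capacity check (not Pepperdata's actual logic).
# An allocation-based scheduler admits a new executor only if unallocated
# capacity remains; a usage-aware optimizer also considers real utilization.

def allocation_based_fit(host_capacity_gb, allocated_gb, request_gb):
    # Default scheduler view: only the bookkeeping numbers matter.
    return allocated_gb + request_gb <= host_capacity_gb

def usage_aware_fit(host_capacity_gb, used_gb, request_gb, headroom=0.15):
    # Usage-aware view: admit more work while measured usage leaves
    # headroom (hypothetical 15% safety margin).
    return used_gb + request_gb <= host_capacity_gb * (1.0 - headroom)

# Example: a 100 GB host that is "full" on paper (95 GB allocated) but has
# only 30 GB actually in use can still take a 10 GB executor.
print(allocation_based_fit(100, 95, 10))  # False: no allocatable room left
print(usage_aware_fit(100, 30, 10))       # True: plenty of measured headroom
```

The usage-aware check admits the executor that the allocation-based check refuses, because the host's measured utilization, not its bookkeeping, determines its true remaining capacity.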

The actual customer screenshot below shows Capacity Optimizer delivering a 40% savings in instance hours, a peak container uplift of 92%, and an average container uplift of 41%. In other words, for every 1,000 instance hours the workload would otherwise have consumed, roughly 400 were eliminated. This equates to significant IT cost savings.

[Screenshot: Capacity Optimizer savings summary showing instance hours saved and container uplift]

Learn more about HPE Ezmeral and how the HPE Ezmeral ecosystem can help customers take their digital transformation to the next level across a wide range of workloads.

For a free trial of Pepperdata on HPE, please email info@pepperdata.com. Also, check out the Pepperdata interactive demo and Pepperdata webinars and videos.


About the authors:

Manuel Hoffmann leads Pepperdata Partnerships and Business Development. Prior to joining Pepperdata, Manuel was Sr. Director, Strategic Alliances and Partner Development at FICO, where he created the FICO Cloud Center of Excellence, dramatically reducing AWS expenses. Before FICO, Manuel led global business development, channel sales, and marketing functions at early-stage companies. A Swiss native, Manuel holds a BS in electro-mechanical engineering from ECAM (Belgium), a degree in business administration from the University of Leuven (Belgium), and a certificate in international marketing from the University of California, Santa Cruz.


Ka Wai Leung is part of the HPE Software Business Unit's Partner Enablement team. He has an extensive background in developing container solutions throughout his career at HPE.


Hewlett Packard Enterprise

ezmeralecosystem@hpe.com


linkedin.com/showcase/hpe-ezmeral

hpe.com/ezmeral
