Get the most from Spark and help ignite a revolution!
Do you want to learn how to retire the elephant (Hadoop)? Have you considered that Spark is the solution to help you do that? If you answered yes to either of those questions, you won’t want to miss the recent article by HPE’s Randy Thomasson, Optimizing Compute in the Post Hadoop Era.
This article, appearing in The New Stack, is Randy’s second published article about the post-Hadoop era. The first, Is There Life After Hadoop?, described two key strategies for the transition: build a better lake and optimize the compute. This second article goes into detail about how to optimize the compute part of the equation by using Apache Spark. I highlight his key points below.
Spark: Well-suited for today’s needs
Randy begins by explaining the popularity of this solution. “Apache Spark’s flexibility, columnar approach to data, suitability for artificial intelligence (AI) and machine learning, and its vastly improved performance over Hadoop have all served to dramatically increase its adoption in recent years. For most users, it has become the logical successor to Hadoop MapReduce. This article addresses how to get the most from Spark and help ignite a revolution.”
According to Randy, Spark clusters are well suited for the needs of today’s data-driven business. Its support for streaming and in-memory processing can provide substantial performance improvements over more batch-oriented technologies such as Hadoop. Spark’s cluster-based architecture also allows it to handle a wide variety of data sets.
Choosing the deployment
Spark directly supports four different cluster managers (Standalone, Apache Mesos, Hadoop YARN, and Kubernetes). The choice of cluster manager varies by organization, although many are now switching from Hadoop YARN to Spark on Kubernetes.
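As a rough illustration of what that switch looks like in practice, here is a sketch of submitting a Spark application to a Kubernetes cluster with `spark-submit`, following the pattern in the Apache Spark documentation. The API server address, container image name, and Spark version in the jar path are placeholders you would replace with your own values:

```shell
# Submit a Spark job directly to Kubernetes (no YARN cluster required).
# Spark spins up driver and executor pods, then tears them down on completion.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
```

The `local://` scheme tells Spark the jar is already baked into the container image, which is the dependency-management win containers bring: the application and its runtime ship together.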
That’s because Kubernetes offers some distinct advantages for Spark deployments. According to the article, the most important is its support for containers. Containers have revolutionized the way applications are packaged and deployed, much like virtualization revolutionized server infrastructure. They provide better isolation, improved portability, simpler dependency management, and dramatically reduced application cycle times.
Kubernetes also provides more efficient resource management, eliminating the need for transient clusters. The shorter application iteration cycles and significantly reduced setup/teardown delays that Kubernetes provides translate to substantially lower lifecycle costs. As a result, organizations moving their Spark workloads to Kubernetes typically see 50% to 75% lower costs.
Retiring the elephant
Many organizations are choosing to retire the elephant, and Spark has emerged as the tool of choice to replace it. Spark’s improved performance, affinity with existing Hadoop assets, and its more advanced approach to data make it a popular choice for migrating Hadoop workloads.
Yet, Hadoop will be with us for a while. Given this reality, organizations migrating from Hadoop need a solution strategy that provides a cost-effective home for their remaining Hadoop assets, while at the same time accommodating growing Spark workloads. Ideally, the solution would support the compute and storage needs of existing Hadoop assets as well as newer Spark workloads, while minimizing both the number of runtime platforms and associated storage.
Igniting a revolution with Spark
We’ve seen an explosion in AI and data-driven applications in recent years, which has driven the migration from Hadoop and fueled the adoption of Spark and machine learning technologies. Organizations need an approach that will allow them to effectively manage their shrinking Hadoop investment while increasing their investments in Spark and machine learning technologies.
According to Randy, the best way to do this is to embrace a Spark-plus-data fabric strategy for analytics. By adopting HPE Ezmeral, organizations can ease their transition into a post-Hadoop era, optimizing analytics compute functions with Spark while effectively managing legacy Hadoop assets in the process.
Follow this link to access the complete article: Optimizing Compute in the Post Hadoop Era.
Lola
Hewlett Packard Enterprise
HPE Ezmeral on LinkedIn | @HPE_Ezmeral on Twitter
@HPE_DevCom on Twitter
LolaTam
Lola Tam is a senior product marketing manager, focused on content creation to support go-to-market efforts for the HPE Enterprise Software Business Unit. Areas of interest include application modernization, AI / ML, and data science, and the benefits these solutions bring to customers.