Get the most from Spark and help ignite a revolution!
Do you want to learn how to retire the elephant (Hadoop)? Have you considered that Spark is the solution to help you do that? If you answered yes to either of those questions, you won’t want to miss the recent article by HPE’s Randy Thomasson, Optimizing Compute in the Post Hadoop Era.
This article, appearing in The New Stack, is Randy’s second published article about the post-Hadoop era. The first, Is There Life After Hadoop?, described two key strategies for the transition: build a better lake and optimize the compute. This second article goes into detail about how to optimize the compute part of the equation by using Apache Spark. I highlight his key points below.
Spark: Well-suited for today’s needs
Randy begins by explaining the popularity of this solution. “Apache Spark’s flexibility, columnar approach to data, suitability for artificial intelligence (AI) and machine learning, and its vastly improved performance over Hadoop have all served to dramatically increase its adoption in recent years. For most users, it has become the logical successor to Hadoop MapReduce. This article addresses how to get the most from Spark and help ignite a revolution.”
According to Randy, Spark clusters are well suited for the needs of today’s data-driven business. Its support for streaming and in-memory processing can provide substantial performance improvements over more batch-oriented technologies such as Hadoop. Spark’s cluster-based architecture also allows it to handle a wide variety of data sets.
Choosing the deployment
Spark directly supports four different cluster managers (Standalone, Apache Mesos, Hadoop YARN, and Kubernetes). The choice of cluster manager varies by organization, although many are now switching from Hadoop YARN to Spark on Kubernetes.
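As a rough illustration of what that switch looks like in practice, here is a sketch of submitting a Spark application to a Kubernetes cluster with `spark-submit`, following the pattern in the Apache Spark documentation. The API server address, container image name, and Spark version in the jar path are placeholders you would replace with your own values:

```shell
# Submit a Spark job directly to Kubernetes (no YARN cluster required).
# Spark spins up driver and executor pods, then tears them down on completion.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
```

The `local://` scheme tells Spark the jar is already baked into the container image, which is the dependency-management win containers bring: the application and its runtime ship together.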
That’s because Kubernetes offers some distinct advantages for Spark deployments. According to the article, the most important is its support for containers. Containers have revolutionized the way applications are packaged and deployed, much like virtualization revolutionized server infrastructure. They provide better isolation, improved portability, simpler dependency management, and dramatically reduced application cycle times.
Kubernetes also provides more efficient resource management, eliminating the need for transient clusters. The shorter application iteration cycles and significantly reduced setup/teardown delays that Kubernetes provides translate to substantially lower lifecycle costs. As a result, organizations moving their Spark workloads to Kubernetes typically see 50% to 75% lower costs.
Retiring the elephant
Many organizations are choosing to retire the elephant, and Spark has emerged as the tool of choice to replace it. Spark’s improved performance, affinity with existing Hadoop assets, and its more advanced approach to data make it a popular choice for migrating Hadoop workloads.
Yet, Hadoop will be with us for a while. Given this reality, organizations migrating from Hadoop need a solution strategy that provides a cost-effective home for their remaining Hadoop assets, while at the same time accommodating growing Spark workloads. Ideally, the solution would support the compute and storage needs of existing Hadoop assets as well as newer Spark workloads, while minimizing both the number of runtime platforms and associated storage.
Igniting a revolution with Spark
We’ve seen an explosion in AI and data-driven applications in recent years, which has driven the migration from Hadoop and fueled the adoption of Spark and machine learning technologies. Organizations need an approach that will allow them to effectively manage their shrinking Hadoop investment while increasing their investments in Spark and machine learning technologies.
According to Randy, the best way to do this is to embrace a Spark-plus-data fabric strategy for analytics. By adopting HPE Ezmeral, organizations can ease their transition into a post-Hadoop era, optimizing analytics compute functions with Spark while effectively managing legacy Hadoop assets in the process.
Follow this link to access the complete article: Optimizing Compute in the Post Hadoop Era.
Lola
Hewlett Packard Enterprise
HPE Ezmeral on LinkedIn | @HPE_Ezmeral on Twitter
@HPE_DevCom on Twitter
LolaTam
Lola Tam is a senior product marketing manager, focused on content creation to support go-to-market efforts for the HPE Enterprise Software Business Unit. Areas of interest include application modernization, AI / ML, and data science, and the benefits these solutions bring to customers.