Optimize your big data environment with the RET principle
When big data environments are large, complex, and siloed, IT teams spend more time managing, patching, and upgrading systems instead of helping the business solve problems with advanced analytics. But if we can think about the infrastructure in terms of workload using the RET principle, we can gain greater utilization of the infrastructure and keep up with the demands of the business. Watch this lightboard video to learn about the RET principle.
Is your existing big data environment getting out of hand? Is it difficult to manage? Do you struggle to keep up with the demands from the business? If you answered yes to any of these questions, you're not alone.
Big data infrastructure challenges
A common challenge I encounter when I speak to customers about their existing big data environments is "Hadoop sprawl." What does that mean? Much like the database sprawl issues of the 2000s, it means there are far too many Hadoop clusters throughout the enterprise, which leads to a proliferation of silos. With disparate environments (such as lab, dev, UAT/integration, and production), IT teams spend the majority of their time managing, patching, and upgrading these systems instead of working with the business to solve difficult problems through advanced analytics.
The bigger problem with maintaining multiple unique environments, however, is data duplication and the inefficient use of the infrastructure funds that support them. Duplicating data across environments creates data drift: the copies don't stay up to date with the system of reference. As a consequence, analytics and data science teams get inconsistent results from the different systems, and infrastructure inefficiencies hinder productivity and erode confidence in the results of their work.
If this problem is so common, cutting across organizations of all sizes and industries, how did we get here? The answer lies in the rapid change of technology in this space outpacing IT's ability to adapt. The traditional Hadoop stack used to be pretty simple: HDFS for storage, MapReduce for processing, and a few databases for surfacing the data to a limited set of applications. Over the past decade, the number of components in the Hadoop stack, along with databases and processing engines, has exploded. The traditional tightly coupled architecture (deploying everything together as a monolith) is no longer the most efficient way to run these systems. The problem with monolithic deployment is that every time one component needs to be patched or upgraded, the whole system must be redeployed. As the system grows bigger and more complex, that process becomes time-consuming and error-prone, so updates happen less and less frequently (or, alternatively, end users see more downtime).
A workloads perspective
What's the alternative? Looking at the world of cloud-native application development, we can borrow the pattern of decoupling the monolith's components, similar to how modern applications are composed of microservices. We don't need to break apart the system in quite the same way as stateless microservices-based applications, but we can apply the concept of decoupling to the major components. Applications like Spark, Kafka, and Hive can each be deployed and scaled independently based on the needs of the organization. Additionally, by separating the application/compute components from the data/storage components, we can scale them independently as well.
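The decoupling idea above can be sketched in a few lines of code. This is a minimal illustration only, assuming a toy `Component`/`scale` interface (not any HPE or Hadoop API): each major component is tracked and resized on its own, so changing one tier never forces a redeploy of the others.

```python
# Hypothetical sketch: compute and storage components are modeled
# separately so each can be scaled without touching the rest.
# Component names and the scale() method are illustrative only.

class Component:
    def __init__(self, name: str, tier: str, replicas: int):
        self.name = name
        self.tier = tier        # "compute" or "storage"
        self.replicas = replicas

    def scale(self, replicas: int) -> None:
        # A real platform would call an orchestrator's API here;
        # this sketch just records the new target size.
        self.replicas = replicas

cluster = [
    Component("spark", "compute", 4),
    Component("kafka", "compute", 3),
    Component("hdfs-datanodes", "storage", 8),
]

# Scale only the Spark compute tier; Kafka and storage are untouched.
for c in cluster:
    if c.name == "spark":
        c.scale(10)
```

The point of the sketch is the independence: in a monolith, growing Spark would mean redeploying the whole stack, while here each component carries its own lifecycle.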
When deciding which components to decouple, we need to use a set of criteria to determine if they are candidates for this process. I like to use the RET principle for determining if a big data job or application is a good fit:
- Restartable: If the job fails, can we restart it without affecting the other users or system?
- Ephemeral: Is the application created and destroyed on demand; is it short-lived?
- Temporal: Does the job have a well-defined run time?
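As a rough sketch, the three RET criteria could be captured as a simple checklist. All names here are hypothetical, for illustration only; a workload that meets all three tests is a good candidate for decoupling.

```python
# Hypothetical sketch: evaluate a big data job against the RET
# principle (Restartable, Ephemeral, Temporal) to decide whether
# it is a good fit for decoupled deployment.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    restartable: bool   # can it restart without affecting other users or the system?
    ephemeral: bool     # is it created and destroyed on demand; short-lived?
    temporal: bool      # does it have a well-defined run time?

def is_ret_candidate(w: Workload) -> bool:
    """A workload is a RET candidate only if it meets all three criteria."""
    return w.restartable and w.ephemeral and w.temporal

nightly_etl = Workload("nightly-etl", restartable=True, ephemeral=True, temporal=True)
print(is_ret_candidate(nightly_etl))   # a batch ETL job typically passes all three
```

A long-running, always-on database would fail the ephemeral and temporal tests and so would stay in a more traditional deployment model.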
This is where thinking about the infrastructure in terms of workloads, instead of monolithic silos, can allow us to get greater utilization of the infrastructure and to keep up with the demands of the business. Check out my video to learn more about applying the RET principle to optimize your big data environment and boost productivity.
Learn more about HPE BlueData Software.
Matt Maccaux
Hewlett Packard Enterprise
twitter.com/HPE_AI
linkedin.com/showcase/hpe-ai/
hpe.com/us/en/solutions/artificial-intelligence.html