HPE Elastic Platform for Analytics: Why infrastructure matters in big data pipeline design
Are you ready to begin creating infrastructure for a big data pipeline? Before you move forward, it's important to understand why the right design matters.
Designing infrastructure for big data analytics brings no shortage of challenges. Not all enterprises have the in-house expertise to design and build large-scale (PB+) data lakes that can move quickly into production. A litany of open-source tools creates enormous complexity in design and integration. Most legacy data and analytics systems are ill-equipped to handle new data types and workloads. And old design principles, such as core-to-spindle ratios, are no longer a reliable guide for newer workloads.
Modern data pipelines will require extensive use of machine learning, deep learning, and artificial intelligence frameworks to analyze and perform real-time predictive analytics against both structured and unstructured data. Next-generation real-time and near-real-time analytics require a scalable, flexible, high-performing platform.
Enterprises need to look beyond traditional commodity hardware, particularly for latency-sensitive tools like Spark, Flink, and Storm, and for NoSQL databases like Cassandra and HBase, where low latency is mandatory. Data locality, data gravity, data temperature, and the network all have to be part of the overall design. Add in data protection and data governance, and you have a large number of variables to consider.
The traditional approach with Hadoop 1.0 was to co-locate compute and storage, which worked six to eight years ago when the focus was on batch analytics using HDFS and MapReduce. With the wave of technologies in the current Hadoop 3.0 ecosystem and beyond, co-locating compute and storage can be extremely inefficient and have negative implications for performance and scaling.
Here's the new reality: there is no typical or single "big data workload" that you can use as a guide for design decisions. Different workloads have different resource requirements, ranging from batch processing (a balanced design) to interactive processing (more CPU) and machine learning (more GPUs). The traditional symmetric design (co-located storage and compute) leads to trapped resources and power/space constraints. You end up with multiple copies of data due to governance, security, and performance concerns. The transition must be to a flexible, scalable, high-performing architecture.
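To make the idea of workload-specific resource shapes concrete, here is a minimal Python sketch. The profile numbers and names are hypothetical illustrations, not HPE sizing guidance; they only show how the workload classes above (batch, interactive, machine learning) pull a design in different directions:

```python
# Illustrative only: the ratios below are hypothetical, not HPE sizing guidance.
# Each workload class from the text emphasizes a different resource.
WORKLOAD_PROFILES = {
    "batch":       {"cpu_cores": 2, "ram_gb": 8,  "gpus": 0},  # balanced design
    "interactive": {"cpu_cores": 4, "ram_gb": 16, "gpus": 0},  # more CPU
    "ml":          {"cpu_cores": 2, "ram_gb": 16, "gpus": 1},  # more GPUs
}

def size_compute_tier(workload: str, parallel_tasks: int) -> dict:
    """Scale a per-task profile up to a whole compute tier."""
    profile = WORKLOAD_PROFILES[workload]
    return {resource: amount * parallel_tasks for resource, amount in profile.items()}

print(size_compute_tier("ml", 10))
# -> {'cpu_cores': 20, 'ram_gb': 160, 'gpus': 10}
```

A symmetric cluster would have to provision every node for the worst case of all three profiles at once; profiling each workload separately is what motivates purpose-built tiers.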
Address these needs with the HPE Elastic Platform for Analytics (EPA) architecture
HPE EPA is a modular infrastructure foundation designed to deliver a scalable, multi-tenant platform. It enables independent scaling of compute and storage through infrastructure building blocks that are optimized for density and for running disparate workloads.
HPE EPA environments allow for the independent scaling of compute and storage and employ higher-speed networking than previous-generation Hadoop clusters. They also enable consolidation and isolation of multiple workloads while sharing data, improving security and governance. In addition, workload-optimized nodes help meet performance and density requirements.
We recently worked with a customer that wanted to build a next-generation analytics environment for its business. Part of the challenge was changing architectural and business requirements: the initial design focused on Spark workloads, while the final design covered both Spark and Impala, with critical SLAs attached to response time on Impala table-scan queries.
The day-one cluster primarily ran Spark and Impala, with services like HBase and Kudu added over time. This is where an architecture like HPE EPA comes in handy. We were able to use purpose-built compute tiers for running Spark and Impala jobs and a separate storage tier for HDFS and Kudu. HPE EPA provided the elastic scalability to grow and/or add workload-specific compute and storage nodes. Here is a pictorial representation of the customer scenario and solution.
[Figure: Challenges with a traditional cluster design]
[Figure: Solution with an HPE EPA elastic cluster]
What exactly makes this architecture elastic?
HPE EPA allows the scaling of distinct nodes and resources independently, which is critical given the diversity of tools and workloads in the big data ecosystem. It even allows you to change a node's function on the fly (as described in the previous example). You can also add compute nodes without repartitioning the data. Containers enable rapid deployment and movement of workloads and models, in line with fast data analytics requirements.
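The claim that compute nodes can be added without repartitioning data follows from keeping data placement entirely on the storage tier. The toy Python model below (an illustrative sketch, not the actual HPE EPA control plane) makes that separation explicit:

```python
class ElasticCluster:
    """Toy model of a disaggregated cluster: block placement lives only on
    the storage tier, so growing the compute tier never moves a block.
    Illustrative sketch only, not the actual HPE EPA control plane."""

    def __init__(self, storage_nodes: int):
        self.storage_nodes = storage_nodes
        self.placement = {}       # block_id -> storage node index, fixed at write time
        self.compute_nodes = []   # compute tier grows and shrinks freely

    def write_block(self, block_id: str) -> None:
        self.placement[block_id] = hash(block_id) % self.storage_nodes

    def add_compute_node(self, name: str) -> None:
        before = dict(self.placement)
        self.compute_nodes.append(name)
        # Growing the compute tier touched no data placement at all.
        assert self.placement == before

cluster = ElasticCluster(storage_nodes=4)
for block in ("b1", "b2", "b3"):
    cluster.write_block(block)
cluster.add_compute_node("spark-worker-9")
print(len(cluster.compute_nodes))  # -> 1
```

In a symmetric (co-located) cluster, the equivalent of `add_compute_node` would trigger a rebalance of `placement`, which is exactly the cost the elastic design avoids.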
In summary, multi-tenant, elastic, and scalable data lakes built on the HPE EPA architecture meet your next-generation big data pipeline requirements. Here is a pictorial representation.
Get more information on the HPE EPA architecture, or refer to this reference architecture. Or contact your local HPE sales representative.
For suggestions on optimized hardware based on workload, check out the HPE EPA Sizing Tool.
Meet Infrastructure Insights blogger Mandar Chitale, HPE Solution Engineering Team.
Mandar has two decades of experience in the IT industry. Currently, he is a Program Manager with the HPE Solution Engineering Team, which focuses on creating solution reference architectures for enterprise use cases based on traditional and emerging digital technologies.
© Copyright 2019 Hewlett Packard Enterprise Development LP