Why HPE EPA is the right architecture for future data center needs

Learn why the HPE Elastic Platform for Analytics (EPA) is the right architecture for migrating your current siloed Hadoop architecture to a more flexible, modular architecture that supports the future needs of your data center.

HPE EPA-article.png


Today, we are witnessing amazing growth in information. IDC predicts that the global datasphere, which is the sum of all digital data produced by humankind, will reach the astonishing value of 175 ZB by 2025.* (For the record, one zettabyte is equivalent to a million petabytes.)

HPE-EPA-1.png

 IDC predicts that the global datasphere will grow from 33 Zettabytes in 2018 to 175 Zettabytes by 2025.*
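
To put these figures in perspective, here is a minimal Python sketch (my illustration, not part of the IDC study) that computes the compound annual growth rate implied by the 33 ZB (2018) and 175 ZB (2025) estimates, along with the zettabyte-to-petabyte conversion mentioned above.

```python
# Implied compound annual growth rate (CAGR) of the global datasphere,
# using the IDC figures cited above: 33 ZB in 2018 growing to 175 ZB by 2025.
start_zb, end_zb, years = 33.0, 175.0, 2025 - 2018

cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")           # roughly 27% per year

# For scale: 1 ZB = 1,000,000 PB, so 175 ZB is 175 million petabytes.
print(f"175 ZB = {175 * 1_000_000:,} PB")
```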

The reason for this growth is that data is transforming the way we live, work, and play.

  • Companies leverage data to improve customer experiences, open new markets, and make employees and processes more productive, while creating new sources of competitive advantage.
  • People monitor their activities, produce content, and share information and media on social networks.
  • Machines and sensors produce data, tracking the status of equipment, environments, supply chains, food, and more.

While data may be generated from a huge variety of sources, it will be stored in three primary locations: the public cloud, the company data center (on premises, or core), and edge devices.

By 2025, IDC predicts that half of all data will be stored in the public cloud, 40% in private data centers, and the remaining 10% on edge devices (including consumer devices such as cameras, phones, and computers).*

HPE-EPA-2.png

 Human Datasphere in 2025 (Source: IDC*)

Even though the role of the public cloud will continue to grow, a large portion of data (about 75 ZB in 2025) will still be stored in private data centers.

By 2025, IDC predicts that 40% of the whole datasphere (about 75 ZB) will be stored in private clouds and data centers.*

Looking at these numbers, the natural question is: how will enterprises store this enormous quantity of data? The answer is that future enterprise data centers will need to combine next-generation, scale-out storage solutions (e.g. scale-out object storage and NAS) with HDFS infrastructures. Hadoop (and HDFS technology in particular) will continue to play an important role in future data centers. The huge quantity of data already stored in on-premises enterprise data lakes, and the investments already made in Hadoop and HDFS technologies, point to the continued need for those environments. It is not only a matter of amortizing existing investments. It is also a matter of data gravity! Moving a petabyte of data from one storage platform to another requires, at a minimum, a few weeks of planning and additional weeks for deployment and integration. Moving data away from a 10 PB Hadoop cluster can easily require a year or even more.

Due to data gravity, Hadoop and HDFS in particular will continue to play an important role in future enterprise data centers.
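
Data gravity is easy to quantify with a back-of-the-envelope estimate. The sketch below computes the raw transfer time for a 10 PB dataset at a few assumed sustained network rates; the throughput figures are illustrative assumptions, and they ignore the planning, validation, and re-integration work that usually dominates a real migration.

```python
# Back-of-the-envelope estimate of raw transfer time for a bulk data migration.
# The dataset size comes from the example above (a 10 PB Hadoop cluster); the
# sustained-throughput figures are illustrative assumptions, not measurements.
PB = 10**15  # bytes

def transfer_days(dataset_bytes: float, sustained_gbit_per_s: float) -> float:
    """Days of continuous transfer at a given sustained rate (Gbit/s)."""
    bytes_per_second = sustained_gbit_per_s * 1e9 / 8
    return dataset_bytes / bytes_per_second / 86_400

dataset = 10 * PB
for rate in (1, 10, 40):  # assumed end-to-end sustained rates in Gbit/s
    print(f"{rate:>3} Gbit/s sustained -> {transfer_days(dataset, rate):,.0f} days")
```

Even at an optimistic 10 Gbit/s sustained end to end, wire time alone is roughly three months, which is why a full 10 PB migration project can easily stretch to a year or more.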

So, the real question is: how can IT support digital transformation by leveraging new technology trends while continuing to use existing Hadoop data lakes?

Before answering this question, I’ll quickly summarize the main business challenges of any digital transformation and share customer needs in terms of technological capabilities.


HPE EPA-table.png


In summary, enterprises need the capability to run multi-tenant application clusters on a variety of technologies (such as GPUs and hardware accelerators) to support different workloads (such as batch analytics, AI/ML training, and streaming). These application clusters need to run in the public cloud, in the core, or at the edge. And they need to be able to access data on multiple Hadoop distributions and scale-out data stores, avoiding data duplication or replication.

These requirements present a significant challenge to the traditional frameworks used to support Hadoop in the majority of enterprises, where collocated compute and storage prevent either resource from being scaled independently, where multi-tenant workloads create scheduling and performance challenges, and where specialized hardware can't easily be added to an existing cluster.

Aware of these limitations for a long time, HPE introduced the HPE Elastic Platform for Analytics (EPA) to overcome them. HPE EPA leverages standard Hadoop and YARN to enable enterprises to eliminate the limitations of a traditional symmetric Hadoop deployment and realize all the benefits of an asymmetric, modular architecture.

HPE Elastic Platform for Analytics (EPA) lets you reconfigure traditional symmetric Hadoop deployments into asymmetric, modular ones that support new technology trends.

Why is the HPE Elastic Platform for Analytics the optimal platform for your workloads?

The EPA is designed as a modular infrastructure foundation that addresses the need for a scalable, multi-tenant platform by enabling independent scaling of compute and storage through workload-optimized infrastructure building blocks.

HPE Elastic Platform for Analytics offers an “asymmetric” architecture and a different approach to building a data pipeline optimized for big data analytics.

  • Asymmetric architectures consist of separate, scalable blocks of compute and storage resources, connected via HPE high-speed network components (25G, 40G, or 100G), along with integrated management software and bundled support.
  • The HPE EPA asymmetric architecture allows you to combine heterogeneous compute/storage blocks (e.g. AI-optimized, high storage density, memory-optimized, latency-optimized for real-time analytics, standard compute, etc.) to meet each workload's requirements. As workload profiles change, you can add the right compute or storage blocks to satisfy the new profiles' needs. All these blocks can operate within the same cluster and share the same data.
  • HPE EPA building blocks provide the flexibility to optimize each workload and to access the same pool of data for batch, interactive, and real-time analytics (a minimal scheduler sketch follows this list).
  • This architecture can span from edge to core and to cloud.
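
As a simple illustration of several workload types sharing one cluster (and one pool of data), the sketch below prints a hypothetical YARN Capacity Scheduler layout with separate queues for batch, interactive, and real-time tenants. The queue names and percentages are assumptions chosen for illustration; the property keys are standard Capacity Scheduler settings, not EPA-specific configuration.

```python
# Hypothetical YARN Capacity Scheduler layout for three tenant workloads sharing
# one cluster (and therefore the same underlying data). Queue names and capacity
# percentages are illustrative assumptions; the property keys are standard
# capacity-scheduler.xml settings.
queues = {
    "yarn.scheduler.capacity.root.queues": "batch,interactive,realtime",
    "yarn.scheduler.capacity.root.batch.capacity": "50",        # % of cluster
    "yarn.scheduler.capacity.root.interactive.capacity": "30",
    "yarn.scheduler.capacity.root.realtime.capacity": "20",
    # Allow queues to borrow idle capacity up to a ceiling.
    "yarn.scheduler.capacity.root.batch.maximum-capacity": "80",
    "yarn.scheduler.capacity.root.interactive.maximum-capacity": "60",
    "yarn.scheduler.capacity.root.realtime.maximum-capacity": "40",
}

for key, value in queues.items():
    print(f"{key} = {value}")
```

In an EPA-style deployment, queues like these would typically be combined with YARN node labels so that, for example, AI training jobs are scheduled onto GPU-equipped compute blocks while batch jobs run on standard compute blocks.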

HPE-EPA-3.png

HPE Elastic Platform for Analytics

The benefit of an HPE elastic design is that it enables you to:

  • Independently scale compute and storage nodes as business requirements change
  • Run multiple workloads on the same cluster
  • Reduce cost and security ramifications of having multiple copies of the data on isolated workload clusters
  • Repurpose compute nodes to support new services and application requirements from the business

The HPE EPA architecture allows data from isolated workload clusters to be consolidated onto a shared, centrally managed platform, giving organizations the flexibility to scale compute and/or storage as required, using modular building blocks for maximum performance and density.

HPE EPA allows organizations that have already invested in balanced, symmetric Hadoop configurations to repurpose their existing deployments into a more flexible and modular asymmetric architecture capable of supporting multiple analytics needs and multi-tenancy, and of growing compute and/or storage capacity independently, without building new clusters.

The figure below highlights some of the different server building blocks within HPE EPA, categorized by edge and core architectures, and highlights consumption-based service offerings from HPE. Using these blocks, you can build your infrastructure, and as workload, compute, and data storage requirements change, you can easily add new blocks to satisfy the new needs.

HPE EPA building blocks-4.png

Example of HPE EPA building blocks

For more information on HPE EPA building blocks, download this reference architecture: HPE Reference Configuration for Elastic Platform for Analytics (EPA)—Modular building blocks of compute and storage optimized for modern workloads

What makes migrating from traditional Hadoop to HPE EPA easy

Adopting HPE EPA is a straightforward process in which current worker nodes are transformed into storage-only nodes and new compute nodes are added to the cluster. (Note: As an alternative to adding new compute nodes, you can convert existing worker nodes into compute nodes.)

This transformation is done in three main steps (a command-level sketch follows the figure below):

  1. Add configuration groups to YARN
  2. Add new nodes (compute)
  3. Remove unwanted services from the worker nodes (now storage nodes)

HPE-EPA-5.png

Example of how to migrate from traditional Hadoop deployment to HPE EPA
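
To make steps 1 and 2 more concrete, here is a minimal sketch using standard YARN node labels, driven from Python. The host names are hypothetical, the exact procedure depends on your Hadoop distribution and cluster manager, and step 3 (removing the NodeManager and other compute services from the now storage-only nodes) is normally performed through the management tool (Ambari, Cloudera Manager) rather than with ad hoc commands.

```python
# Minimal sketch of steps 1 and 2 using standard YARN node labels.
# Host names are hypothetical; real deployments would typically drive this
# through the cluster manager rather than ad hoc shell calls.
import subprocess

def run(cmd: list[str]) -> None:
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Step 1: create a node-label partition for the new compute tier
# (node labels must already be enabled in yarn-site.xml).
run(["yarn", "rmadmin", "-addToClusterNodeLabels", "compute(exclusive=false)"])

# Step 2: attach the label to the newly added compute nodes.
new_compute_nodes = ["compute-01.example.com", "compute-02.example.com"]  # hypothetical
mapping = " ".join(f"{host}=compute" for host in new_compute_nodes)
run(["yarn", "rmadmin", "-replaceLabelsOnNode", mapping])

# Step 3 (not shown): remove NodeManager and other compute services from the
# original worker nodes so they serve as HDFS storage-only nodes.
```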

Here's an important factor to keep in mind: HPE Professional Services has extensive experience in helping customers transform their traditional Hadoop environments with HPE EPA.

How your organization can benefit from HPE EPA

Benefits include:

  1. HPE EPA allows existing Hadoop clusters to support new technology trends such as hardware accelerators, containers, scale-out software-defined storage, and hyperconverged infrastructure, without moving data. HPE EPA allows enterprises to easily transform current “siloed” Hadoop clusters into modular building blocks that provide the flexibility to optimize any workload and to access the same pool of data for batch, interactive, and real-time analytics.
  2. HPE EPA can be deployed everywhere: cloud, core, and edge. HPE EPA compute and storage nodes can be geographically distributed, deployed at disaster recovery sites, and placed at edge locations.
  3. HPE EPA supports multi-tenant workloads and container deployments.
  4. HPE EPA is backed by HPE’s long experience with EPA deployments: HPE has created an extensive number of reference architectures and configuration documents to support a wide range of use cases and workloads.
  5. HPE EPA is complemented by an HPE sizing tool that sizes Hadoop clusters based on the applications running on them (e.g. Spark, HBase, Hive, YARN, TensorFlow) and on the compute (e.g. edge, workload-optimized, standard) and storage (e.g. high or standard storage density) building blocks. A simplified sizing sketch follows this list.
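
HPE's sizing tool itself is not shown here, but the sketch below illustrates the kind of first-order arithmetic such a tool automates: deriving the number of storage building blocks from raw data volume and HDFS replication, and the number of compute building blocks from aggregate vcore demand. Every input value is a hypothetical placeholder, not an HPE recommendation.

```python
# First-order sizing arithmetic of the kind a cluster sizing tool automates.
# All inputs are hypothetical placeholders, not HPE EPA recommendations.
import math

raw_data_tb = 2_000                 # usable data to be stored (TB), assumed
hdfs_replication = 3                # classic HDFS replication factor
growth_headroom = 1.3               # assumed 30% headroom for growth and temp data
usable_tb_per_storage_node = 300    # assumed capacity per storage building block

storage_tb_needed = raw_data_tb * hdfs_replication * growth_headroom
storage_nodes = math.ceil(storage_tb_needed / usable_tb_per_storage_node)

peak_vcores_needed = 1_200          # assumed aggregate demand across Spark/Hive/etc.
vcores_per_compute_node = 96        # assumed vcores per compute building block
compute_nodes = math.ceil(peak_vcores_needed / vcores_per_compute_node)

print(f"Storage building blocks: {storage_nodes}")   # 26
print(f"Compute building blocks: {compute_nodes}")   # 13
```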

Here are additional resources to learn more about HPE EPA:

*IDC - Data Age 2025 – The Digitalization of the World from Edge to Core


Andrea Fabrizi
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage

twitter.com/HPE_AI
linkedin.com/showcase/hpe-ai/
hpe.com/us/en/solutions/artificial-intelligence.html

About the Author

AndreaFabrizi1

Andrea Fabrizi is the Strategic Portfolio Manager for Big Data and Analytics at HPE.