
The case for radical simplicity in data infrastructure: A new technical paper

You’ve heard the expression, “If it ain’t broke, don’t fix it!”

There’s some wisdom in that idea. But in reality, “broke” shouldn’t just mean “not working at all.” In the competitive and fast-paced world of modern businesses using large-scale data to drive key applications, “broke” may mean “not working well enough to keep pace without excess effort and impractical costs.” In that case, a new solution is very much needed.

Why is a new data infrastructure needed?

Data sets continue to expand, and businesses across almost every sector now widely recognize the potential value to be extracted from large-scale data using an ever-growing variety of techniques and tools. These include approaches for analytics and AI/machine learning projects as well as essential business processes. 

To meet these needs, many organizations cobble together a complicated collection of point solutions for data storage, access, and management to match the diverse requirements of different applications. Moment to moment, this point-solution approach may work, but it usually introduces unnecessary complexity and increases costs. It may also mean hiring specialized IT teams to manage the many data systems involved. Architectures originally designed for efficiency become sprawling and cumbersome.

These problems not only make it difficult to meet SLAs under current conditions; they also make it difficult to scale the system or try new techniques without re-architecting or building separate systems from scratch.

A new data storage and management system is needed, with several key characteristics. It should be designed and engineered to provide a unifying, foundational data system that reduces the burden on IT. The system should also be more cost-effective. And lastly, it should meet SLAs reliably and yet provide a large degree of flexibility in your choice of the tools and applications that will use that data.

Cutting through Complexity: HPE Ezmeral Data Fabric

A new technical paper, “HPE Ezmeral Data Fabric: Modern infrastructure for data storage and management,” explores the criteria a unifying data infrastructure must meet if it is to substantially reduce the complexity and expense of typical large-scale systems. The paper briefly explains what a data fabric is and details how this technology handles data storage, data management, and data movement across an enterprise. It goes on to describe how, as a unifying data layer, HPE Ezmeral Data Fabric supports a wide variety of both modern and legacy applications, all on the same system. HPE Ezmeral Data Fabric is a software-defined solution that is hardware agnostic.

The various implications of adopting data fabric as a unifying data foundation are described throughout the technical paper, but here I focus on one of the most important: cutting through unwanted complexity in large-scale systems. 

The following pair of figures is based on a typical use case from the financial sector. They demonstrate the huge impact HPE Ezmeral Data Fabric has: it greatly simplifies workflows and streamlines architecture, while providing flexibility to use many different analytics and AI approaches. In the non-data-fabric example shown in Figure 1, each of the shaded blocks is built on a different system or technology in order to satisfy different technical constraints on particular steps in the workflow. Notice how many steps require copying large data sets between systems, wasted effort that should not be necessary.

Figure 1: Non-data-fabric system built from point solutions for data: complex workflow and architecture

For the example shown in Figure 1, a variety of systems, such as local files, an NFS farm, HDFS, and more, were used to store and move data. In modern terms, this collection of point solutions for data introduces enormous and unnecessary complexity. Managing so many different systems imposes an undue burden that consumes much of an IT team’s workload.

Now contrast that situation with how the use case would change if HPE Ezmeral Data Fabric were used as the unifying data layer, as shown in Figure 2 below.

Figure 2. Data fabric as unifying infrastructure: workflow is simplified and architecture is streamlined

With HPE Ezmeral Data Fabric, the workflow for this example can be supported by a single system (shown in Figure 2 as a single shaded block). Many steps in workflows (such as that depicted in the first figure) just drop out. Data transformation steps remain, but it is no longer necessary to copy data from system to system. Data fabric can meet the technical requirements previously addressed with a collection of point solutions. 
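
To make “transform without copying” concrete, here is a minimal PySpark sketch of a transformation step that reads raw files from one directory on the fabric and writes refined output to another, with no intermediate transfer to a separate system. The mount point, paths, and columns are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: a transformation that reads and writes within the same
# data layer, so no cross-system copy is needed. The POSIX-style mount
# /mapr/my.cluster and the schema are assumptions for illustration.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transform-in-place").getOrCreate()

# Read the raw landing area directly from the fabric's global namespace.
raw = spark.read.option("header", True).csv("/mapr/my.cluster/raw/trades/")

# Example transformation: keep settled trades and stamp a processing date.
refined = (
    raw.filter(F.col("status") == "SETTLED")
       .withColumn("processed_on", F.current_date())
)

# Write the result back to the same fabric in a columnar format for analytics.
refined.write.mode("overwrite").parquet("/mapr/my.cluster/refined/trades/")
```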

How HPE Ezmeral Data Fabric makes this difference

The big contrast between workflow and architecture shown in Figure 2 (data fabric) versus Figure 1 (non-data-fabric infrastructure) results from a combination of features of the data fabric. Multi-API access means diverse applications can directly access data stored in the same data layer, as illustrated by Figure 3. 

Figure 3. Data fabric’s multi-API data access: modern and legacy applications can run on the same system.
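
To make multi-API access concrete, here is a minimal Python sketch of two clients reading the same file: a legacy application using an ordinary POSIX path in the fabric’s global namespace, and a modern application using an S3-compatible object API. The mount point, endpoint URL, bucket, and credentials are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: the same data reached through two different APIs.
# Assumptions for illustration: the fabric is mounted POSIX-style at
# /mapr/my.cluster and exposes an S3-compatible endpoint for object access.

import boto3  # standard AWS SDK; works with any S3-compatible endpoint

# Legacy application: plain file I/O against the global namespace.
with open("/mapr/my.cluster/data/events/2023-06-01.csv") as f:
    print("POSIX read, columns:", f.readline().strip())

# Modern application: the same data, fetched through the S3 API.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.my.cluster:9000",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",             # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)
obj = s3.get_object(Bucket="events", Key="2023-06-01.csv")
print("S3 read, columns:", obj["Body"].read().decode().splitlines()[0])
```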

Furthermore, compute resources can be moved flexibly to different tasks as needed because any machine can access any data stored in HPE Ezmeral Data Fabric via the same pathnames. Thanks to built-in capabilities, HPE Ezmeral Data Fabric can manage the data life cycle from raw data through masking, processing, and eventually placement in cold storage, without any change to the way applications see the data. IT saves time and effort by managing global data through a single system that can scale up or down to support different applications.
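
As a small illustration of what a stable global namespace means for application code, here is a sketch, again assuming a hypothetical /mapr-style mount: the function below can run on any node, and nothing in it changes when the fabric masks, re-processes, or tiers the underlying data.

```python
# Minimal sketch: application code written against a stable fabric pathname.
# Every node sees the same global namespace, so this script runs unchanged
# on any machine, and it needs no edits when the fabric moves the data
# through its life cycle (e.g., to a cold tier). Paths are illustrative.

import csv

DATA = "/mapr/my.cluster/data/events/2023-06-01.csv"  # assumed layout

def count_events(path: str) -> int:
    """Count data rows; data placement is invisible at this level."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1  # subtract the header row

print(count_events(DATA), "events")
```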

To find out more about how HPE Ezmeral Data Fabric handles data storage, management, and motion from edge to cloud, get a copy of the technical paper “HPE Ezmeral Data Fabric: Modern infrastructure for data storage and management.”

For an additional discussion of how to meet the challenge of point solutions, watch the video interview “How to Solve the Siloed Data Challenge” with Ronald Van Loon, Ted Dunning and me.

 

Ellen Friedman

Hewlett Packard Enterprise

www.hpe.com/containerplatform

www.hpe.com/mlops

www.hpe.com/datafabric

 

 

About the Author


Ellen Friedman is a principal technologist at HPE focused on large-scale data analytics and machine learning. Prior to her current role at HPE, Ellen worked for seven years at MapR Technologies, where she was a committer for the Apache Drill and Apache Mahout open source projects. She is a co-author of multiple books published by O’Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, and the Practical Machine Learning series.