Accountable entities act as guardrails for large-scale analytic systems

A practical way to track space usage at large-scale: data fabric accountable entities


Simplicity is powerful, and it's something every business should seek, especially when managing resources. Yet effective management often requires more than a gate that is either open or closed.

For example, consider the daily limits most banks place on ATM withdrawals from each account. The account holder's access via the ATM is not all-or-nothing; it is more nuanced. One advantage of such limits is that they guard against a malicious actor fraudulently draining an account of all funds. Even if the account is breached at the ATM point of service, the damage is contained to a small sum that day, giving the account holder and the bank time to discover the problem and fix it.

An analogous situation applies in large-scale data systems. It's important to guard against one of the many applications or users in a project accidentally overrunning all the resources allotted to that project. But in a large and complex system, it can be challenging to track usage and stay vigilant against an overrun. And that problem is just one aspect of what's needed to manage resource allocation effectively and account for space usage in complex systems.

Challenges of accounting for space usage in a large-scale system

Managing user allocations efficiently in large multi-user, multi-application enterprises is a challenge. You need to assign per-user storage limits, track how much total data is being stored, and accurately charge each project for its usage. Now imagine doing all of this in a dynamic environment where users and projects change.

To do all this without putting an excessive burden on IT, you need a simplified method of accounting for usage that allows fine-tuned resource allocation while guarding against runaway consumption. The answer lies in a data management concept known as accountable entities, which act as guardrails for large analytic systems.

What are accountable entities, and how do they relate to data fabric volumes?

Accountable entities are an innovative feature of the unified data platform known as HPE Ezmeral Data Fabric. This software-defined, highly scalable data infrastructure combines files, objects, tables, and event streams into a single solution that simplifies management, security, high availability, and data placement across edge-to-cloud environments.

Within this data management capability, accountable entities work with management constructs known as data fabric volumes to simplify allocation and accounting for data usage. Data fabric volumes distribute data replicas across multiple machines in a cluster to ensure reliability. Data volumes are also the basis of many management functions: placing data, moving it via mirroring, and setting access policies for data stored in the data fabric. And they play a central role in how accountable entities work.

Read “The easy button for data management: Data fabric volumes” to find out about the many ways data fabric volumes improve data management at scale.

Quotas for accountable entities and associated data volumes

Accountable entities provide two key advantages:

  • Guard against runaway consumption by letting you set limits on space usage for both individual data volumes and the overall project.
  • Match fine-tuned tracking of actual space usage to intended budget specifications.

Here's how accountable entities work. When a data fabric volume is created, the owner or administrator can grant data access permissions to different users and groups and can even set volume-level data access using a predefined policy. Control of data usage, however, is independent of these access permissions and of the ownership of files and directories. Each volume can be associated with an accountable entity. Such an accountable entity could be a data user or a team leader, but it is more often an entire project. All of the volumes associated with a project are assigned the same accountable entity. An accounting of the space actually used by data written to each volume then rolls up to the associated accountable entity.


To manage resource allocation, you can set or modify quotas that limit the maximum data storage space used by a volume or an accountable entity, as indicated in Figure 1.

Figure 1. Multiple data fabric volumes are associated with each accountable entity.
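
To make the relationship in Figure 1 concrete, here is a minimal sketch in Python of how volumes, accountable entities, and their quotas could be modeled. The class names, field names, and quota values are purely illustrative assumptions for this example and are not the HPE Ezmeral Data Fabric API; in a real cluster you would create volumes and set these limits with the platform's own management tools.

```python
# Illustrative model only -- not the HPE Ezmeral Data Fabric API.
from dataclasses import dataclass

@dataclass
class AccountableEntity:
    name: str
    quota_gb: int        # aggregate limit across all associated volumes
    used_gb: int = 0     # usage rolled up from every associated volume

@dataclass
class Volume:
    name: str
    entity: AccountableEntity   # the accountable entity this volume's usage rolls up to
    quota_gb: int               # limit for this volume alone
    used_gb: int = 0

# Project A is an accountable entity; several volumes are associated with it (Figure 1).
project_a = AccountableEntity(name="Project A", quota_gb=500)
volumes = [
    Volume(name="A-1", entity=project_a, quota_gb=200),
    Volume(name="A-2", entity=project_a, quota_gb=200),
]
```

The key design point is that every volume carries a reference to exactly one accountable entity, so any space consumed in that volume can be attributed to the entity without tracking who wrote the data.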

The accountable entity quota limits the maximum space used collectively by all its associated volumes. Why, then, might you also put a quota on each associated data volume rather than just a quota for the overall accountable entity? 

Guard against runaway space consumption 

To understand why the two types of quotas provide powerful management advantages, consider what happens with quotas when data is written to a data fabric volume. This scenario is illustrated in Figure 2. A data scientist, Jane, working on Project A writes data to Volume A-1. That data usage is charged not only against the volume quota (Quota A-1); it also rolls up via the associated accountable entity to Quota A. Similarly, if Jane writes to Volume A-2, another volume assigned to the same accountable entity, that space usage counts against the quota for that particular volume (Quota A-2) and also against the quota for Accountable Entity A.

Figure 2. Data written to a data fabric volume counts against both the volume quota and the accountable entity quota.

Why does having an aggregate quota in addition to separate volume quotas make a difference?

Suppose a project involves multiple steps in data processing and analytics. The overall project may be allotted a specific quota for the maximum space its data can take up. But different applications or processes may be writing data to different data volumes assigned to the project. If a malfunction in one application causes unexpectedly large amounts of data to be written, that one application could consume the project's entire allotment and prevent all the other applications from recording their results.

Being able to set quotas at the aggregate level as well as on individual volumes limits the risk of runaway consumption by a single application or user. You can think of this as a way of limiting the "blast radius" of certain classes of bugs or operational errors.
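
The following self-contained sketch, again an illustrative model rather than the Data Fabric API, shows how checking each write against both quotas limits the blast radius: a misbehaving application exhausts its own volume's quota long before it can exhaust the project's aggregate allotment. The quota sizes are made up for the example.

```python
# Illustrative model only -- not the HPE Ezmeral Data Fabric API.
class QuotaExceeded(Exception):
    pass

# Quotas and current usage, in GB, for Project A and its two volumes (Figure 2).
entity_quota, entity_used = 500, 0
volume_quota = {"A-1": 200, "A-2": 200}
volume_used  = {"A-1": 0,   "A-2": 0}

def write(volume, size_gb):
    """Charge a write against the volume quota and roll it up to the entity quota."""
    global entity_used
    if volume_used[volume] + size_gb > volume_quota[volume]:
        raise QuotaExceeded(f"volume {volume} quota exceeded")
    if entity_used + size_gb > entity_quota:
        raise QuotaExceeded("accountable entity quota exceeded")
    volume_used[volume] += size_gb
    entity_used += size_gb

# A runaway application writing to Volume A-1 hits the 200 GB volume quota
# long before it could consume Project A's entire 500 GB allotment, so the
# applications writing to Volume A-2 can still record their results.
try:
    while True:
        write("A-1", 10)
except QuotaExceeded as err:
    print(err)                      # -> volume A-1 quota exceeded
print(volume_used, entity_used)     # A-2 untouched; 300 GB of entity headroom remains
```

Without the per-volume check, the same loop would run until it consumed the full 500 GB project quota, starving every other application in the project.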

Fine-tune tracking of data space usage

Another advantage of data fabric accountable entities is that they provide a better, more realistic fit between how resource usage is tracked and what is intended in budget specifications. Here's how that works.

Although people familiar with Linux or Windows file systems often expect space usage to be charged to the person or application doing the writing, data fabric does accounting in a better way. HPE Ezmeral Data Fabric ties quotas to where the data is written, not to who does the writing. This matters in part because people often work on more than one project, and the applications they code may write data to different projects, as shown in Figure 3. The same thing happens when task-specific user identities are used. These multi-user situations could complicate accounting for space usage, but data fabric cuts through the problem thanks to the way quotas work for volumes and accountable entities. To understand how accountable entities help, let's revisit our scenario with data scientist Jane as she writes to different data fabric volumes.


Figure 3. Space usage is tracked based on where data is written, not who did the writing.

A new situation is depicted in Figure 3. Even though Jane works mostly on Project A, she might have been given permission to write to a volume that is part of another project, one assigned to Accountable Entity B. When she does so, the resulting space usage counts against the quota for that data fabric volume and against the quota for Accountable Entity B, not against Project A's quotas, even though Jane is on the Project A team.
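
A short, self-contained sketch of this accounting rule follows. It is an illustrative model only, not the Data Fabric API; the user name, volume names, and sizes are hypothetical.

```python
# Illustrative model only -- not the HPE Ezmeral Data Fabric API.
# Usage, in GB, is keyed by the accountable entity that owns the volume,
# never by the user who performs the write.
entity_usage  = {"Project A": 0, "Project B": 0}
volume_entity = {"A-1": "Project A", "A-2": "Project A", "B-1": "Project B"}

def write(user, volume, size_gb):
    """Charge the write to the entity associated with the volume, not to the user."""
    entity = volume_entity[volume]
    entity_usage[entity] += size_gb
    print(f"{user} wrote {size_gb} GB to {volume}; charged to {entity}")

write("jane", "A-1", 25)   # counted against Project A
write("jane", "B-1", 10)   # counted against Project B, even though Jane is on Project A's team

print(entity_usage)        # {'Project A': 25, 'Project B': 10}
```

Because the charge follows the volume's accountable entity, each project's reported usage reflects the data it actually holds, no matter whose credentials did the writing.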


A better way to account for space usage

By separating quotas for data written to data fabric volumes from the aggregate quota for an accountable entity, HPE Ezmeral Data Fabric acts as a guardrail, ensuring that a single component doesn't consume a project's entire allotment of resources. And by tying space usage to where data lives rather than to who placed it there, data fabric offers a nuanced way to handle accounting for resource usage. With space usage charged against a volume quota and rolled up to its associated accountable entity, you have an expressive way to track actual resource usage and align it realistically with budget specifications.

Taken together, these attributes of accountable entities are one more part of how data fabric delivers excellent data management in complex distributed systems at large scale.

To explore the wide range of capabilities based on data fabric volumes, read the blog “The easy button for data management: Data fabric volumes”. 

Learn more about HPE Ezmeral Data Fabric.


About the Author

Ellen_Friedman

Ellen Friedman is a principal technologist at HPE focused on large-scale data analytics and machine learning. Prior to her current role at HPE, Ellen worked at MapR Technologies for seven years, where she was a committer for the Apache Drill and Apache Mahout open source projects. She is a co-author of multiple books published by O'Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, and the Practical Machine Learning series.