A lakehouse for data engineers

Don_Wake · 01-28-2022

The history of business intelligence began with data warehouses storing structured data. Then the big data revolution led to the advent of the data lake, a massive repository for any type of data. Both technologies served vital purposes for many companies. However, as technology and businesses have evolved to require more advanced and sophisticated uses of data, it’s become clear that enterprises need to combine the benefits of the data lake and the data warehouse. That concept has finally arrived in the form of the data lakehouse.

What is a lakehouse?

A lakehouse offers an open architecture and a new system design that combines the best features of the data warehouse and data lake.

In a lakehouse, the data warehouse paradigm is extended by the data lake’s ability to store any type of data at massive scale, while the data lake paradigm is enhanced by features of the data warehouse. These features include the ability to maintain an audit history of data as it is modified and to manage the advanced data capabilities needed for data quality and compliance.

A lakehouse allows companies to store unstructured data and access that data through ACID transactions, while at the same time storing virtually any other type of data at massive scale. It places warehouse-like data structures and data management features on top of low-cost cloud storage in open formats, so companies can streamline their data infrastructure and make a wide variety of data available to business applications at a lower cost.
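To make the idea of ACID transactions and audit history on top of plain files more concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not the format or API of Delta Lake or of any HPE Ezmeral product: the class name `TinyTable`, the `_log` directory, and the JSON layout are all invented for illustration. Each change is published as one numbered commit file, and exclusive file creation ensures that only one of two competing writers can claim the next version number.

```python
import json
import os
import tempfile


class TinyTable:
    """Toy table (hypothetical): every change is one numbered commit file."""

    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _path(self, version):
        return os.path.join(self.log_dir, f"{version:08d}.json")

    def latest_version(self):
        """Return the highest committed version, or -1 for an empty table."""
        versions = [int(name.split(".")[0]) for name in os.listdir(self.log_dir)]
        return max(versions, default=-1)

    def commit(self, read_version, added_rows, user):
        """Try to publish rows as version read_version + 1.

        Mode "x" is exclusive creation: it fails if the file already exists,
        so two writers that read the same version cannot both commit the
        next one. A real system must also keep readers from seeing a
        half-written commit; this sketch ignores that.
        """
        try:
            with open(self._path(read_version + 1), "x") as f:
                json.dump({"user": user, "rows": added_rows}, f)
        except FileExistsError:
            return False  # another writer won; re-read and retry
        return True

    def read(self, version=None):
        """Return all rows as of the given version (default: latest)."""
        if version is None:
            version = self.latest_version()
        rows = []
        for v in range(version + 1):
            with open(self._path(v)) as f:
                rows.extend(json.load(f)["rows"])
        return rows

    def history(self):
        """Return the audit trail: who committed each version, and how much."""
        entries = []
        for v in range(self.latest_version() + 1):
            with open(self._path(v)) as f:
                entry = json.load(f)
            entries.append(
                {"version": v, "user": entry["user"], "rows_added": len(entry["rows"])}
            )
        return entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        table = TinyTable(root)
        table.commit(-1, [{"id": 1}], user="alice")
        # bob also read the empty table, so his commit is rejected
        print(table.commit(-1, [{"id": 99}], user="bob"))
        table.commit(0, [{"id": 2}], user="bob")
        print(table.read(version=0), table.read(), table.history())
```

Because commits are never rewritten, reading "as of" an older version and listing who changed what both fall out of the same log, which is why transactions and audit history tend to appear together in lakehouse table formats.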

With products such as HPE Ezmeral Runtime Enterprise and HPE Ezmeral Data Fabric, companies have access to reliable and consistent data for analytics, as well as the ability to easily scale workloads with cluster orchestration based on Kubernetes. HPE Ezmeral Runtime’s hybrid deployment capabilities mean customers get a managed data solution whether they are using it on premises or in the cloud, and that management extends to both Kubernetes clusters and applications.

Having learned from experience, almost nobody is trying to create one lakehouse to rule them all, one that would become the single repository for all data in a company. Rather, the lakehouse will be a design construct: a company may have one or several, some large and some small. In addition, businesses will need their lakehouses in many locations, not just the public cloud. Let’s examine the benefits afforded to data engineers when their tools can take advantage of the lakehouse architecture.

The benefits of a lakehouse for data engineers

A lakehouse depends on a powerful, robust data fabric, which is the foundation for the rest of the architecture.

Unified management:

HPE Ezmeral deploys Apache Spark 3 as a Kubernetes cluster managed by HPE Ezmeral Runtime Enterprise, which offers the central control plane for applications necessary in a containerized environment. Data engineers don’t have to learn dozens of tools to get the job done. The Ezmeral Runtime Enterprise software is in essence a manager of managers. There is no need to be a Kubernetes expert or a storage administrator.

Secure private or public cloud access to applications and data:

A data engineer can get to work easily using standard tools without needing to interact with the HPE Ezmeral Runtime Enterprise platform at all. Using a secure network infrastructure and gateway, a team of data engineers will have access to their favorite data science tools via secure “service endpoints,” essentially accessing their workspace in the public or private cloud of their choice.

Self-service to grow the power of your compute cluster:

HPE Ezmeral Runtime Enterprise supports a variety of user types, or “roles,” based on access controls. A data scientist could have direct access to their cluster resources or just to their applications, and could add their own apps if desired. The intuitive web UI allows for point-and-click management to easily scale a compute cluster, which is automatically tied to the HPE Ezmeral Data Fabric lakehouse architecture.
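The role-based model described above can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the role names, the permission names, and the `scale_cluster` helper are hypothetical and are not the actual roles or API of HPE Ezmeral Runtime Enterprise. It only shows the general pattern of checking a role's permissions before allowing a self-service action such as resizing a cluster.

```python
# Hypothetical roles and permissions; the platform's real ones may differ.
ROLE_PERMISSIONS = {
    "app_user": frozenset({"use_apps"}),
    "cluster_member": frozenset({"use_apps", "add_apps", "scale_cluster"}),
}


def is_allowed(role, action):
    """Return True if the role grants the action; unknown roles grant nothing."""
    return action in ROLE_PERMISSIONS.get(role, frozenset())


def scale_cluster(role, current_workers, delta):
    """Return the new worker count, or raise if the role may not scale."""
    if not is_allowed(role, "scale_cluster"):
        raise PermissionError(f"role {role!r} may not scale the cluster")
    new_size = current_workers + delta
    if new_size < 1:
        raise ValueError("a cluster needs at least one worker")
    return new_size


if __name__ == "__main__":
    print(is_allowed("app_user", "add_apps"))
    print(scale_cluster("cluster_member", current_workers=3, delta=2))
```

Keeping the permission check in one function means that every self-service action, whether triggered from a UI or a script, goes through the same gate.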

Expansive data access:

The entire architecture is brought together with integrated data management within HPE Ezmeral Data Fabric that allows users to manage data anywhere from the edge to the cloud. This means all data, whether it resides in a data warehouse or data lake, is available for use. This offers key benefits for data engineers. The HPE Ezmeral Runtime Enterprise DataTap feature allows a data engineer to literally “tap into” their existing data lakes and avoid migrating or copying data. This means much of the background plumbing to make the data ready for use by a data engineer is handled by the architecture, so the engineer can immediately get to work.

Improved productivity:

Ultimately, this architecture significantly bolsters the work of data engineers by giving them a toolkit to work where and how they want, with data from any repository, from the edge to the cloud. With a lakehouse, they can access the benefits of both data lakes and data warehouses, without the limitations of each of those repositories.

A lakehouse that is deployable anywhere from the edge to the cloud offers engineers the flexibility they need, with access to all of an enterprise’s data. With all these advantages, a lakehouse can radiate value by being that “custom home” built to meet the specific needs of the engineer. For a demonstration of how HPE Ezmeral accelerates analytics by leveraging the latest features of Apache Spark with Delta Lake, watch the video, A House on the Lake.

Don

Hewlett Packard Enterprise
twitter.com/HPE_Ezmeral
linkedin.com/showcase/hpe-ezmeral
hpe.com/software
