A lakehouse for data engineers
The history of business intelligence began with data warehouses storing structured data. Then the big data revolution led to the advent of the data lake, which was a massive repository for any type of data. Both technologies served vital purposes for many companies. However, as technology and businesses have evolved to require more advanced and sophisticated uses of data, it’s become clear enterprises need to synthesize the benefits of the data lake and data warehouses. That concept has finally arrived in the form of the data lakehouse.
What is a lakehouse?
A lakehouse offers an open architecture and a new system design that combines the best features of the data warehouse and data lake.
In a lakehouse, the data warehouse paradigm is extended by the data lake’s ability to store any type of data at massive scale, while the data lake paradigm is enhanced by features of the data warehouse. These features include the ability to maintain an audit history of data as it is modified and to manage the advanced data capabilities needed for data quality and compliance.
A lakehouse allows companies to store unstructured data and access that data through ACID transactions, while at the same time storing virtually any other type of data at massive scale. It places data structures and data management features similar to those of a data warehouse on top of low-cost cloud storage in open formats, so companies can streamline their data infrastructure and make a wide variety of data available to business applications at a lower cost.
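To make the idea of transactions and audit history on top of plain files more concrete, here is a minimal sketch in Python. It is a toy illustration only, not how HPE Ezmeral or any particular table format is implemented: the class name `ToyTable`, the `_log` directory, and the file naming are all hypothetical. It assumes that data files are never modified once written and that a numbered commit log decides which files belong to each version of the table.

```python
import json
import tempfile
from pathlib import Path


class ToyTable:
    """Toy table: immutable data files plus a numbered commit log (hypothetical layout)."""

    def __init__(self, root):
        self.root = Path(root)
        self.log_dir = self.root / "_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _versions(self):
        # Each log entry is named after its version number, e.g. 00000.json.
        return sorted(int(p.stem) for p in self.log_dir.glob("*.json"))

    def latest_version(self):
        versions = self._versions()
        return versions[-1] if versions else -1

    def commit(self, rows, note):
        """Write a new data file, then record it in the log as the next version."""
        version = self.latest_version() + 1
        data_file = self.root / f"part-{version:05d}.json"
        data_file.write_text(json.dumps(rows))
        entry = {"version": version, "add": data_file.name, "note": note}
        # Exclusive create ("x") raises FileExistsError if another writer has
        # already claimed this version, so two writers cannot both commit it.
        with open(self.log_dir / f"{version:05d}.json", "x") as f:
            json.dump(entry, f)
        return version

    def history(self):
        """Return every log entry in order: the audit trail of the table."""
        return [
            json.loads((self.log_dir / f"{v:05d}.json").read_text())
            for v in self._versions()
        ]

    def read(self, version=None):
        """Return the rows visible as of `version` (default: the latest)."""
        if version is None:
            version = self.latest_version()
        rows = []
        for entry in self.history():
            if entry["version"] > version:
                break
            rows.extend(json.loads((self.root / entry["add"]).read_text()))
        return rows


with tempfile.TemporaryDirectory() as tmp:
    table = ToyTable(tmp)
    table.commit([{"id": 1, "status": "new"}], note="initial load")
    table.commit([{"id": 2, "status": "new"}], note="append second record")
    current = table.read()            # rows from both commits
    as_of_v0 = table.read(version=0)  # table as it looked after the first commit
    notes = [e["note"] for e in table.history()]
```

Readers only see data that a log entry points to, so a data file written by a failed commit stays invisible, and reading at an older version number reproduces the table as it was at that point. A production system would need considerably more care, for example around partially written log entries, deletes, and schema changes.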
With products such as HPE Ezmeral Runtime Enterprise and HPE Ezmeral Data Fabric, companies have access to reliable and consistent data for analytics, as well as the ability to easily scale workloads with cluster orchestration based on Kubernetes. HPE Ezmeral Runtime’s hybrid deployment capabilities mean customers get a managed data solution whether they are using it on premises or in the cloud, and that management extends to both Kubernetes clusters and applications.
Having learned from experience, almost nobody is trying to create one lakehouse to rule them all, a single repository for all of a company’s data. Rather, the lakehouse will be a design construct: a company may run one or several, some large and some small. In addition, businesses will need their lakehouses in many locations, not just the public cloud. Let’s examine the benefits afforded to data engineers when their tools can take advantage of the lakehouse architecture.
The benefits of a lakehouse for data engineers
A lakehouse depends on a robust data fabric, which serves as the foundation for the rest of the architecture.
- Unified management:
HPE Ezmeral deploys Apache Spark 3 as a Kubernetes cluster managed by HPE Ezmeral Runtime Enterprise, which offers the central control plane for applications necessary in a containerized environment. Data engineers don’t have to learn dozens of tools to get the job done. The Ezmeral Runtime Enterprise software is in essence a manager of managers. There is no need to be a Kubernetes expert or a storage administrator.
- Secure private or public cloud access to applications and data:
A data engineer can get to work easily using standard tools without needing to interact with the HPE Ezmeral Runtime Enterprise platform at all. Using a secure network infrastructure and gateway, a team of data engineers will have access to their favorite data science tools via secure “service endpoints,” essentially accessing their workspace in the public or private cloud of their choice.
- Self-service to grow the power of your compute cluster:
HPE Ezmeral Runtime Enterprise supports a variety of user types, or “roles,” based on access controls. A data scientist could have direct access to their cluster resources or just their applications, and could add their own apps if desired. The intuitive WebUI allows for point-and-click management to easily scale a compute cluster, which is automatically tied to the HPE Ezmeral Data Fabric lakehouse architecture.
- Expansive data access:
The entire architecture is brought together with integrated data management within HPE Ezmeral Data Fabric that allows users to manage data anywhere from the edge to the cloud. This means all data, whether it resides in a data warehouse or data lake, is available for use. This offers key benefits for data engineers. The HPE Ezmeral Runtime Enterprise DataTap feature allows a data engineer to literally “tap into” their existing data lakes and avoid migrating or copying data. This means much of the background plumbing to make the data ready for use by a data engineer is handled by the architecture, so the engineer can immediately get to work.
- Improved productivity:
Ultimately, this architecture significantly bolsters the work of data engineers by giving them a toolkit to work where and how they want, with data from any repository, from the edge to the cloud. With a lakehouse, they can access the benefits of both data lakes and data warehouses, without the limitations of each of those repositories.
A lakehouse that is deployable anywhere from the edge to the cloud offers engineers the flexibility they need, with access to all of an enterprise’s data. With all these advantages, a lakehouse can radiate value by being that “custom home” built to meet the specific needs of the engineer. For a demonstration of how HPE Ezmeral accelerates analytics by leveraging the latest features of Apache Spark with Delta Lake, watch the video, A House on the Lake.
Don
Hewlett Packard Enterprise
twitter.com/HPE_Ezmeral
linkedin.com/showcase/hpe-ezmeral
hpe.com/software
Don_Wake
Don has spent the past 20 years building, testing, marketing, and selling enterprise storage, networking, and compute solutions in the rapidly evolving information technology industry. Today he is focused on the HPE Ezmeral Container Platform. The HPE Ezmeral Container Platform is exciting to work on as it offers the ultimate toolkit to manage, deploy, execute, and monitor data-centric applications on software- and hardware-based architectures in the cloud, on premises, and at the edge.