What is the optimal infrastructure for a modern data lake?
I recently had to remove a large, dead tree from my back yard, and while doing so I ended up injuring my shoulder. That led to an MRI at a local medical imaging clinic. A few days later they let me know they had the results, so I inquired if my orthopedic doctor had received the images. When they asked if I wanted them to fax over the report, I was at a loss for words.
Fax it over, really?
It reminded me that even sophisticated users of technology sometimes retain old technology far too long.
There are plenty of other examples of this sort of thing, and it isn't limited to healthcare. Enterprises will find a product or architecture that gets a job done. Then things change and technology improves, but the general attitude often seems to be: if it isn't broken, you don't need to fix it. Unfortunately, we don't always clearly recognize when something is broken, or notice that we keep using something even after it has become inefficient, or an inhibitor to adopting something far better.
What's wrong with using a general-purpose, rack-mounted server for your modern data lake?
I've seen this dynamic come into play as our customers increasingly look to leverage the value of their data and work out how to build the infrastructure to support that effort. Historically, Hadoop data lakes relied on general-purpose rack-mounted servers with converged compute and storage. These servers are designed for maximum versatility, which enables them to address a variety of workloads, and they're still commonly used as a sort of cookie-cutter approach to building a data lake. But the requirements driving the modern data lake have changed dramatically over the past few years, and data lakes are at a transition point for enterprises going through digital transformation.
The proliferation of new workloads has introduced novel requirements beyond just the need for petabyte-level capacity. Continuous event streams demand greater throughput, workflows are more demanding and complex, and data must be persisted at different points in a data pipeline.
Artificial intelligence and machine learning (AI/ML) also require high throughput for model training, as well as low latency for inferencing. The modern data lake is now part of a much larger intelligent data pipeline that must comprehend the edge and IoT; it needs to support hybrid cloud, accommodate AI/ML workloads and real-time streaming, and ultimately enable a data-driven strategy that unleashes the power of the data.
While general-purpose servers are architected for versatility, they're not optimized for data-centric workloads. When you scale out your data lake with a converged or symmetric architecture, you may be adding compute resources you don't really need, so ultimately it's not as cost effective. Traditional general-purpose rack-mount servers also offer more limited storage density, which can mean more server nodes are needed to reach a required storage capacity. That translates to a higher total cost of ownership, as well as potentially higher software licensing costs.
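To make the stranded-compute point concrete, here is a back-of-the-envelope sizing sketch. Every number in it (capacity target, core requirement, per-node specs, license cost) is an illustrative assumption, not an HPE specification: when capacity drives node count in a symmetric architecture, compute scales with it whether the workload needs it or not.

```python
# Back-of-the-envelope sizing for converged (symmetric) scale-out.
# All figures below are illustrative assumptions, not vendor specs.
import math

TARGET_CAPACITY_TB = 2000     # 2 PB usable capacity target (assumed)
TARGET_CORES = 400            # compute the workload actually needs (assumed)

# Assumed profile of a typical general-purpose converged node:
NODE_CAPACITY_TB = 100        # usable storage per node
NODE_CORES = 48               # cores per node
LICENSE_COST_PER_NODE = 8000  # per-node software licensing (assumed)

# In a symmetric architecture, capacity dictates the node count...
nodes = math.ceil(TARGET_CAPACITY_TB / NODE_CAPACITY_TB)

# ...and compute comes along with it, needed or not.
total_cores = nodes * NODE_CORES
stranded_cores = total_cores - TARGET_CORES
licensing = nodes * LICENSE_COST_PER_NODE

print(f"nodes needed for capacity: {nodes}")                      # 20
print(f"cores delivered: {total_cores} (stranded: {stranded_cores})")  # 960 (560)
print(f"per-node licensing total: ${licensing:,}")                 # $160,000
```

With these assumed figures, reaching 2 PB forces 20 nodes and 960 cores, more than double the compute actually required, and every one of those nodes may carry a software license.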
Balanced, high throughput for data-centric workloads: The HPE Apollo 4000 family
HPE offers an asymmetric, extremely flexible architecture in which storage and compute nodes are separated, can scale independently, and can be configured based on the requirements of specific workloads. That sounds pretty simple, but you can't effectively accomplish it with just any off-the-shelf, general-purpose server.
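As a rough sketch of what independent scaling buys you (again using assumed, illustrative node profiles rather than actual HPE configurations), sizing the storage tier from the capacity requirement and the compute tier from the core requirement lets each dimension be driven by its own workload:

```python
# Asymmetric sizing sketch: storage and compute scale independently.
# Node profiles and targets are illustrative assumptions, not vendor specs.
import math

TARGET_CAPACITY_TB = 2000   # same 2 PB capacity target as before
TARGET_CORES = 400          # same compute requirement as before

# Assumed profiles for the two node types:
STORAGE_NODE_TB = 400       # dense storage server, modest CPU
COMPUTE_NODE_CORES = 64     # compute node, minimal local storage

storage_nodes = math.ceil(TARGET_CAPACITY_TB / STORAGE_NODE_TB)  # capacity-driven
compute_nodes = math.ceil(TARGET_CORES / COMPUTE_NODE_CORES)     # demand-driven

print(f"storage nodes: {storage_nodes}")                 # 5
print(f"compute nodes: {compute_nodes}")                 # 7
print(f"total nodes: {storage_nodes + compute_nodes}")   # 12, vs. 20 symmetric
```

Under these assumptions you land at roughly a dozen nodes instead of twenty, and you can later grow either tier without dragging the other along.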
What distinguishes the HPE Apollo 4000 family of intelligent data storage servers from competitive servers is that they're designed for data-centric workloads, offering a balanced, high-throughput system architecture from the front-end I/O all the way to the back-end data persistence, with a broad set of tiered storage options to choose from. These options allow you to create the right node profile, optimized for the workload. This focus on data-centric workloads also extends to the user experience, from the intuitive layout of the drive bays to the ease of rack serviceability for all media.
The HPE Apollo 4200 Gen10 Plus data storage server, launched in June 2021, is the newest member of the Apollo 4000 family. It's built to accommodate both ends of the data-centric workload spectrum: ultra-dense, cost-saving HDD bulk capacity for deeper data lakes and archives, complemented by high-performance NVMe flash, persistent memory, and accelerators that deliver the high throughput and low latency required for in-place analytics, AI/ML, NoSQL databases, and cache-intensive workloads. In the future, it will also support select GPU and FPGA accelerator options.
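One way to think about a mixed-media node like this is as a per-dataset tier-selection decision. The sketch below is purely illustrative; the thresholds and tier names are assumptions made for the example, not HPE sizing guidance:

```python
# Illustrative tier selection for a mixed-media data storage server.
# Thresholds and tier names are assumptions for this sketch, not HPE guidance.

def pick_tier(latency_ms_target: float, accesses_per_day: float) -> str:
    """Map a dataset's latency target and access frequency to a media tier."""
    if latency_ms_target < 1:
        return "NVMe flash / persistent memory"  # inferencing, cache-heavy NoSQL
    if accesses_per_day >= 1:
        return "NVMe flash"                      # hot analytics working sets
    return "high-density HDD"                    # bulk data lake and archive

print(pick_tier(0.5, 100))   # -> NVMe flash / persistent memory
print(pick_tier(5, 10))      # -> NVMe flash
print(pick_tier(50, 0.1))    # -> high-density HDD
```

The practical takeaway is that a single chassis can serve both the cold, capacity-driven end of the spectrum and the hot, latency-driven end.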
See our latest video on the use of the HPE Apollo 4200 Gen10 Plus as a building block for your modern data lake.
Meet new HPE blogger Donald Wilson. Donald is an enterprise infrastructure leader and senior business development manager, with over 20 years of alliance, product, and solution management experience. Most recently, his focus has been on constructing modern data lakes and intelligent data pipelines, optimized to enable workloads for advanced analytics and AI/ML. Throughout his career, Donald has cultivated expertise in creating new go-to-market strategies and value propositions for emerging market segments, as well as closing new business in high-growth markets, across North America, Europe, and Asia.
You can connect with Donald on LinkedIn.
Storage Experts
Hewlett Packard Enterprise
twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage