
HPE data store solutions for AI and advanced analytics: Choosing the right platform

In part 1 of this data store blog series, the focus is on learning how to optimize data pipeline architectures to get more value from analytic initiatives with HPE scale-out data platforms.


The idea that performance should be the only key criterion for choosing data store solutions for artificial intelligence (AI) is a common point of view. The rationale is straightforward: GPU-based computational clusters, especially those used for model training, are extremely data “hungry”, so organizations need high-throughput data platforms to “feed” them. Performance is certainly a key criterion, but other factors also need to be considered when selecting a data store architecture for AI because, as the saying goes, the devil is in the details.

To start the blog series, let’s take a more comprehensive look at how to choose the right AI data platform and how HPE scale-out data solutions can help you.

How to choose the right AI data platform

Let’s begin by looking beyond just performance to analyze other key requirements for AI data stores—and evaluate how they are related to your organization’s AI maturity level.

Different types of AI/ML/DL and analytics

First, consider advanced analytics and AI models. As Figure 1 below shows, not all analytics and AI models are equal. Depending on the type of machine learning (ML) algorithm employed, the needs in terms of data store and compute resources can change significantly.

Figure 1

Traditional analytics and data preparation (keep in mind that data preparation is an integral part of the AI model process) require random and sequential I/O, have a balanced read-and-write pattern, use multiple types of data from different data sources, manage files of any size (small, large, huge), and are not especially sensitive to latency and throughput requirements.

Moving toward the deep learning (DL) and neural networks space, the requirements change. These algorithms mainly have sequential I/O patterns, operate on a single type of data stored in small files, and need low-latency, high-throughput solutions.

Similarly, from the compute point of view, data analytics algorithms don’t benefit from GPU characteristics and speed, and they run better on traditional CPUs. On the flip side, DL and neural network models see a dramatic performance boost when running on GPUs.
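To make the I/O-pattern distinction concrete, here is a minimal, purely illustrative Python sketch (not part of any HPE tooling) that measures sequential versus random read throughput on a single file. The file path is hypothetical, and results on repeated runs will be affected by the OS page cache.

```python
import os
import random
import time

def read_throughput_gbps(path, block_size=4 * 1024 * 1024, sequential=True):
    """Read a file in fixed-size blocks, sequentially or at shuffled offsets,
    and return the achieved throughput in GB/s."""
    size = os.path.getsize(path)
    offsets = list(range(0, max(size - block_size, 0) + 1, block_size))
    if not sequential:
        random.shuffle(offsets)  # emulate a random-access workload
    start = time.perf_counter()
    bytes_read = 0
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            bytes_read += len(f.read(block_size))
    return bytes_read / (time.perf_counter() - start) / 1e9

# Hypothetical training shard; compare the two access patterns:
# print(read_throughput_gbps("/data/train/shard-0001.bin", sequential=True))
# print(read_throughput_gbps("/data/train/shard-0001.bin", sequential=False))
```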

Dataset volumes and data throughput

Another important aspect to consider is the size of the dataset. While it remains true that more data is better, not all organizations have petabytes of data. The size of the datasets used in your organization will play an important role in your data store strategy.

If training datasets are generally small enough to fit in the local storage of a GPU-based server, your data platform strategy should aim at optimizing the internal storage of the GPU systems. For example, a single HPE Apollo 6500 Gen10 system can have up to 270 TB of raw SSD capacity.

In those cases where the training datasets don’t fit in the local AI server internal storage, you will need to create a data pipeline capable of feeding several GPU- and CPU-based systems in parallel. The challenge in this case is to provide enough bandwidth to the GPU-based servers, as a single GPU can easily consume from 1 GB/s up to 12 GB/s, depending on the model being trained.
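As a back-of-the-envelope illustration of that sizing exercise, the sketch below multiplies a hypothetical server and GPU count by the 1–12 GB/s per-GPU range mentioned above to estimate the aggregate read bandwidth the data platform has to sustain.

```python
def required_bandwidth_gbps(gpu_servers, gpus_per_server, gbps_per_gpu):
    """Aggregate sequential-read bandwidth needed to keep every GPU fed."""
    return gpu_servers * gpus_per_server * gbps_per_gpu

# Hypothetical cluster: 4 GPU servers with 8 GPUs each.
print(required_bandwidth_gbps(4, 8, 1.0))   # lightly I/O-bound model:  32 GB/s
print(required_bandwidth_gbps(4, 8, 12.0))  # data-hungry model:       384 GB/s
```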

The data pipeline architecture will depend on your organization’s AI needs and on the throughput needed to “feed” all your ML models. This means that data pipeline architecture complexity can range from a single, fast, multi-purpose file system based on software-defined storage (SDS) to a combination of a hyper-fast parallel file system and one or more scale-out data platforms. For reference, some of these architectures are shown in Figure 2 below.

Figure 2

Graphical administration interface

An easy-to-use graphical UI is an often underestimated requirement. This is especially critical for small and medium enterprise IT departments, which generally consider an intuitive management console for daily operations, tuning, configuration, and general maintenance a must-have. Managing a high-performance data store platform can be an extremely complex and time-consuming activity that requires highly skilled personnel, and not all IT organizations can easily put such expertise into place.

Hybrid cloud

With more than 87% of enterprises having hybrid (cloud and on-premises) models in place, the capability to run seamlessly both on-premises and on public cloud is another key decision criterion when considering a software-defined storage (SDS) solution. The ability to orchestrate huge amounts of data from on-premises to cloud and vice versa allows organizations to use on-premises data for different purposes, including performance, security, and compliance, while also leveraging cloud archive services.

Metadata scaling

As each node of a training cluster may query metadata independently, metadata access performance must scale linearly with the size of the file system. This requirement becomes even more important as your training processes become more complex and your training datasets become bigger and use more parallel computation. Having more computers running your model in parallel means that your metadata system must manage more requests in parallel too, so metadata access performance must scale accordingly to avoid bottlenecks.
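The sketch below is a generic illustration (not tied to any particular file system) of why metadata load grows with parallelism: every training worker stats the files in its shard before reading them, so the metadata service must absorb roughly workers × files-per-worker lookups per epoch.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def stat_shard(paths):
    """One training worker touching metadata for every file in its shard."""
    return sum(os.stat(p).st_size for p in paths)

def epoch_metadata_load(shards):
    """Issue all workers' metadata lookups concurrently, as a training
    cluster would; returns the bytes each worker will go on to read."""
    with ThreadPoolExecutor(max_workers=max(len(shards), 1)) as pool:
        return list(pool.map(stat_shard, shards))

# With 8 workers and 100,000 files per shard, that is 800,000 stat() calls
# per epoch; doubling the workers doubles the metadata request rate.
```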

High security levels and multi-tenant support

Security is always a must. Integrated data encryption, key management, and role-based access controls protect organizations from data breaches and keep them compliant with data privacy legislation. Moreover, as soon as your company starts to implement MLOps, you will need the capability to logically isolate groups of users, departments, datasets, applications, or even different companies.

Learn how to optimize your analytics initiative with HPE scale-out data platforms

HPE has a long tradition and extensive global experience in developing advanced analytics and AI/ML/DL data pipeline architectures.

To simplify the creation of data pipeline architectures for advanced analytics and AI, in 2016 HPE introduced the HPE Elastic Platform for Analytics (EPA). HPE EPA is a modular infrastructure foundation that addresses the need for a scalable, multi-tenant platform supporting different workloads. The HPE EPA architecture allows users to combine heterogeneous compute and storage building blocks (such as AI-optimized, high-density storage, memory-optimized, latency-optimized for real-time analytics, and standard compute) to create pipeline architectures, optimizing each node profile based on specific workload requirements.

As workload profiles change, you can add the necessary compute and/or storage options to each “building block” and related software to meet the new node profile needs. All these building blocks can operate within the same cluster and share the same data.
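As a purely illustrative sketch of the building-block idea (this is not an actual HPE EPA configuration format, and the profile names are hypothetical), a cluster can be pictured as a set of node profiles that are mixed, shared, and re-balanced as workload profiles change:

```python
# Illustrative only: a hypothetical node-profile catalog and cluster layout.
# The point is that each block is sized for one job, blocks of different
# types share the same cluster and data, and a block can grow independently.
NODE_PROFILES = {
    "storage_dense": {"role": "data lake",          "example": "Apollo 4200"},
    "ai_optimized":  {"role": "GPU training",       "example": "Apollo 6500"},
    "standard":      {"role": "data prep / CPU ML", "example": "standard compute"},
}

cluster = {"storage_dense": 8, "standard": 12, "ai_optimized": 2}

def scale(cluster, profile, delta):
    """Grow (or shrink) one building block without touching the others."""
    cluster[profile] = cluster.get(profile, 0) + delta
    return cluster

# Workload shifts toward deep learning: add two more AI-optimized blocks.
print(scale(cluster, "ai_optimized", 2))
```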

With this in mind, HPE leverages one of the broadest server portfolios in the industry to meet enterprise-level needs for big data analytics and AI/ML/DL workloads. This includes three families of Intel-based, density-optimized servers:

  • The HPE Apollo 4000 systems are specifically optimized and purpose-built to serve low-latency data storage-centric workloads, such as big data analytics and software-defined storage, as well as orchestration for an end-to-end data pipeline.
  • The HPE Apollo 2600 systems are specifically optimized to provide flexibility of both CPU and GPU compute nodes in the same chassis, offering compute node density to support big data analytics and AI/ML/DL models.
  • The HPE Apollo 6500 systems are the ideal HPC and deep learning platform providing unprecedented performance with industry-leading GPUs, fast GPU interconnect, high bandwidth fabric, and a configurable GPU topology to match your workloads.

A choice in building blocks for analytics architecture

On the software side, HPE has partnered with key scale-out data platform providers (ISVs) to integrate and optimize their solutions with the HPE Apollo 4000 family. Together with these strategic partners, HPE has purpose-built scale-out solutions on top of Apollo 4000 systems to make it simple for enterprises to deploy AI-driven applications and data pipelines into production on one architecture.

To explore this in more detail, take a look at how our strategic software partners have been integrated with HPE hardware to create highly effective and modular building blocks to deploy analytics architectures.

Weka Matrix—the superfast parallel filesystem for AI

Weka Matrix is one of the fastest parallel file systems for AI and technical compute workloads available on the market. Weka Matrix combines its MatrixFS flash-optimized parallel file system with the NVMe-based flash storage of HPE Apollo 4200 Gen10 and ProLiant DL360 Gen10 servers to create a high-performance, scale-out parallel storage system that is well suited for deep learning and other I/O-bound use cases. Additionally, MatrixFS was purpose-built with distributed data and metadata support to avoid the hotspots and bottlenecks encountered by traditional scale-out storage solutions, exceeding the performance capabilities of even local NVMe storage. It supports distributed data protection (MatrixDDP) for data resiliency with minimal overhead and reliability that increases as the storage cluster scales.

HPE Data Node for AI—the cost-effective solution for both throughput-intensive and huge-data-volume workloads

HPE, in partnership with Weka and Scality, has created the HPE Data Node for AI, a data store solution tailored for both throughput-intensive and huge-data-volume analytics workloads, based on the HPE Apollo 4200 Gen10 storage server or the ProLiant DL360 Gen10 all-flash server. With this solution, customers get high-performance, petabyte-scale storage with integrated data lifecycle management, providing tiering managed by the file system and a single namespace (a generic tiering sketch follows the list below). This solution can be implemented in two ways:

  • The classic two-tier solution dedicates one tier to high-performance flash and a second tier to scalable object storage, typically deployed as two separate clusters of storage servers.
  • Alternatively, the AI data node can combine both tier elements into a single scalable cluster, optimized for both NVMe flash capacity and scale-out bulk data storage on a single Apollo 4200 Gen10.
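To make the tiering idea concrete, here is a generic sketch of a file-age-based demotion policy from a flash tier to a bulk object-backed tier. It illustrates the concept only; the tier paths and threshold are hypothetical, and this is not how the HPE Data Node for AI is actually implemented internally.

```python
import os
import shutil
import time

FLASH_TIER = "/mnt/flash"    # hypothetical hot, NVMe-backed tier
BULK_TIER = "/mnt/object"    # hypothetical scale-out bulk tier
DEMOTE_AFTER_DAYS = 30

def demote_cold_files(now=None):
    """Move files untouched for DEMOTE_AFTER_DAYS from flash to the bulk tier.
    A real solution keeps a single namespace and moves data transparently;
    this sketch only shows the policy decision."""
    now = now or time.time()
    cutoff = now - DEMOTE_AFTER_DAYS * 86400
    for root, _dirs, files in os.walk(FLASH_TIER):
        for name in files:
            src = os.path.join(root, name)
            if os.stat(src).st_atime < cutoff:
                dst = os.path.join(BULK_TIER, os.path.relpath(src, FLASH_TIER))
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)
```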

Qumulo—multipurpose fast NAS for AI

The Qumulo scale-out file storage, together with HPE Apollo 4000 family servers, is a modern scale-out NAS with advanced block-level erasure coding and up-to-the-minute analytics for actionable data management. It provides a large-scale, cloud-native, enterprise-proven, highly scalable, multipurpose filesystem for streaming analytics and AI training as well as traditional workloads. This makes Qumulo an ideal choice for enterprises looking for a multipurpose scalable filesystem capable of managing traditional workloads while also “feeding” AI training servers.

The Qumulo file system can be deployed in minutes, and its intuitive UI provides real-time visibility for instant control and easy tuning.

Scality Ring—the perfect petabyte-volume secondary storage for AI

The Scality RING scalable object storage with the HPE Apollo 4000 platform delivers the massively scalable, low-cost, reliable, flexible, centrally managed multi-cloud data platform that your organization needs for large-scale unstructured data.

HPE Apollo 4000 systems with Scality RING scalable storage create a solution with a lower TCO at petabyte scale than traditional SAN and NAS storage. It allows you to build one data pool that can hold virtually unlimited amounts of unstructured data, which is always protected, always online, and accessible from anywhere. Indeed, Scality RING provides up to eleven nines (99.999999999%) of data durability. You can achieve the simplicity and agility of cloud with the benefits of a density-optimized, on-premises platform designed for storage-centric workloads.
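To put that durability figure in perspective, the quick calculation below (illustrative arithmetic only, not a Scality specification beyond the eleven-nines level quoted above) translates 99.999999999% annual durability into the expected number of objects lost per year for a given object count.

```python
durability = 0.99999999999            # "eleven nines", per object per year
annual_loss_probability = 1 - durability

for objects_stored in (1_000_000, 1_000_000_000):
    expected_losses = objects_stored * annual_loss_probability
    print(f"{objects_stored:>13,} objects -> ~{expected_losses:.5f} expected object losses per year")
```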

The data store you need to optimize any AI workload

HPE offers a wide set of integrated and optimized hardware and software data solutions to optimize any AI workload. HPE data solutions offer optimized building blocks that can be combined to create the optimal data pipeline architecture to support your organization’s current and future AI and advanced analytics initiatives—all with a maximum level of flexibility and modularity.

I’ll close with Figure 3 below, which shows how to use HPE building blocks to create different pipelines based on your organization's AI maturity level. These pipelines can be expanded as your AI needs evolve: new hardware and/or software blocks can be added to the existing data pipeline to meet new needs and growing data volumes.

Figure 3

Stay tuned for more blogs in this series for further discussions on HPE data store solutions for AI.

Learn more now


Andrea Fabrizi
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage

twitter.com/hpe_hpc
linkedin.com/showcase/hpe-ai/
hpe.com/info/hpc

About the Author


Andrea is Senior Product Manager for Big Data and Analytics Solutions at HPE.
