Create your secondary AI data store using HPE Apollo 4000 systems and Scality RING

Learn why HPE servers and Scality RING software are the right solution for AI secondary data stores.

Why does a secondary data store matter for AI?

In my previous blog in this data store series, I discussed how the real selection criterion for an AI/ML data platform is the best balance between capacity (cost per GB stored) and performance (cost per GB of throughput). Indeed, to support enterprise AI programs, the data architecture must deliver both high performance (needed for AI training and validation) and high capacity (needed to store the huge amount of data that AI training requires). These two capabilities can be hosted on the same systems (integrated data platform), but in large infrastructures they are hosted in two separate, specialized systems (two-tier architecture).

This post continues the series of blogs dedicated to data stores for AI and advanced analytics. In case you missed them, here are the previous blogs: Choosing the right platform; HPE and WekaIO provide the superfast data platform you need to train AI models; and The tale of different data architectures for AI/ML and analytics.

Now I'm ready to explore the high-capacity data solution and how this component is replacing traditional secondary storage systems in modern IT.

The advent of AI, analytics, and IoT, together with the proliferation of unstructured data, has dramatically changed the role of the secondary storage tier. It has transitioned from cost-effective data offload, protection, and backup storage to an exabyte-level, highly reliable, cost-effective, scale-out, software-defined storage data platform.

Historically, the IT data “consumption” model was very simple: Data was mainly “produced” and “consumed” within the corporate applications like CRM and ERP.

The data management paradigm was built accordingly: less active data was tiered to cheaper but slower storage (secondary data storage) in order to keep more capacity for newer data on the faster but more expensive primary tier. Data tiering was mainly a north/south (or up/down) movement to a second tier and, after that, to obsolescence. Data on secondary storage was rarely used again, and it had to be reloaded into the application's primary tier before it could be accessed.

AI, IoT, and the proliferation of unstructured data have dramatically changed the way data is consumed and persisted in the enterprise. Today, data production and consumption are separated across different applications. A CRM system still creates and uses its own data, but that data is also ingested into enterprise data lakes, where analytics and AI applications use it to create insights such as customer behavior analysis, recommendations, and fraud or churn prediction.

The new data management paradigm is based upon optimizing data accessibility for AI applications. Data mobility is an east/west motion, from applications to data lake or object stores to AI or analytics-specific data stores. Data is always accessible, and it doesn’t ever age.

New and old data movement paradigms

As a result, secondary storage has become the main place to keep enterprise data: the place where data is always online, always available, and always protected.

Why HPE and Scality RING are the right solution for your AI secondary data store

Scality RING is a distributed scale-out object-storage layer that employs a second-generation peer-to-peer architecture. This architecture distributes both the user data and the associated metadata across the underlying nodes to eliminate the typical central metadata database bottleneck. As a result, Scality RING can be seamlessly scaled out to thousands of servers with hundreds of petabytes of storage capacity. RING has no single point of failure and requires no downtime during upgrades, scaling, planned maintenance, or unplanned system events. (For more information, register to download a detailed description of the Scality RING architecture.)
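To illustrate why distributing keys across peers removes the need for a central metadata database, here is a minimal sketch of consistent hashing, one common technique for this kind of placement. It is an illustration only and not a description of RING's internals: the HashRing class, the node names, and the number of virtual positions per node are all assumptions made for the example.

```python
import bisect
import hashlib


def key_position(key: str) -> int:
    """Map any string to a position on the hash ring (a large integer)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring: any peer can compute an object's owner locally."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (position, node_name)
        for node in nodes:
            self.add_node(node, vnodes)

    def add_node(self, node, vnodes=64):
        # Each node claims several positions so that load spreads evenly.
        for i in range(vnodes):
            bisect.insort(self._ring, (key_position(f"{node}#{i}"), node))

    def owner(self, key):
        # The owner is the first node position at or after the key's position,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_left(self._ring, (key_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical three-node cluster holding 1,000 objects.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"object-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # scale out by one server
after = {k: ring.owner(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved after adding a node")
```

In this sketch, adding a server relocates only the keys that the new node takes over, and no lookup ever consults a central index. That is the property that lets a peer-to-peer design grow without a metadata bottleneck.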

HPE and Scality have optimized and certified RING on the HPE Apollo 4000 systems: HPE Apollo 4200 Gen10 (24 LFF) and Apollo 4510 Gen10 (60 LFF) servers. The combination of HPE Apollo 4000 systems with Scality RING software delivers an extremely cost-effective, high-performance, petabyte-scale secondary data store solution designed for AI-centric workloads.

Now I'll discuss the key benefits of HPE Apollo 4000 systems and Scality RING.

Scality RING on HPE Apollo 4000 addresses the challenges of storing massive amounts of training data for your AI projects

As larger data sets deliver better algorithms, AI data training sets tend to be massive, but you don’t need to keep all of them on fast and expensive data stores. Scality RING object storage is the right data platform to keep data that is not immediately used by training algorithms.

It scales out linearly as a single system across multiple active sites, thousands of servers, hundreds of petabytes, and unlimited objects, without adding administrators or disparate components, and all in a single namespace. This enables massive consolidation and significantly reduces operating costs. Moreover, Scality RING price per TB is extremely affordable, even in comparison with cloud archive prices.

Data is always online, always available, and always protected: Scality RING provides the highest levels of data durability – in excess of 14 nines (9s)

It is critical that secondary data stores can hold up to hundreds of petabytes of data, because data science teams often require a lot more training data to keep improving their AI models. Storing such vast amounts of data on primary storage is impractical for IT. This data also needs to sit in active-archive or warm-archive storage so that it remains easily accessible for accurate model building.

Scality RING is designed for extreme fault-tolerance. It ensures that data remains durable and available in case of a wide range of component failures including disks, servers, networks in the same site and even across multiple data centers. RING provides data durability through a set of flexible data protection mechanisms optimized for distributed systems, such as replication, erasure coding, and geo-replication capabilities, providing customers up to 14 nines (9s) of data durability. These mechanisms also provide high storage efficiency as they require less raw storage. Below, you can see Scality RING minimum data durability in both single site and multiple site scenarios.
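To see why erasure coding needs less raw storage than replication, and what a durability figure quoted in nines means, consider this small worked sketch. The 3-copy and 9+3 layouts are hypothetical examples chosen for the arithmetic, not Scality's documented defaults, and the function names are my own.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data when keeping N full copies."""
    return Fraction(copies, 1)


def erasure_coding_overhead(data_chunks: int, parity_chunks: int) -> Fraction:
    """Raw bytes stored per byte of user data with a k+m erasure code."""
    return Fraction(data_chunks + parity_chunks, data_chunks)


def nines_to_loss_probability(nines: int) -> Fraction:
    """A durability of N nines corresponds to a loss probability of 10^-N."""
    return Fraction(1, 10 ** nines)


# Hypothetical layouts: 3 full copies versus 9 data + 3 parity chunks.
rep = replication_overhead(3)          # survives the loss of 2 copies
ec = erasure_coding_overhead(9, 3)     # survives the loss of any 3 chunks

print(f"3-way replication: {float(rep):.2f}x raw capacity per usable byte")
print(f"9+3 erasure coding: {float(ec):.2f}x raw capacity per usable byte")

# Fourteen nines of durability expressed as a loss probability.
p_loss = nines_to_loss_probability(14)
print(f"14 nines -> loss probability of {float(p_loss):.0e}")
```

Under these assumptions, the erasure-coded layout tolerates one more failure than three-way replication while using well under half the raw capacity. This is the storage-efficiency argument made above.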

HPE data store-Scality RING 2.png

Hybrid cloud management: Balance cost using both on-premises storage and public cloud

Scality includes Zenko multi-cloud data controller software to address the growing need for agnostic data management across multiple hybrid environments. Zenko supports XDM (eXtended Data Management) capabilities, which extend RING data management across multiple RINGs and public clouds. Zenko provides an ideal solution for edge/IoT data storage through a lightweight, container-based deployment model. It can be used at the edge to store, cache, and replicate data back to RING or public cloud in large core data centers. Examples of hybrid cloud data management use cases supported by Zenko include:

  • Offload old/unused RING data to an inexpensive cloud tier for long-term archive
  • Free up RING capacity while maintaining data for compliance or regulatory purposes
  • Copy of RING data (all or partial) in cloud for disaster recovery
  • Move RING data to cloud for use with services such as compute bursting, and analysis
  • Repatriate or copy cloud data to use it on-premises (avoid data traffic from cloud)
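The first and third use cases above can be pictured as a simple placement rule applied to each object. The sketch below is purely illustrative and does not use Zenko's API or configuration format: the ObjectInfo fields, the action names, and the 180-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ObjectInfo:
    key: str
    last_accessed: datetime
    needs_dr_copy: bool = False  # hypothetical flag: keep a cloud copy for DR


def choose_action(obj: ObjectInfo, now: datetime,
                  archive_after: timedelta = timedelta(days=180)) -> str:
    """Return a placement decision for one object (hypothetical rule set)."""
    if now - obj.last_accessed >= archive_after:
        # Old or unused data: offload to an inexpensive cloud tier.
        return "archive-to-cloud"
    if obj.needs_dr_copy:
        # Active data that must survive a site loss: keep it on premises
        # and copy it to the cloud as well.
        return "replicate-to-cloud"
    # Everything else stays on the on-premises store only.
    return "keep-on-premises"


now = datetime(2020, 6, 1)
objects = [
    ObjectInfo("training/2018/batch-01", datetime(2019, 1, 15)),
    ObjectInfo("training/2020/batch-07", datetime(2020, 5, 20), needs_dr_copy=True),
    ObjectInfo("training/2020/batch-08", datetime(2020, 5, 28)),
]
for obj in objects:
    print(obj.key, "->", choose_action(obj, now))
```

A real deployment would express such rules through the product's own policy mechanisms. The point here is only that age and protection requirements can drive placement automatically.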

Simpler operations and management

Scality RING has an intuitive graphical user interface called RING health to manage and control the whole infrastructure. Through the UI, users can manage:

  • Storage provisioning
  • Connector control
  • Hardware health monitoring

RING health now includes platform-specific reporting of HDD health from HPE servers via the HPE Smart Array controllers.

The versatility of interfaces available with Scality RING

The versatility of Scality RING goes beyond objects, as it supports different kinds of interfaces, including NFS and SMB (CIFS), making IT life easier. Supporting a wide array of interface protocols alleviates the need for rewriting AI applications. Most existing applications can be effortlessly used to build AI for the enterprise. The following table provides the list of protocols and connectors supported by Scality RING.

HPE data store-Scality RING-table.png

The advantages of HPE Apollo family servers and Scality RING software

The combination of Scality RING features with HPE server optimization delivers the optimal solution for an AI secondary data store. This allows enterprises to keep up with the most demanding AI workloads while providing a full-fledged, easy-to-manage storage solution.

Stay tuned to this blog series to learn more about HPE data store solutions for AI and advanced analytics.


Andrea Fabrizi
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage

twitter.com/hpe_hpc
linkedin.com/showcase/hpe-ai/
hpe.com/info/hpc

About the Author

AndreaFabrizi1

Andrea is Senior Product Manager for Big Data and Analytics Solutions at HPE.
