
The 5 new superlatives of AI storage

The rapid adoption of AI in organizations of all sizes is breaking legacy storage architectures, either architecturally or economically. What worked for AI proofs of concept (POCs) no longer works in large-scale production. Learn about the new records HPE is setting when it comes to AI storage.

Data is the lifeblood of artificial intelligence (AI) and deep learning (DL). Vast quantities of unstructured training data, processed on GPU-accelerated compute infrastructure, enhance accuracy in the search for potentially predictive relationships.

Here are five specific examples in five different categories where new high or low watermarks are being set when it comes to AI storage attached to GPU-accelerated compute.

  1. The largest hybrid parallel file system
  2. The largest all-flash parallel file system
  3. The fastest restore capability for parallel storage
  4. The longest-serving large-scale parallel file system
  5. The smallest parallel file system

The Oak Ridge Leadership Computing Facility (OLCF), a U.S. Department of Energy high-performance computing user facility, recently announced the specifications of its new Orion file system. Among other systems at OLCF, Orion will support the upcoming Frontier exascale supercomputer, which will feature four AMD GPUs for each AMD CPU. Orion is based on Cray ClusterStor E1000 and, as a hybrid file system, features three storage tiers:

  • Flash-based performance tier of 5,400 nonvolatile memory express (NVMe) drives providing 11.5 petabytes (PB) of capacity at peak read-write speeds of 10 TB/second
  • Hard-disk-based capacity tier of 47,700 perpendicular magnetic recording drives providing 679 PB of capacity at peak read speeds of 5.5 TB/second and peak write speeds of 4.6 TB/second
  • Flash-based metadata tier of 480 NVMe devices providing an additional capacity of 10 PB

This represents the new high watermark for large high-performance file systems.
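To get a feel for what those numbers mean in practice, here is a rough back-of-the-envelope sketch in Python. It uses only the tier figures quoted above; the per-drive and full-tier-scan numbers it prints are illustrative approximations, not official OLCF or HPE specifications.

```python
# Back-of-envelope numbers for the Orion tiers listed above (illustrative only;
# drive counts, capacities, and peak read speeds are the OLCF figures cited here).
TB_PER_PB = 1000

tiers = {
    # name: (drives, capacity_pb, peak_read_tb_per_s)
    "flash performance": (5_400, 11.5, 10.0),
    "disk capacity":     (47_700, 679.0, 5.5),
}

for name, (drives, capacity_pb, read_tb_s) in tiers.items():
    per_drive_gb_s = read_tb_s * 1000 / drives                 # rough per-drive share of peak read bandwidth
    full_scan_h = capacity_pb * TB_PER_PB / read_tb_s / 3600   # time to read the whole tier at peak speed
    print(f"{name}: ~{per_drive_gb_s:.2f} GB/s per drive, "
          f"~{full_scan_h:.1f} h to read the full tier at peak")
```

Even at peak speed, reading the entire disk capacity tier once takes on the order of a day and a half, which hints at why the flash performance tier sits in front of it.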

When it comes to all-flash file systems, the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (Berkeley Lab) is setting the bar. Its next-generation supercomputer, Perlmutter, includes an all-flash file system with 35 petabytes (PB) of usable capacity based on Cray ClusterStor E1000.

This all-flash file system will provide very high-bandwidth storage to the HPE Cray supercomputer, which in phase one features compute nodes with four NVIDIA GPUs per AMD CPU. But new records are also being set outside of the classic supercomputing leadership sites, where the confluence of classic simulation with artificial intelligence (AI) is changing advanced computing as we know it.

A good example is the collaboration of Zenseact and HPE to develop next-generation autonomous driving cars – on an end-to-end infrastructure that is delivered as a service with HPE GreenLake.

The solution at Zenseact requires the ability to protect and restore data at very high (record) speeds in order to hit the business-critical simulation window should a restore of the data ever become necessary. HPE Data Management Framework (DMF) running on HPE ProLiant DL rack servers can meet the requirement of restoring petabytes of data at about 200 gigabytes per second.
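To put roughly 200 gigabytes per second of restore bandwidth into perspective, here is a small illustrative calculation. The dataset sizes are hypothetical examples, not Zenseact’s actual data volumes.

```python
# Illustrative restore-window math: how long a full restore takes at a sustained
# restore rate of ~200 GB/s (the dataset sizes below are hypothetical examples).
RESTORE_GB_PER_S = 200

for dataset_pb in (1, 5, 10):                                  # hypothetical dataset sizes in petabytes
    seconds = dataset_pb * 1_000_000 / RESTORE_GB_PER_S        # 1 PB = 1,000,000 GB (decimal units)
    print(f"{dataset_pb} PB -> ~{seconds / 3600:.1f} hours at 200 GB/s")
```

At that rate, even a multi-petabyte restore fits into a window of hours rather than days.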

The “longest serving large-scale AI storage award” goes to the ClusterStor storage system of the Blue Waters supercomputer at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign.

So far, it’s served data for more than 38 billion core-hours to thousands of scientists and engineers. Large-scale production with 4,228 NVIDIA GPUs began in March 2013—when most people still thought AI stood for “American Idol” and GPU for “Global Photographic Union.”

The Blue Waters supercomputer recently celebrated its eighth birthday! But what about the AI users who do not want to, or cannot, invest in large-scale clusters or supercomputers?

For you, we have the recently announced HPE Parallel File System Storage, which delivers an IBM Spectrum Scale (formerly known as GPFS)-based parallel file system starting with as few as 12 storage drives (HDD or NVMe SSD) in four HPE ProLiant DL325 Gen10 Plus-based storage servers.

While that wins the “smallest parallel file system award,” this generally available HPE storage product scales beyond 20 petabytes of usable capacity and terabyte-per-second speeds today. It delivers very efficient performance, especially when compared with NFS-based scale-out NAS like Dell EMC Isilon.

HPE Parallel File System Storage in its entry configuration with just 12 NVMe SSDs delivers about 35 gigabytes per second (GB/sec) of read throughput, while the high-end Dell EMC Isilon F800 model delivers “just” 15 GB/sec from 60 SSDs (see its datasheet). That is 57% less throughput from five times as many SSDs.
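Those percentages come from straightforward arithmetic on the two published throughput figures. Here is the same comparison as a short sketch (decimal units assumed; the per-SSD numbers are derived for illustration, not vendor-published):

```python
# Per-SSD read throughput implied by the figures quoted above (illustrative arithmetic only).
hpe_gb_s, hpe_ssds = 35, 12        # HPE Parallel File System Storage entry configuration
isilon_gb_s, isilon_ssds = 15, 60  # Dell EMC Isilon F800 figure from its datasheet

print(f"HPE:    {hpe_gb_s / hpe_ssds:.2f} GB/s per SSD")        # ~2.92 GB/s per SSD
print(f"Isilon: {isilon_gb_s / isilon_ssds:.2f} GB/s per SSD")  # ~0.25 GB/s per SSD
print(f"Throughput ratio: {hpe_gb_s / isilon_gb_s:.1f}x with {isilon_ssds // hpe_ssds}x fewer SSDs")
```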

Why parallel storage now

When “kicking the tires” of AI, many organizations used enterprise scale-out NAS like Dell EMC Isilon or NetApp AFF to feed their GPU-accelerated compute nodes with data. Now, as AI scales from POC to production, NFS-based NAS storage is breaking either economically ($ per terabyte) or architecturally (performance and scalability).
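Why does “going parallel” change the picture? A simplified mental model, sketched below, is that with NFS each client typically funnels I/O through a single server endpoint, while a parallel file system stripes every file across many storage servers, so aggregate bandwidth grows with the number of servers. The bandwidth figures here are hypothetical and chosen only to illustrate the scaling behavior; they do not describe any specific product.

```python
# Toy bandwidth model (assumption-laden, for intuition only): aggregate read bandwidth
# for a pool of GPU clients hitting one NFS server vs. a file striped across many servers.
def aggregate_bw(clients: int, servers: int, server_gb_s: float, client_gb_s: float) -> float:
    """Aggregate bandwidth is capped by both the server pool and the client pool."""
    return min(servers * server_gb_s, clients * client_gb_s)

clients = 32  # hypothetical GPU nodes, each able to ingest ~10 GB/s
nfs = aggregate_bw(clients, servers=1, server_gb_s=5, client_gb_s=10)
parallel = aggregate_bw(clients, servers=24, server_gb_s=5, client_gb_s=10)
print(f"Single NFS server:      ~{nfs} GB/s aggregate")      # bottlenecked at the one server
print(f"24-way parallel stripe: ~{parallel} GB/s aggregate")  # scales with the server count
```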

That breaking point is most likely why Hyperion Research found in its 2020 special study that the use of NFS-based storage is shrinking, while more and more organizations are going parallel to cope with the data challenges of AI in production.


Source: Hyperion Research, Special Study: Shifts Are Occurring in the File System Landscape, June 2020

If you want to understand why these shifts are happening, please read this business paper.

Ready to go parallel? HPE has the right AI storage 

We are the right partner to help you go parallel for AI storage, whether you want to start with just 12 drives of HPE Parallel File System Storage or you are looking for a 50,000+ drive parallel storage system like ORNL’s Orion.

Scale your AI initiatives from POC to production with the right AI and HPC storage. Contact your HPE representative today.


Uli Plechschmidt
Hewlett Packard Enterprise

twitter.com/hpe_hpc
linkedin.com/showcase/hpe-ai/
hpe.com/us/en/solutions/hpc

About the Author

Uli Plechschmidt

Uli leads the product marketing function for high performance computing (HPC) storage. He joined HPE in January 2020 as part of the Cray acquisition. Prior to Cray, Uli held leadership roles in marketing, sales enablement, and sales at Seagate, Brocade Communications, and IBM.
