The 5 new superlatives of AI storage
The rapid adoption of AI in organizations of all sizes is breaking legacy storage architectures, either architecturally or economically. What worked for AI proofs of concept (POCs) no longer works in large-scale production. Learn about the new records HPE is setting in AI storage.
Data is the lifeblood of artificial intelligence (AI) and deep learning (DL). Vast quantities of unstructured training data, fed to GPU-accelerated compute infrastructure, enhance accuracy in the search for potentially predictive relationships.
Here are five specific examples in five different categories where new high or low watermarks are being set when it comes to AI storage attached to GPU-accelerated compute.
- The largest hybrid parallel file system
- The largest all-flash parallel file system
- The fastest restore capability for parallel storage
- The longest-serving large-scale parallel file system
- The smallest parallel file system
The Oak Ridge Leadership Computing Facility (OLCF), a U.S. Department of Energy high-performance computing user facility, recently announced the specifications of its new Orion file system. Among other systems at OLCF, Orion will support the upcoming Frontier exascale supercomputer that will feature four AMD GPUs for each AMD CPU. Orion is based on Cray ClusterStor E1000 and as a hybrid file system features three storage tiers:
- Flash-based performance tier of 5,400 nonvolatile memory express (NVMe) drives providing 11.5 petabytes (PB) of capacity at peak read-write speeds of 10 TB/second
- Hard-disk-based capacity tier of 47,700 perpendicular magnetic recording drives providing 679 PB of capacity at peak read speeds of 5.5 TB/second and peak write speeds of 4.6 TB/second
- Flash-based metadata tier of 480 NVMe devices providing an additional capacity of 10 PB
This represents the new high watermark for large high-performance file systems.
When it comes to all-flash file systems, the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (Berkeley Lab) is setting the bar. Its next-generation supercomputer, Perlmutter, includes an all-flash file system with 35 PB of usable capacity based on Cray ClusterStor E1000.
This all-flash file system will provide very high-bandwidth storage to the HPE Cray supercomputer that in phase one features compute nodes with four NVIDIA GPUs per AMD CPU. But new records are also set outside of the classic supercomputing leadership sites where the confluence of classic simulation with artificial intelligence (AI) is changing advanced computing as we know it.
The solution at Zenseact requires the ability to protect and restore data at very high (record) speeds in order to hit the business-critical simulation window should a restore of the data become necessary. HPE Data Management Framework (DMF) running on HPE ProLiant DL rack servers could meet the requirement of restoring petabytes of data at about 200 gigabytes per second.
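To put that restore rate in perspective, here is a back-of-the-envelope sketch. The 200 GB/sec figure comes from the text above; the dataset sizes looped over are purely illustrative:

```python
# Back-of-the-envelope restore-time estimate at a sustained 200 GB/sec.
# The rate comes from the article; the petabyte sizes are illustrative.

RESTORE_RATE_GB_PER_SEC = 200

def restore_time_hours(petabytes: float) -> float:
    """Hours needed to restore `petabytes` of data at the sustained rate."""
    gigabytes = petabytes * 1_000_000  # 1 PB = 1,000,000 GB (decimal units)
    return gigabytes / RESTORE_RATE_GB_PER_SEC / 3600

for pb in (1, 5, 10):
    print(f"{pb:>2} PB -> {restore_time_hours(pb):.1f} hours")
# -> roughly 1.4, 6.9, and 13.9 hours respectively
```

Even a 10 PB restore completes within a working day at that rate, which is what makes a fixed simulation window achievable. Real restores would also depend on tape/HDD source bandwidth and network topology, which this sketch ignores.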
The “longest serving large-scale AI storage award” goes to the ClusterStor storage system of the Blue Waters supercomputer at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign.
So far, it’s served data for more than 38 billion core-hours to thousands of scientists and engineers. Large-scale production with 4,228 NVIDIA GPUs began in March 2013—when most people still thought AI stood for “American Idol” and GPU for “Global Photographic Union.”
The Blue Waters supercomputer recently celebrated its eighth birthday! But what about the AI users that do not want or cannot invest in large scale clusters or supercomputers?
For you, we have the recently announced HPE Parallel File System Storage, which delivers an IBM Spectrum Scale (formerly known as GPFS)-based parallel file system starting with as few as 12 storage drives (HDD or NVMe SSD) in four HPE ProLiant DL325 Gen10 Plus-based storage servers.
While that wins the “smallest parallel file system award,” this generally available HPE storage product scales beyond 20 petabytes of usable capacity and terabyte-per-second speeds today. It delivers very efficient performance, especially when compared with NFS-based scale-out NAS such as Dell EMC Isilon.
HPE Parallel File System Storage in its entry configuration with just 12 NVMe SSDs delivers about 35 gigabytes per second (GB/sec) of read throughput, while the high-end Dell EMC Isilon F800 model delivers “just” 15 GB/sec from 60 SSDs (see datasheet). That is 57% less data throughput from 400% more SSDs.
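The comparison is easy to verify. Note that 60 SSDs is 400% more than 12, and the per-SSD efficiency gap follows directly from the two datasheet figures quoted above:

```python
# Reproduce the throughput-per-SSD comparison from the text
# (35 GB/sec from 12 SSDs vs. 15 GB/sec from 60 SSDs).

hpe_gbps, hpe_ssds = 35, 12        # HPE Parallel File System Storage entry config
isilon_gbps, isilon_ssds = 15, 60  # Dell EMC Isilon F800 (per its datasheet)

throughput_delta = (hpe_gbps - isilon_gbps) / hpe_gbps  # fraction less throughput
ssd_delta = (isilon_ssds - hpe_ssds) / hpe_ssds         # fraction more SSDs
per_ssd_ratio = (hpe_gbps / hpe_ssds) / (isilon_gbps / isilon_ssds)

print(f"Isilon delivers {throughput_delta:.0%} less throughput")  # 57%
print(f"...from {ssd_delta:.0%} more SSDs")                       # 400%
print(f"Per-SSD read throughput advantage: {per_ssd_ratio:.1f}x") # 11.7x
```

Per drive, that works out to roughly an order-of-magnitude difference in read throughput, which is the architectural point the comparison is making.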
Why parallel storage now
When “kicking the tires” of AI, many organizations used enterprise scale-out NAS like Dell EMC Isilon or NetApp AFF to feed their GPU-accelerated compute nodes with data. Now as AI is scaling from POC to production for many organizations, NFS-based NAS storage is either breaking economically ($ per terabyte) or architecturally (performance/scalability).
This is most likely why Hyperion Research found in its 2020 special study that the use of NFS-based storage is shrinking while more and more organizations go parallel to cope with the data challenges of AI in production.
Source: Hyperion Research, Special Study: Shifts Are Occurring in the File System Landscape, June 2020
If you want to understand why these shifts are happening, please read this business paper.
Ready to go parallel? HPE has the right AI storage
We are the right partner for you to go parallel for AI storage—whether you want to start with 12 drives with HPE Parallel File System Storage, or if you are looking for a 50,000+ drive parallel storage system like ORNL with Orion.
- Accelerated compute options
Supercomputers like the HPE Cray EX, ultra-dense GPU nodes like the HPE Apollo 6500 Gen10 Plus, or standard-density accelerated rack servers like HPE ProLiant DL
- Interconnect options
High speed networks like HPE Slingshot, InfiniBand HDR, or 100/200 Gigabit Ethernet
- Parallel file system options
The leading parallel file system in research (Lustre) embedded in the Cray ClusterStor E1000 Storage System, or the leading parallel file system in the enterprise (IBM Spectrum Scale) embedded in HPE Parallel File System Storage
- HPE DMF data protection options
HDD- or tape-based backup/archive/restore for parallel data, on-premises or off-premises (co-location or public cloud)
- Consumption options
Purchasing, financing with HPE Financial Services, or pay-as-you-go and as-a-service models with HPE GreenLake
Scale your AI initiatives from POC to production with the right AI and HPC storage. Contact your HPE representative today.
Hewlett Packard Enterprise
Uli leads the product marketing function for high performance computing (HPC) storage. He joined HPE in January 2020 as part of the Cray acquisition. Prior to Cray, Uli held leadership roles in marketing, sales enablement, and sales at Seagate, Brocade Communications, and IBM.