A file storage architecture to power AI with performance at scale

StorageExperts · 09-11-2023

Today’s AI and other data-intensive applications demand file storage that has fast, predictable, and sustainable performance for massive data volumes. Discover how HPE GreenLake for File Storage meets that demand.

–By David Yu, HPE Storage Product Marketing

Very soon, the current trajectory of technology will make artificial intelligence (AI) mainstream. Major businesses and organizations across industries will leverage AI and other data-intensive applications to secure faster innovation, faster time to market, and competitive advantage. On the consumer side, more generative AI apps and services similar to ChatGPT will become widely available via the internet.

Enabling these trends are faster, more powerful computing and file storage resources that can meet the demands of AI. There are vast quantities of file data to be processed and plentiful insights to be gleaned from that data. But to capture those benefits, you need not only high-speed, GPU-based servers but also ultra-fast file storage to feed data to those servers and store their output.

In our current blog series, we’ve addressed the requirement for enterprise performance at scale from file storage systems and shown how HPE GreenLake for File Storage fully meets that requirement. In this blog, we’ll dive deeper into the technical details of just how this architecture enables performance at scale.

Fast storage media alone can’t ensure performance

Start with the basics. To get fast performance, you of course need fast storage media. But not only fast storage media. Any storage company can buy fast, off-the-shelf components and sell them to customers willing to pay. Much more important is how the media is used: an ineffective storage architecture and growing system overhead can easily throttle the performance of fast media.

Some storage vendors try to achieve affordable performance via tiering, with spinning storage media as a lower-cost tier. The trade-off is that only the data that fits in the flash tier gets fast performance, while data on the lower-cost tier is slowed by the rotational delay inherent to spinning disks. Tiers create still more problems: data must be moved between faster and slower storage, which means constant tuning. As all that overhead accumulates, legacy file storage can’t guarantee that you will be able to access and store the right data in the right place, at the right time, and with the right performance.

By contrast, HPE GreenLake for File Storage delivers consistent, linear performance scaling while overhead remains flat. With all-NVMe storage media for blazing-fast speed – and with no storage tiers – you get the fastest performance all the time. Now, you may ask, how does it do that?

An architecture designed for fast performance at exabyte scale

HPE GreenLake for File Storage is designed from the ground up for enterprise performance at scale. As described in a previous blog, this kind of performance is more than simply flat-out speed that reaches an unprecedented peak, for an instant in time, across a small data set. Instead, it’s fast, sustained performance that spans the entire scale of your data and doesn’t drop off even when processing extremely high volumes of data.

Think of this kind of performance as a sprinter who can run a marathon at a world-class 100-meter pace. It’s sustained high speed over a long stretch that’s enabled by an architecture designed for exabyte scale. Where legacy NAS infrastructure hits its limits at a certain capacity and slows down, HPE GreenLake for File Storage continues to scale and sustain performance as data capacities grow by leaps and bounds.

Two critical elements make it possible

HPE GreenLake for File Storage delivers enterprise performance at scale by leveraging two key components: VAST Data software with a DASE™ (Disaggregated Shared Everything) architecture and HPE Alletra Storage MP modular, resilient hardware, which provides for independent scaling of performance and capacity.

Let’s unpack the architecture a bit. HPE Alletra Storage MP compute nodes are connected to HPE Alletra Storage MP storage nodes over an NVMe fabric, and every compute node can access all storage nodes. In this way, HPE GreenLake for File Storage separates state from logic to eliminate the drawbacks of legacy scale-out NAS, including crosstalk, rebuilds, and interdependencies that grow geometrically with cluster size. An architecture designed to grow to exabyte capacities future-proofs your storage investment and provides more performance headroom than most organizations can ever consume. Meanwhile, independent scaling of performance and capacity gives you the flexibility and efficiency to achieve exactly the performance and capacity you need.

Everything we’ve just described is impossible with shared-nothing systems. Have a look at the contrasting summaries of these two architectures.

In particular, HPE GreenLake for File Storage achieves enterprise performance at scale in two ways:

Efficient I/O operations
Enhanced NFS performance

Efficient I/O operations

With HPE GreenLake for File Storage, controller nodes are stateless, because the DASE architecture enables them to be fully independent of metadata, which is always stored in a persistent distributed SCM (Storage Class Memory) layer in the storage nodes. All controller nodes in the file cluster can see every drive in the cluster, as if they were directly attached.

All the controller nodes also have access to the distributed SCM layer where the metadata resides. There is no crosstalk between the controller nodes, no server rebuilds upon failure, and so on. The completion of an I/O guarantees that data has been committed to persistent storage in SCM. Because acknowledged data is already persistent, the system does not depend on flushing cache or memory when a component fails, which helps prevent data loss and inconsistency in the data structure. Various mechanisms, such as atomic updates, bottom-up modification of tree structures, locking, and timeouts, are in place to ensure the consistency of the data structure even as access to data is shared among controller nodes.

This design helps HPE GreenLake for File Storage deliver unprecedented high performance that is predictable, reliable, and scalable. The solution achieves linear performance as you scale while system overhead to store, manage, and process increasing data remains flat. In short, it breaks through the performance limits of traditional systems built on shared-nothing architectures because it:

Avoids the management complexity of traditional systems, whose ever-growing overhead makes them not only slow but also unstable
Has no architectural limitations that force scaling to be done with compute and storage in tandem

Furthermore, the data path for a write I/O is unique compared to other network-attached storage (NAS) solutions because it’s designed to overcome the common NAS challenges of achieving data consistency with volatile memory. This figure shows the write data path.

A write I/O originates from a client through any standard NAS protocol and reaches any of the controller nodes through a mount point IP in one of the virtual IP (VIP) pools. The controller node acknowledges the write request as successfully completed only when the following three tasks are performed:

Data is written to two different SCM drives (SCM mirroring) so that no data is lost in the event one of the SCM drives fails. To enhance resiliency, mirroring occurs across separate storage nodes if the system has more than one storage node.
Data is sharded randomly across multiple SCM drives to increase throughput and decrease contention.
Metadata is updated in the internal data structure.

This process ensures consistency in the data. After the write I/O is complete, the written data is located on persistent SCM storage with redundancy. Performance is enhanced because migration from SCM to SSD is an out-of-band process that is performed by all controller nodes in the cluster and runs at its own pace. The migration occurs primarily at idle or slower times on the system.
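
The acknowledgment rule described above can be sketched as a toy in-memory model. This is only an illustration of the three steps (mirroring, random sharding, metadata update), not HPE or VAST Data code: the names `ScmDrive` and `ToyCluster`, the shard size, and the placement policy are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ScmDrive:
    """Hypothetical stand-in for one SCM drive inside a storage node."""
    node: str
    name: str
    blocks: dict = field(default_factory=dict)  # (path, shard index) -> bytes


class ToyCluster:
    """Toy model of the write acknowledgment rule; not a real API."""

    def __init__(self, drives, shard_size=4, seed=0):
        self.drives = list(drives)
        self.by_name = {d.name: d for d in self.drives}
        self.shard_size = shard_size   # assumed value, for illustration only
        self.metadata = {}             # path -> [(primary name, mirror name), ...]
        self.rng = random.Random(seed)

    def _pick_mirror_pair(self):
        primary = self.rng.choice(self.drives)
        # Mirror across storage nodes when more than one node exists.
        candidates = [d for d in self.drives if d.node != primary.node]
        if not candidates:
            candidates = [d for d in self.drives if d is not primary]
        if not candidates:
            raise RuntimeError("mirroring needs at least two SCM drives")
        return primary, self.rng.choice(candidates)

    def write(self, path, data):
        placement = []
        for index, start in enumerate(range(0, len(data), self.shard_size)):
            shard = data[start:start + self.shard_size]
            primary, mirror = self._pick_mirror_pair()  # random sharding
            primary.blocks[(path, index)] = shard       # first SCM copy
            mirror.blocks[(path, index)] = shard        # mirrored SCM copy
            placement.append((primary.name, mirror.name))
        self.metadata[path] = placement                 # metadata update
        return True                                     # acknowledge only now


drives = [ScmDrive("node-a", "a0"), ScmDrive("node-a", "a1"),
          ScmDrive("node-b", "b0"), ScmDrive("node-b", "b1")]
cluster = ToyCluster(drives)
acked = cluster.write("/data/sample", b"hello world!")
```

In this sketch the `return True` line is reached only after both copies and the metadata entry exist, which mirrors the ordering the article describes. Migration from SCM to SSD is deliberately left out because it happens out of band.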

The read data path is more straightforward than the write data path. After a controller node receives a read request from a client through any standard NAS protocol, it traverses the SCM metadata to fetch the data location. The controller node then retrieves the data from SCM or SSD and returns it to the client. Because controller nodes have the same access to SCM and SSD, the performance of the read I/O is not affected, whether the requested data is on SCM or on SSD.
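
As a rough illustration of that read flow, the sketch below models the metadata lookup followed by a fetch from whichever medium holds the data. The class `ToyReadPath` and its methods are hypothetical names chosen for the example, and plain dictionaries stand in for the SCM metadata, SCM, and SSD layers.

```python
class ToyReadPath:
    """Toy model of the read flow; dictionaries stand in for real media."""

    def __init__(self):
        self.scm_metadata = {}  # path -> ("scm" or "ssd", location key)
        self.scm = {}           # location key -> bytes held on SCM
        self.ssd = {}           # location key -> bytes held on SSD

    def ingest(self, path, data):
        # New writes land on SCM first, as in the write path.
        self.scm[path] = data
        self.scm_metadata[path] = ("scm", path)

    def migrate_to_ssd(self, path):
        # Out-of-band migration: move the data, then repoint the metadata.
        tier, key = self.scm_metadata[path]
        if tier == "scm":
            self.ssd[key] = self.scm.pop(key)
            self.scm_metadata[path] = ("ssd", key)

    def read(self, path):
        # Step 1: traverse the metadata to find the data location.
        tier, key = self.scm_metadata[path]
        # Step 2: fetch from SCM or SSD; the caller sees no difference.
        media = self.scm if tier == "scm" else self.ssd
        return media[key]


store = ToyReadPath()
store.ingest("/data/sample", b"payload")
before = store.read("/data/sample")
store.migrate_to_ssd("/data/sample")
after = store.read("/data/sample")
```

The client-facing `read` call is identical before and after migration; only the metadata entry changes. The sketch says nothing about real latency on either medium.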

The efficient I/O operation processes enabled by HPE GreenLake for File Storage show one way that its performance is boosted at scale. Now let’s look at its NFS performance.

Enhanced NFS performance

HPE GreenLake for File Storage offers enhanced NFS performance. The default NFS mount uses a single TCP connection between the client and the storage port. On most newer Linux distributions, the nconnect mount option (the Linux nConnect feature) can be used to configure up to 16 TCP connections between the client and a single storage port (VIP pools on the file cluster). The solution also supports multipathing for NFS clients, which enables a client to open multiple connections from multiple ports to multiple addresses. Multipathing is supported for both NFSv3 and NFSv4.1, and it requires a package to be installed on the NFS client.
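
For readers who want to try the standard Linux option, here is a small helper that assembles a mount command using `nconnect`. It is a sketch under assumptions: the server name, export path, and mount point are placeholders, the function name is made up for this example, and it covers only the stock Linux `nconnect` option, not the separately packaged multipathing client mentioned above.

```python
import shlex

MAX_NCONNECT = 16  # upper limit for the Linux nconnect mount option


def build_nfs_mount_command(server, export, mount_point,
                            nconnect=MAX_NCONNECT, nfs_version="3"):
    """Return the argv list for an NFS mount that uses several TCP connections."""
    if not 1 <= nconnect <= MAX_NCONNECT:
        raise ValueError(f"nconnect must be between 1 and {MAX_NCONNECT}")
    options = f"vers={nfs_version},nconnect={nconnect}"
    return ["mount", "-t", "nfs", "-o", options,
            f"{server}:{export}", mount_point]


# Placeholder values; substitute a VIP from your own VIP pool.
command = build_nfs_mount_command("vip1.example.com", "/export", "/mnt/files",
                                  nconnect=8)
print(shlex.join(command))
```

The helper only builds and prints the command. Running the command itself requires root privileges and a kernel and NFS client that support `nconnect`.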

In addition, the solution will support NFS over RDMA, which bypasses CPU resources to deliver enhanced performance. Taking it even further, GPUDirect Storage will also bypass client-side memory to achieve up to 170 GB/sec per host. These multiple connections and paths enable the solution to achieve enhanced NFS performance scaling.

Power your AI workloads with consistent performance at scale

In our previous blog on performance, we detailed the ability of HPE GreenLake for File Storage to deliver enterprise performance at scale. In this blog, we’ve peeked under the hood to understand what makes that kind of scalable performance possible. From high-level architecture to technical details, HPE GreenLake for File Storage was designed for performance at scale to power AI and other data-intensive workloads.

Moreover, we’ve shown two reasons why HPE GreenLake for File Storage is not only fast, but seamlessly scales performance when processing high volumes of data. These two attributes – efficient I/O operations and enhanced NFS performance – show how the solution’s underlying architecture ensures performance at scale while providing compelling evidence that HPE GreenLake for File Storage is the right choice to power modern applications in today’s AI-driven world.

Want to learn more?

Check out our ongoing blog series:

A cloud management experience for file storage minus security risks or trade-offs

A file storage architecture for enterprise performance at scale

Enhance productivity and efficiency with modern file storage

Powering the promise of AI with HPE GreenLake for File Storage

Modern file storage accelerates the AI-driven search for cures

And here's more deep-dive information on HPE GreenLake for File Storage:

Read: HPE GreenLake for File Storage architecture

Read: Technical overview: Inside HPE GreenLake for File Storage

Watch: HPE GreenLake for File Storage technical demo

Meet Storage Experts blogger David Yu, HPE Storage Product Marketing

David has a key product marketing role in HPE’s storage business, covering such areas as file-and-object storage, scale-out storage, cloud-native data infrastructure, and associated cloud data services.

Storage Experts
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage
