
Under the hood: HPE Alletra Storage MP X10000 architectural differentiation

Explore the technical differentiation of the HPE Alletra Storage MP X10000 platform. Learn all about its high performance in all dimensions, how you can start smaller without compromises, and how to easily extend the platform thanks to its fully containerized architecture. – Dimitris Krekoukias, Senior Distinguished Technologist, HPE

Have you heard the news about the newly released HPE Alletra Storage MP X10000? It's the latest in our new line of shared hardware platform storage offerings.

This innovative new platform is purpose-built for unstructured data. Initially, the solution was aimed at workloads requiring fast S3 performance, including AI workloads, data lakes, cloud-native app development, and high-speed backup and restore. Innovations such as RDMA for object explain how this highly differentiated platform now supports a smaller starting capacity – instead of only focusing on the bigger scale. Most importantly, this is HPE technology – not a partnership with any other company.

Now let's dig in – with a first quick look at the benefits

My goal here is not to regurgitate basic information about the HPE Alletra Storage MP X10000 platform, but rather to explain the true technical differentiation and get you genuinely excited about the possibilities. Let's start with the benefits – the primary ones are:

  • Disaggregation flexibility for separately expanding compute and/or capacity
  • Ability to scale down and not need huge capacities to get good performance
  • Balanced read/write performance and low latency for all workloads
  • Flexible, fully container-based architecture that opens up tons of possibilities for running customer code inside the storage solution

Why did HPE make this?

Other products in this space have one or more of these deficiencies:

  • Being built with a specific protocol as the foundation. For example, being truly file, object, or block storage "under the hood" and running other protocols as emulations, such as object on top of file (very common) or block on top of object. This inevitably results in some sort of inefficiency or inflexibility.
  • Scaling performance and capacity can't be done independently.
  • Needing multiple "buckets" to get maximum performance. This leads to management complexity.
  • Needing special media or special memory to get good performance.
  • Being severely compromised in one or more performance dimensions. For example, good for large object throughput but high latency for small objects, or poor for writes but good for reads.
  • Needing extremely large capacities and controller counts to achieve good performance.
  • Offering very high minimum starting points, which puts even the smallest solutions out of reach for most customers.
  • Lacking the ability to provide computational storage.

Now let's tackle each of these, as I explain how X10000 innovations avoid these pitfalls, resulting in better business outcomes (and fewer headaches).

RDMA and GPUDirect for S3

Exciting things first: We're working with NVIDIA to add RDMA support to the X10000 so you can enjoy GPUDirect for object workloads. Once that feature is finalized, it will be one of the very few object solutions with that ability (and full certification). This technology will both greatly improve performance versus TCP and significantly reduce CPU demands, allowing you to get a lot more out of your infrastructure – and eliminate NAS bottlenecks from GPUDirect pipelines.

X10000 architecture fundamentals

Under the covers, the entire system is built using containers. This is useful from a flexibility and scalability standpoint. For instance, what if we let users also run their own containers? Or offer extra HPE services as containers? Lots of possibilities open up for computational storage (and very fast nodes with ample performance are provided to explore those possibilities).
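To make that idea concrete, here's a purely illustrative sketch of what a user-supplied computational storage container might do: filter data where it lives and ship only the results over the network. The endpoint, credentials, bucket, and helper function are all placeholders I made up – this is not a shipping X10000 API.

```python
# Purely illustrative: a user container running next to the data could filter
# objects in place and return only matching records, instead of shipping
# whole objects to a remote client. Endpoint/credentials are placeholders.
import json
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.local.example",  # placeholder in-array endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def filter_in_place(bucket: str, key: str, field: str, value: str) -> list[dict]:
    """Read a JSON-lines object over the local (in-array) network and
    return only the records whose `field` equals `value`."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    records = [json.loads(line) for line in body.splitlines() if line.strip()]
    return [r for r in records if r.get(field) == value]
```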

The name is a bit of a clue: B10000 is the block system. X can be... many things.

Resiliency and data integrity

Extreme resiliency and data integrity are ensured with Triple+ Erasure Coding and Cascade Multistage Checksums. This marks the evolution of the protection first seen in HPE Nimble Storage and then HPE Alletra 5000 and 6000 storage systems. If you want to learn more about those fundamentals and why they matter, check out this document: HPE Nimble Storage, HPE Alletra 5000, and HPE Alletra 6000 storage architecture – The history and implementation of the Cache Accelerated Sequential Layout

When compared to Nimble, the major difference here is that the write buffer now goes directly to the disks instead of to mirrored NVDIMM. This eliminates the HA-pair restriction and allows cluster resiliency concepts similar to the HPE Alletra Storage MP B10000, but at even larger scale. To learn more about the benefits of eliminating HA pairs, check out this blog: Building the case for HPE Alletra Storage MP architecture

The other big difference is that Triple+ RAID isn't done on whole disks; rather, each disk is carved into "disklets" (small logical disks – the smallest size is 1GB) for granularity and flexibility. These disklet RAID groups can be confined to a single JBOF or span JBOFs. RAID groups within a JBOF allow the X10000 to scale down efficiently; RAID groups across JBOFs can protect against entire-JBOF failure for larger clusters. The sketch below illustrates the idea.
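Here is a minimal Python sketch of the disklet concept. Only the 1GB disklet granularity and the JBOF/disk structure come from the text above; the dataclass, the carving helper, and the interleaved grouping policy are my own illustrative assumptions, not HPE's actual allocation algorithm.

```python
# Minimal sketch: carve disks into 1GB disklets, then form RAID groups that
# touch as many distinct disks as possible. Grouping policy is illustrative.
from dataclasses import dataclass
from itertools import chain, zip_longest

DISKLET_SIZE_GB = 1  # smallest disklet size mentioned above

@dataclass(frozen=True)
class Disklet:
    jbof_id: int
    disk_id: int
    index: int

def carve(jbof_id: int, disk_sizes_gb: list[int]) -> list[list[Disklet]]:
    """Carve each disk in a JBOF into 1GB disklets (one list per disk)."""
    return [
        [Disklet(jbof_id, d, i) for i in range(size // DISKLET_SIZE_GB)]
        for d, size in enumerate(disk_sizes_gb)
    ]

def raid_groups(per_disk: list[list[Disklet]], width: int = 24) -> list[list[Disklet]]:
    """Interleave disklets across disks so each group of `width` members
    lands on `width` distinct disks whenever enough disks exist."""
    interleaved = [d for d in chain.from_iterable(zip_longest(*per_disk)) if d is not None]
    return [interleaved[i:i + width]
            for i in range(0, len(interleaved) - width + 1, width)]

# One JBOF with 24 x 3.84TB SSDs (the small configuration discussed later):
groups = raid_groups(carve(jbof_id=0, disk_sizes_gb=[3840] * 24))
```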

A different kind of data service partition

HPE Alletra Storage MP X10000 uses sharded Data Service Partitions (DSPs) to vertically slice the workload between controller nodes. DSPs are automatically generated, portable, and enable capabilities like resiliency even in cases of extreme loss. For example, if you lose more than one node, the DSPs get evenly redistributed among the remaining controllers.

Disklet RAID slices are dynamically allocated to DSPs as needed. Notice the color coding in Figure 1, denoting an example of slice ownership by DSP.

When one or more nodes are added, DSPs are rebalanced. Because all state is persisted only within the JBOFs and the nodes are completely stateless, this movement of DSPs takes just a few seconds – there's no data movement involved. To expand the performance capability of a cluster, all that's required is adding controllers and redistributing DSPs. And since objects are distributed across DSPs based on a hash, performance is always load balanced across the nodes of a cluster. The sketch below models this hash-based distribution.
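This is a toy model only: the DSP count, hash function, and round-robin placement policy are assumptions I've made for illustration, not the actual X10000 internals.

```python
# Sketch: objects hash to a fixed set of DSPs; DSPs are mapped onto whatever
# controller nodes are alive. Since nodes are stateless and all state lives
# in the JBOFs, remapping DSPs changes ownership without moving data.
import hashlib

NUM_DSPS = 64  # assumed count, for illustration only

def dsp_for_object(key: str) -> int:
    """Hash-based placement keeps object load evenly spread across DSPs."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DSPS

def place_dsps(nodes: list[str]) -> dict[int, str]:
    """Evenly assign all DSPs to the currently surviving nodes."""
    return {dsp: nodes[dsp % len(nodes)] for dsp in range(NUM_DSPS)}

placement = place_dsps(["node-a", "node-b", "node-c"])
placement = place_dsps(["node-a", "node-b"])  # node-c lost: just remap
owner = placement[dsp_for_object("datasets/train/shard-0001.parquet")]
```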

Figure 1. HPE Alletra Storage MP X10000 architecture

Every protocol is a first-class citizen

With the X10000, the foundational data layer is a log-structured key-value store that implements protocol-agnostic storage of data and metadata chunks. It is optimized for flash access, reducing write amplification with its log-structured, extent-based approach.

On top of the KV store are native protocol-specific namespace layers, such as object. These protocol layers are optimized for the semantics of a specific protocol, treating each as a first-class citizen. This allows X10000 to take advantage of the strengths of each protocol, without inheriting the downsides of a second protocol or running protocols on top of each other (like object on top of file or vice versa). 
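As a rough illustration of that layering, the object namespace can translate S3 semantics directly into key-value operations, with no intermediate file system. The key scheme and the dict-backed store here are invented stand-ins for the real log-structured, flash-optimized implementation.

```python
# Sketch of the layering: a protocol-agnostic KV store underneath, a native
# object namespace on top. Keys and storage backend are illustrative only.
class KVStore:
    """Stand-in for the log-structured key-value data layer."""
    def __init__(self) -> None:
        self._kv: dict[bytes, bytes] = {}
    def put(self, key: bytes, value: bytes) -> None:
        self._kv[key] = value
    def get(self, key: bytes) -> bytes:
        return self._kv[key]

class ObjectNamespace:
    """Native object layer: S3 semantics map straight onto KV operations,
    with no file layer in between (and no emulation penalty)."""
    def __init__(self, kv: KVStore) -> None:
        self.kv = kv
    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self.kv.put(f"meta/{bucket}/{key}".encode(), str(len(data)).encode())
        self.kv.put(f"data/{bucket}/{key}".encode(), data)
    def get_object(self, bucket: str, key: str) -> bytes:
        return self.kv.get(f"data/{bucket}/{key}".encode())
```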

Independently scale performance and capacity

Here's a very common problem: Systems can end up having too much compute and not enough capacity, or vice versa. The X10000 (just like its block storage cousin, the HPE Alletra Storage MP B10000) allows you to increase compute and space separately, so that the optimal blend of performance versus capacity is achieved not just initially but long-term. This reduces TCO and eliminates waste.

A single bucket can get all the speed

S3 buckets are a useful structure. With certain object solutions, however, it's common to need multiple S3 buckets to get the most performance out of a system. Yet typical unstructured workloads, such as analytics and data protection, assume a single bucket or a small number of buckets per application unit, such as a single warehouse or backup chain.

The X10000 doesn't have this problem. Even a single bucket is enough to get the maximum performance out of the hardware. This frees administrators from adding unnecessary complexity just to gain speed. Instead, buckets can be used for the true utility purpose they serve, rather than as an inelegant performance hack.

This is especially felt on writes: versus some competitors, we may be 60x faster for small-object PUT operations when using a single bucket. And with its ability to scale a single bucket linearly, individual applications benefit from the X10000 platform's scale-out ability just the same as a large number of applications or tenants.
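From the application side, this means a plain S3 client can simply hammer one bucket. Here's a minimal boto3 sketch; the endpoint, credentials, and bucket name are placeholders.

```python
# Many concurrent small PUTs into a single bucket; no bucket sharding needed
# to reach full performance. Endpoint and credentials are placeholders.
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://x10000.example.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def put_small_object(i: int) -> None:
    s3.put_object(Bucket="analytics", Key=f"events/{i:08d}.json", Body=b"{}")

# Drive many parallel small PUTs at the same bucket.
with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(put_small_object, range(10_000)))
```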

No need for special media to get good performance

Continuing the paradigm of the B10000 system, no special drives or exotic memory are needed to get good performance out of the system. Standard enterprise SSDs are used. (Initially, these are TLC, but we'll add QLC.) This helps reduce TCO and removes the reliance on uncommon components that may be affected by a shift in supplier strategy. It also helps performance scale better when more SSDs are added, since all SSDs are used for all aspects of performance in parallel, eliminating a specific component being a bottleneck.

Unstructured workload performance needs vary greatly

Unstructured data workloads are extremely varied, and even within a certain use-case category like artificial intelligence, workload characteristics can vary widely. Figure 2 characterizes typical machine learning and deep learning workloads. While many object architectures prioritize bandwidth-oriented performance, the X10000 object namespace and the rest of the data-path stack deliver high IOPS-oriented performance for small objects, along with high bandwidth for larger objects and low latency for GETs and PUTs (< 2ms).

Figure 2. The variability of AI workloads

Varied needs require performance in every dimension

The X10000 is designed to provide balanced read-vs-write performance, both for high throughput and small transactional operations. This means that for heavy write workloads, you don't need a massive cluster. The result is an optimized performance experience, regardless of workload. This gives you the ability to reach performance targets without waste.

Performance scales linearly as the cluster is expanded

Two key architectural decisions enable the X10000 to deliver high IOPS performance. First, the platform's log-structured key-value store is extent-based. Extents are variable-sized, and extent-based metadata and layout allow the X10000 to adapt metadata and data accesses to application boundaries. Second, the platform's write buffer and indexes are optimized for small objects. The X10000 implements a write buffer to which a small PUT is first committed before it is destaged to the log-structured store, with metadata updates merged into a Fractal Index Tree for efficient updates.

Write path

The write path is interesting: Small object PUTs are committed to the X10000 write buffer prior to being destaged to the log-structured, erasure-code-protected store. The commit to the write buffer reduces the latency of small PUTs and reduces write amplification. The write buffer is stored on the same SSDs as the log-structured store and is formed out of a collection of disklets.

Using SSDs for the write buffer was first implemented on the B10000, which showed that an SSD-based write buffer can deliver the same high reliability and low latency as prior approaches such as NVDIMM, even for latency-sensitive structured data workloads.

Additionally, X10000 takes advantage of object semantics to completely bypass the write buffer beyond a certain object size threshold, and instead directly writes the large object as part of a RAID stripe. This reduces write amplification, improves write performance and is part of the collection of techniques used to deliver X10000โ€™s high write performance.
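Here's a toy sketch of that routing decision. The size threshold and the dict-backed "buffer" and "store" are stand-ins I've invented for illustration; the real cutoff isn't published in this post.

```python
# Toy model of the write path: small PUTs commit to the write buffer first
# and are destaged in batches; large PUTs bypass the buffer and go straight
# into full RAID stripes. Threshold and backing dicts are illustrative.
WRITE_BUFFER: dict[str, bytes] = {}  # stand-in for the SSD-backed buffer
LOG_STORE: dict[str, bytes] = {}     # stand-in for the erasure-coded store

LARGE_OBJECT_THRESHOLD = 256 * 1024  # assumed cutoff, for illustration

def handle_put(key: str, data: bytes) -> None:
    if len(data) < LARGE_OBJECT_THRESHOLD:
        WRITE_BUFFER[key] = data   # low-latency commit; ack the client now
    else:
        LOG_STORE[key] = data      # write once, directly as a RAID stripe

def destage() -> None:
    """Batch-move buffered small objects to the log-structured store and
    merge their metadata updates into the index in one efficient pass."""
    LOG_STORE.update(WRITE_BUFFER)
    WRITE_BUFFER.clear()
```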

Great performance efficiency even with a small deployment

A design goal of the X10000 was to provide high initial performance – even with a relatively small deployment. The minimum starting point is three nodes and one JBOF, which also reflects the node:JBOF performance ratio. With the drives used today, a single JBOF is enough to get high performance across all metrics with three nodes. Adding a fourth node won't add much more performance until a second JBOF is added; then performance will scale linearly until you hit six nodes, and so on.
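A toy model of that balance, assuming the 3:1 node-to-JBOF ratio described above (the ratio itself may evolve with drive generations):

```python
# Toy model: performance tracks whichever resource is the bottleneck,
# assuming roughly three nodes' worth of performance per JBOF.
NODES_PER_JBOF = 3

def effective_nodes(nodes: int, jbofs: int) -> int:
    """Nodes beyond 3x the JBOF count add little until another JBOF arrives."""
    return min(nodes, jbofs * NODES_PER_JBOF)

assert effective_nodes(3, 1) == 3  # minimum config: all three nodes earn their keep
assert effective_nodes(4, 1) == 3  # fourth node waits on a second JBOF
assert effective_nodes(6, 2) == 6  # balanced again: linear scaling resumes
```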

Ability to start small and scale later (including scaling down)

It's always cool to talk about exascale systems and scaling up. HPE makes by far the largest such systems, and the second largest has been measured at a cool 11TB/s of storage throughput (yes, 11 terabytes per second!).

But what about scaling down? The X10000 doesn't require lots of large-capacity drives to get good performance. The smallest recommended configuration uses 3.84TB SSDs, for about 92TB raw. That's a starting capacity small enough for most customers. At that capacity, competing solutions would either not exist or typically be incredibly constrained in at least one performance and/or TCO dimension.

Space efficiency and data reduction

Incoming data is compressed, and good space efficiency is ensured by using 24-disklet RAID groups. For backup applications, HPE's Rapid Restore solution for the X10000 combines exabyte scale and performance designed for fast object data writes and retrieval, with HPE StoreOnce Catalyst technology maximizing storage efficiency and data security. Benefits include fully encrypted backups, storage efficiency improved up to three times over competitors, and rapid data recovery. The X10000 is also partner-certified as a backup target when backing up directly from Commvault and Veeam. Read the blog to learn more.
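As a back-of-envelope check on the RAID overhead of those 24-disklet groups, here's the arithmetic, assuming "Triple+" means three parity members per group (a 21+3 split, which this post doesn't spell out):

```python
# Back-of-envelope RAID overhead for a 24-disklet group, assuming 3 of the
# 24 members are parity (21 data + 3 parity). The exact split is an assumption.
GROUP_WIDTH, PARITY = 24, 3
usable = (GROUP_WIDTH - PARITY) / GROUP_WIDTH
print(f"usable fraction before compression: {usable:.1%}")  # 87.5%
```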

Last but not least: manageability

Deploying and using advanced storage shouldn't be a science project. The X10000 is managed using the same common framework as all of HPE's modern storage, server, and network solutions. No need to learn a new interface or go to a different management portal. Watch and see how easy the management is.

Now you know the complete story on enterprise-grade, scale-out object storage designed to accelerate data-intensive applications

The X10000 offers a new and innovative approach in the space of unstructured data storage solutions. From the containerized architecture that provides the possibility of computational storage, to the efficient design that allows you to have balanced performance even at a smaller starting point, the X10000 is an exciting solution that's aimed at solving practical problems in a practical way.

Read the blog: HPE Alletra Storage MP X10000 โ€“ Unstructured data storage from repository to value creator


Meet Storage Experts blogger Dimitris Krekoukias, Senior Distinguished Technologist, HPE

Dimitris contributes to HPE's strategy, product and process enhancements, and product launches. Focused on bringing value to HPE's largest customers, he engages with senior decision makers. He also speaks at industry, competitive, and marketing events.

 


Storage Experts
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage

About the Author

StorageExperts

Our team of Hewlett Packard Enterprise storage experts helps you dive deep into relevant data storage and data protection topics.