Uncompromising Resiliency with Nimble Storage

(This was originally posted on my recoverymonkey blog)

The cardinal rule for enterprise storage systems is to never compromise when it comes to data integrity and resiliency.  Everything else, while important, is secondary.

Many storage consumers are not aware of what data integrity mechanisms are available or which ones are necessary to meet their protection expectations and requirements. It doesn’t help that a lot of the technologies and the errors they prevent are rather esoteric. However, if you want a storage system that safely stores your data and always returns it correctly, no measure is too extreme.

The Golden Rules of Storage Engineering

When architecting enterprise storage systems, there are three Golden Rules to follow.

In order of criticality:

  1. Integrity: Don’t ever return incorrect data
  2. Durability: Don’t ever lose any data
  3. Availability: Don’t ever lose access to data

To better understand the order, ask yourself: which is preferable, a temporary loss of access to data, or a storage system that returns the wrong data without anyone even knowing it's wrong?

Imagine life-or-death situations, where the wrong piece of information could have catastrophic consequences. Interestingly, there are vendors that focus heavily on Availability (even offering uptime “guarantees”) but are lacking in Integrity and Durability. Being able to access the array while the data is corrupt is almost entirely useless. Consider modern storage arrays with data deduplication and/or multi-petabyte storage pools: the effects of corruption are far more severe when a single physical block represents the data of anywhere from one to 100+ logical blocks and data is spread across tens to hundreds of drives instead of a few.
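To make that blast radius concrete, here is a tiny, purely illustrative Python sketch (my own toy model, not any vendor's implementation) of why deduplication raises the stakes: one corrupted physical block silently damages every logical block that references it.

```python
# Hypothetical illustration (not any vendor's implementation): with deduplication,
# one physical block may back many logical blocks, so a single undetected
# corruption silently damages every file that references it.

import hashlib

physical_blocks = {}   # fingerprint -> physical block payload (stored once)
logical_map = {}       # (file, offset) -> fingerprint of the deduplicated block

def write(file, offset, data):
    fp = hashlib.sha256(data).hexdigest()
    physical_blocks.setdefault(fp, data)   # store the payload only once
    logical_map[(file, offset)] = fp

# Ten different files all write the same 4 KiB pattern...
for i in range(10):
    write(f"vm{i}.vmdk", 0, b"A" * 4096)

# ...so they share one physical block. Corrupt that single block and
# every one of the ten logical blocks now returns wrong data.
shared_fp = logical_map[("vm0.vmdk", 0)]
affected = [k for k, fp in logical_map.items() if fp == shared_fp]
print(f"1 corrupted physical block would affect {len(affected)} logical blocks")
```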

The Nimble Storage Approach

Nimble Storage has taken a multi-stage approach to satisfy the Golden Rules, and in some cases, the amount of protection offered verges on being paranoid (but the good kind of paranoid).

Simply, Nimble employs these mechanisms:

  1. Integrity: Comprehensive multi-level checksums
  2. Durability: Hardened RAID protection and resilience upon power loss
  3. Availability: Redundant hardware coupled with predictive analytics

We will primarily focus on the first two, as they are often glossed over, assumed, or not well understood. Availability will be discussed in a separate blog post; however, a few details are worth mentioning here.

To start, Nimble has greater than six nines of measured and guaranteed uptime (more info here). This is measured across more than 9,000 customers using multiple generations of hardware and software. A key aspect of Nimble’s availability comes from InfoSight, which continually improves and learns as more systems are used. Each week, trillions of data points are analyzed and processed with the goal of predicting and preventing issues, not just in the array but across the entire infrastructure. 86% of issues are detected and automatically resolved before the customer is even aware of the problem. To further enhance this capability, Nimble’s Technical Support Engineers can resolve issues faster because they have all the data available when an issue arises. This bypasses the hours, days, or weeks often required to collect data, send it to support, analyze, and repeat until a solution is found.

Data Integrity Mechanisms in Detail

The goal is simple: What is read must always match what was written. And, if it doesn’t, we fix it on the fly.

What many people don’t realize is that there are occasions where storage media will lose a write, corrupt it, or place it at the wrong location on the media. Neither RAID (including 3-way mirroring) nor Erasure Coding is enough to protect against such issues. The T10 PI standard employed by some systems is also not enough to protect against all eventualities.

The solution involves checksums, which get more computationally intensive the more paranoid one is. Because checksums cost compute, certain vendors skip them or apply them only minimally to gain more performance or a faster time to market. Unfortunately, that trade-off can lead to data corruption.

Broadly, Nimble creates a checksum and a “self-ID” for each piece of data. The checksum protects against data corruption. The self-ID protects against lost/misplaced writes and misdirected reads (incredible as it may seem, these things happen enough to warrant this level of protection).

For instance, if the written data has a checksum, and corruption occurs, when the data is read and checksummed again, the checksums will not match. However, if instead the data was placed at an incorrect location on the media, the checksums will match, but the self-IDs will not match.
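If you like to think in code, here is a minimal sketch of the idea (the block layout and CRC choice are my illustrative assumptions, not Nimble’s actual on-media format):

```python
# Minimal sketch of the checksum + self-ID idea: the checksum catches payload
# corruption, the self-ID catches data that landed at the wrong address.

import zlib

def make_block(address, payload):
    """Store the payload together with its checksum and a self-ID (its address)."""
    return {
        "self_id": address,                 # where this block *should* live
        "checksum": zlib.crc32(payload),    # protects the payload itself
        "payload": payload,
    }

def verify_read(address, block):
    """Return the payload only if both checks pass; otherwise report the failure."""
    if zlib.crc32(block["payload"]) != block["checksum"]:
        raise IOError("corruption: payload no longer matches its checksum")
    if block["self_id"] != address:
        raise IOError("misplaced write: valid data found at the wrong address")
    return block["payload"]

media = {1000: make_block(1000, b"hello")}

verify_read(1000, media[1000])            # clean read: both checks pass
media[2000] = media.pop(1000)             # simulate a write landing at the wrong LBA
try:
    verify_read(2000, media[2000])
except IOError as e:
    print(e)                              # checksum matches, but the self-ID does not
```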

Where it gets interesting:

Nimble doesn’t just do block-level checksums/IDs. Nimble does Cascade Multistage Checksums. These multi-level checksums are also performed:

  1. Per segment in each write stripe
  2. Per block, before and after compression
  3. Per snapshot (including all internal housekeeping snaps)
  4. For replication
  5. For all data movement within a cluster
  6. For all data and metadata in NVRAM

This way, every likely data corruption event is covered, including metadata consistency and replication issues, which are often overlooked.
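As an illustration of the “per block, before and after compression” item above, here is a small sketch (again my own simplification, not Nimble’s metadata format) of why keeping a checksum of both forms catches corruption whether it hits the compressed data at rest or the decompressed data handed back to the host:

```python
# Keep a checksum of both the uncompressed and the compressed form of a block,
# so corruption is caught in either domain.

import zlib

def compress_block(payload):
    compressed = zlib.compress(payload)
    return {
        "crc_uncompressed": zlib.crc32(payload),      # verifies what the host wrote
        "crc_compressed": zlib.crc32(compressed),     # verifies what sits on media
        "data": compressed,
    }

def read_block(block):
    if zlib.crc32(block["data"]) != block["crc_compressed"]:
        raise IOError("compressed form corrupted on media")
    payload = zlib.decompress(block["data"])
    if zlib.crc32(payload) != block["crc_uncompressed"]:
        raise IOError("decompression returned data that does not match the original")
    return payload

stored = compress_block(b"some host data" * 100)
assert read_block(stored) == b"some host data" * 100
```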

Durability Mechanisms in Detail

There are two kinds of data on a storage system and both need to be protected:

  1. Data in flight
  2. Data on persistent storage

One may differentiate between user data and metadata but we protect both with equal paranoid fervor. Some systems try to accelerate operations by not protecting metadata sufficiently, which greatly increases risk. This is especially true with deduplicating systems, where metadata corruption can mean losing everything!

Data in flight is data that is not yet committed to persistent storage. Nimble ensures all critical data in flight is checksummed and committed to both RAM and an ultra-fast byte-addressable NVDIMM-N memory module sitting right on the motherboard. The NVDIMM-N is mirrored to the partner controller and both controller NVDIMMs are protected against power loss via a supercapacitor. In the event of a power loss, the NVDIMMs simply flush their contents to flash storage. This approach is extremely reliable and doesn’t need inelegant solutions like a built-in UPS.
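Conceptually, the acknowledgment rule looks something like the following sketch (the names and structure are my assumptions for illustration, not Nimble’s code): the host is only acknowledged after the data has been checksummed and committed to both the local and the partner controller’s NVDIMM.

```python
# Conceptual write path: checksum the data, persist it to the local NVDIMM,
# mirror it to the partner controller, and only then acknowledge the host.

import zlib

class Nvdimm:
    """Stand-in for a byte-addressable, power-loss-protected log device."""
    def __init__(self):
        self.log = []
    def commit(self, entry):
        self.log.append(entry)   # in real hardware this survives power loss

def handle_write(payload, local_nvdimm, partner_nvdimm):
    entry = {"checksum": zlib.crc32(payload), "payload": payload}
    local_nvdimm.commit(entry)       # persist locally first
    partner_nvdimm.commit(entry)     # then mirror to the partner controller
    return "ACK"                     # only now is the host told the write is safe

local, partner = Nvdimm(), Nvdimm()
print(handle_write(b"critical data", local, partner))
```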

Data on persistent storage is protected by what we call Triple+ Parity RAID, which is many orders of magnitude more resilient than RAID6. For comparison, RAID6 is itself several orders of magnitude more resilient than RAID5, making Triple+ hundreds of thousands of times more resilient than RAID5.

The “+” sign means that there is extra intra-drive parity that can safeguard against entire sectors being lost even if three whole drives simultaneously fail in a single RAID group. The extra parity is at the chunk level (unit written per drive).

Think about it: a Nimble system can lose any three drives simultaneously and, while parity is being rebuilt, every other remaining drive could also be suffering from sector read errors, yet no data would be lost and the rebuild would complete.

This is an unprecedented level of protection.
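Here is a toy recoverability model (my simplification, not Nimble’s RAID code) of that claim: up to three chunks per stripe can be missing outright, and the intra-chunk “+” parity lets a surviving chunk heal a sector-level read error locally instead of consuming one of those three erasures.

```python
# Toy model: triple parity tolerates three missing chunks per stripe; the
# intra-chunk parity repairs small sector errors locally so they don't count
# as additional erasures. This is a simplification for illustration only.

def stripe_recoverable(failed_drives, sector_errors_per_chunk,
                       intra_chunk_parity_sectors=1, max_erasures=3):
    """failed_drives: set of drive IDs fully lost.
    sector_errors_per_chunk: dict of drive ID -> bad sectors in its chunk."""
    erasures = len(failed_drives)
    for drive, bad_sectors in sector_errors_per_chunk.items():
        if drive in failed_drives:
            continue                              # already counted as an erasure
        if bad_sectors > intra_chunk_parity_sectors:
            erasures += 1                         # intra-chunk parity can't fix it
    return erasures <= max_erasures

# Three whole drives gone, and every other drive in the stripe hits one
# sector error during rebuild: still recoverable in this model.
print(stripe_recoverable(
    failed_drives={0, 1, 2},
    sector_errors_per_chunk={d: 1 for d in range(3, 21)}))   # True
```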

Some might say this is a bit over-engineered, however with drive sizes increasing rapidly (especially SSDs) and drive read error rates increasing as drives age, plus the sheer amount of metadata in deduplicating storage solutions, it was the architecturally correct choice to make for a future-proofed solution.

It also really helps in the case of a bad batch of drives (I've seen it happen to various vendors - delayed-onset errors).

If you want to test the reliability numbers on your own, find a Mean Time To Data Loss (MTTDL) calculator (there are several online). Compare triple parity to other schemes (the calculator might list triple parity as z3). Then multiply the number for triple parity by 5 (the extra reliability the additional chunk parity affords) and you get the Nimble Triple+ number.

A cool experiment is to try really large drive sizes in the MTTDL calculator, like 100TB, and assume poor MTBF and read error rates (simulating, say, a bad batch of drives, overheating, solar flare activity, or whatever bad circumstances you can think of).
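If you’d rather script it than hunt for a calculator, here is a rough version of that experiment using the textbook MTTDL approximation for multi-parity RAID (it ignores unrecoverable read errors, so published calculators will give different absolute numbers; the x5 factor for the intra-chunk parity and the pessimistic drive parameters are the assumptions suggested above):

```python
# Rough MTTDL comparison using the standard simplified approximation:
# MTTDL ~= MTBF^(p+1) / (N * (N-1) * ... * (N-p) * MTTR^p) for p parity drives.

def mttdl_years(n_drives, parity, mtbf_hours, rebuild_hours):
    """Simplified MTTDL (in years) for an n-drive group with `parity` parity drives."""
    numerator = mtbf_hours ** (parity + 1)
    denominator = rebuild_hours ** parity
    for i in range(parity + 1):
        denominator *= (n_drives - i)
    return numerator / denominator / (24 * 365)

# Pessimistic scenario: 100 TB drives, a poor 300k-hour MTBF, 200 MB/s rebuild.
n = 24
mtbf = 300_000
rebuild = (100e12 / 200e6) / 3600       # hours to rebuild one 100 TB drive

for parity, label in [(1, "RAID5"), (2, "RAID6"), (3, "Triple parity")]:
    print(f"{label:14s} ~{mttdl_years(n, parity, mtbf, rebuild):,.0f} years")

# The post suggests multiplying triple parity by 5 to account for the "+".
print(f"Triple+ (x5)   ~{5 * mttdl_years(n, 3, mtbf, rebuild):,.0f} years")
```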

Look at the numbers.

In Summary

Users frequently assume that all storage systems will safely store their data. And they will, most of the time. But when it comes to your data, “most of the time” isn’t good enough. No measure should be considered too extreme. When looking for a storage system, it’s worth taking the time to understand all situations where your data could be compromised. And, if nothing else, it’s worth choosing a vendor who is paranoid and goes to extremes to keep your data safe.

D

About the Author

dikrek

Dimitris Does What Needs to Be Done at HPE. He contributes to HPE’s strategy, product and process enhancements, and product launches, speaks at industry events, and helps customers. Sometimes janitorial duties.