Defining hyperconverged infrastructure Part 4: HPE SimpliVity data storage built for resiliency

brianknudtson · 01-09-2018

Manned missions to space are possible because the equipment that carries the astronauts is specifically designed to protect them, since they are the most important asset in the program. Rocket scientists build redundancy directly into the system so that failures do not put the astronauts at risk. This approach has produced a remarkable success rate, despite a few tragic episodes. In business, data is the company's most important asset, and keeping that data available should be the first requirement of every storage product.

When an IT department evaluates a new storage product for its environment, its first concern should be availability. All too often, I've heard stories about businesses that lost millions of dollars because systems were down for hours or days. Every data storage technology should have two primary goals: keep the data available and ensure its integrity. This is why IT administrators fear single points of failure and a lack of resiliency in business systems: weak points like these put your data at serious risk.

Resiliency - The ability of a storage element to preserve data integrity and availability of access despite the unavailability of one or more of its storage devices.

Building a resilient data storage architecture is all about reducing single points of failure, so that losing a device has no impact on production workloads, and reliably recovering data services when a non-redundant component is lost. Designing and building a scale-out storage architecture poses many challenges, but one advantage is the ability to create different layers of failure domains. Nodes and clusters are the two main failure domains within a hyperconverged infrastructure.
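To make the idea of a failure domain concrete, here is a minimal Python sketch, assuming a hypothetical placement map from block IDs to the nodes that hold a copy. It is not HPE SimpliVity code; the names `Placement`, `unavailable_blocks`, and `survives_any_single_node_loss` are illustrative only. It simply checks whether losing a given set of nodes leaves any block without a surviving copy, which is exactly what a single point of failure looks like at the node level.

```python
from typing import Dict, Set

# Hypothetical placement map: block ID -> nodes that hold a copy of it.
Placement = Dict[str, Set[str]]


def unavailable_blocks(placement: Placement, failed_nodes: Set[str]) -> Set[str]:
    """Blocks whose every copy lives in a failed node (failure domain)."""
    return {block for block, nodes in placement.items() if not (nodes - failed_nodes)}


def survives_any_single_node_loss(placement: Placement) -> bool:
    """True if losing any one node still leaves a copy of every block."""
    all_nodes = set().union(*placement.values())
    return all(not unavailable_blocks(placement, {node}) for node in all_nodes)


placement = {
    "blk-1": {"node-a", "node-b"},
    "blk-2": {"node-b", "node-c"},
    "blk-3": {"node-c"},  # single copy: node-c is a single point of failure
}
print(unavailable_blocks(placement, {"node-c"}))  # {'blk-3'}
print(survives_any_single_node_loss(placement))   # False
```

Giving `blk-3` a second copy on any other node would make the check return `True`, which is the property the two-node write described below is meant to guarantee.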

Wherever possible, IT teams want to eliminate single points of failure within a node to avoid downtime of the node itself. Because the HPE ProLiant DL380 server is the basic building block for the HPE SimpliVity 380 powered by Intel®, the platform inherits its highly redundant architecture, including redundant power supplies, ECC memory, and multiple NIC ports. Disks are statistically the component most likely to fail in a server, which is why HPE uses RAID controllers to minimize the performance impact of losing a disk and to increase the number of disks that can be lost before availability is affected.
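The trade-off between disk-failure tolerance and usable capacity can be sketched with the standard worst-case figures for common RAID levels. This is a generic illustration of the "number of disks that can be lost" point only; it does not model controller rebuild performance, and which RAID level a given HPE SimpliVity 380 configuration uses is not stated here and is not assumed. The function name `raid_profile` is hypothetical.

```python
from typing import Tuple


def raid_profile(level: str, disks: int) -> Tuple[int, int]:
    """Return (disk failures survived in the worst case, usable capacity in disks)."""
    if level == "RAID5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return 1, disks - 1  # one disk's worth of distributed parity
    if level == "RAID6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return 2, disks - 2  # two disks' worth of distributed parity
    if level == "RAID10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        # Worst case is losing both disks of the same mirror pair.
        return 1, disks // 2
    raise ValueError(f"unsupported RAID level: {level}")


for level in ("RAID5", "RAID6", "RAID10"):
    print(level, raid_profile(level, 8))
# RAID5 (1, 7)
# RAID6 (2, 6)
# RAID10 (1, 4)
```

RAID 6 gives up one more disk of capacity than RAID 5 in exchange for surviving a second disk failure, which matters most during the long rebuild window of large drives.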

Of course, even with all these redundancies, nodes do fail, so making sure the node itself isn't a single point of failure is equally important. This is why the HPE SimpliVity hyperconverged infrastructure does not acknowledge a write back to the VM until the write has been committed to two different HPE SimpliVity 380 nodes. At that point, the block of data is stored in the RAM on the HPE OmniStack Accelerator Card in each node. This RAM is backed by supercapacitors, which can power a flush of the data to flash storage on the card if the node loses power. Once fully processed, the block is written to the RAID-protected disks for permanent storage. In this way, from the moment a block is committed, it is protected against the loss of an entire node and, within each node, against both disk and power failures.
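The ordering described above (stage the block on two nodes, acknowledge the VM, then destage to disk) can be modeled in a few lines. This is a minimal sketch under stated assumptions: `Node`, `stage`, `destage`, and `write_block` are hypothetical names, one in-memory dict stands in for the Accelerator Card's capacitor-backed RAM and another for the RAID-protected disks, and nothing here reflects HPE's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical node model; not an HPE API."""
    name: str
    online: bool = True
    # Stand-in for the capacitor-backed RAM on the Accelerator Card.
    nvram: Dict[int, bytes] = field(default_factory=dict)
    # Stand-in for the RAID-protected disks.
    disk: Dict[int, bytes] = field(default_factory=dict)

    def stage(self, block_id: int, data: bytes) -> bool:
        """Hold the block in protected RAM; fails if the node is down."""
        if not self.online:
            return False
        self.nvram[block_id] = data
        return True

    def destage(self) -> None:
        """After processing, move staged blocks to persistent disk storage."""
        self.disk.update(self.nvram)
        self.nvram.clear()


def write_block(nodes: List[Node], block_id: int, data: bytes, copies: int = 2) -> bool:
    """Return True (acknowledge the VM's write) only once `copies` nodes hold the block."""
    committed = 0
    for node in nodes:
        if node.stage(block_id, data):
            committed += 1
            if committed == copies:
                return True
    # Too few healthy nodes: the write is not acknowledged. (A real system
    # would also retry or discard any partial copy; this sketch does not.)
    return False


a, b, c = Node("a"), Node("b"), Node("c")
print(write_block([a, b], 1, b"x"))     # True: a and b both staged block 1
b.online = False
print(write_block([a, b], 2, b"y"))     # False: only a is healthy
print(write_block([a, b, c], 3, b"z"))  # True: a and c staged block 3
a.destage()                             # a's staged blocks move to a.disk
```

The key design point is that the acknowledgement is the last step of the write, so a VM never sees a write as complete while only one node holds it.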

Some components are impractical to make redundant within a single node. The most obvious example is the motherboard itself. Instead of placing two motherboards in a single server, IT systems use multiple servers with automated workload failover (e.g., VMware vSphere HA, Microsoft Server Cluster Services) to handle the loss of a single motherboard. HPE SimpliVity took the same approach to provide resiliency for the Accelerator Card: just as multiple nodes protect the failure domain of a single node/motherboard, multiple nodes protect the failure domain of a single HPE OmniStack Accelerator Card.

If an Accelerator Card fails, the associated HPE OmniStack Virtual Controller also shuts down, and its IP address fails over to another HPE OmniStack Virtual Controller. This allows the VMs on the node with the failed Accelerator Card to keep running with uninterrupted access to their data. Because all the data is available on one of the other nodes, no storage or application availability is lost. By contrast, technologies like vSphere HA and Microsoft Server Cluster Services usually require a restart of the VM or application services.
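Why an IP failover avoids a VM restart can be shown with a toy registry: the VMs keep addressing the same storage IP, and only the controller behind it changes. This is a hypothetical sketch, not the OmniStack mechanism; `StorageIpRegistry`, the controller names, and the 192.0.2.x addresses (an RFC 5737 documentation range) are all illustrative, and the survivor is chosen with a simple deterministic rule rather than any real placement policy.

```python
from typing import Dict, Iterable


class StorageIpRegistry:
    """Hypothetical model of which Virtual Controller answers each storage IP."""

    def __init__(self, controllers: Iterable[str]) -> None:
        self.healthy = set(controllers)
        self.owner: Dict[str, str] = {}

    def assign(self, ip: str, controller: str) -> None:
        if controller not in self.healthy:
            raise ValueError(f"{controller} is not healthy")
        self.owner[ip] = controller

    def fail(self, controller: str) -> None:
        """Mark a controller as down and move each of its IPs to a survivor."""
        self.healthy.discard(controller)
        if not self.healthy:
            raise RuntimeError("no surviving controller to take over")
        survivor = min(self.healthy)  # deterministic choice for the sketch
        for ip, current in list(self.owner.items()):
            if current == controller:
                self.owner[ip] = survivor


reg = StorageIpRegistry(["ovc-1", "ovc-2", "ovc-3"])
reg.assign("192.0.2.11", "ovc-1")
reg.assign("192.0.2.12", "ovc-2")
reg.fail("ovc-1")
print(reg.owner)  # {'192.0.2.11': 'ovc-2', '192.0.2.12': 'ovc-2'}
```

Because the address the VM uses never changes, the VM itself has nothing to restart; only the component answering at that address moves.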

If the Accelerator Card has failed permanently and must be replaced, there is no dependency on the original card to retrieve and interpret the data on the disks. All data, metadata, and index tables are persisted to disk, so a new Accelerator Card can read the existing data. In the rare event that the data was, or may have been, corrupted, all of it can be rebuilt from the existing data on the remaining nodes.
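A rebuild that tolerates possible corruption needs a way to tell a good copy from a bad one. The sketch below assumes, purely for illustration, that the disk-persisted metadata records a SHA-256 checksum per block; the post does not describe HPE SimpliVity's actual metadata format or integrity mechanism, and `rebuild_block` and `rebuild_node` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def rebuild_block(expected: str, replicas: List[Optional[bytes]]) -> bytes:
    """Return the first surviving replica whose content matches the recorded checksum."""
    for data in replicas:
        if data is not None and checksum(data) == expected:
            return data
    raise LookupError("no intact replica on the remaining nodes")


def rebuild_node(metadata: Dict[str, str],
                 surviving_nodes: List[Dict[str, bytes]]) -> Dict[str, bytes]:
    """Rebuild every block recorded in the (disk-persisted) metadata from surviving nodes."""
    return {
        block_id: rebuild_block(digest, [node.get(block_id) for node in surviving_nodes])
        for block_id, digest in metadata.items()
    }


good = b"customer record"
metadata = {"blk-1": checksum(good)}
node_b = {"blk-1": b"customer rec0rd"}  # silently corrupted copy
node_c = {"blk-1": good}                # intact copy
print(rebuild_node(metadata, [node_b, node_c]))  # {'blk-1': b'customer record'}
```

Checking every candidate copy against the recorded checksum means a corrupted replica is skipped rather than propagated into the rebuilt node.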

HPE SimpliVity uses the Accelerator Card to provide the predictable, peak performance that enterprise customers require, making it a critical component of the platform. With this approach, along with native data protection, HPE SimpliVity delivers a resilient data storage architecture that customers can rely on to protect their data, even during failures. Check out the HPE SimpliVity 380 technology deep dive white paper for more.

Make sure to read the other articles in this series:

Brian Knudtson
Sr. Technical Marketing Manager
Hewlett Packard Enterprise

@bknudtson
bknudtson
