Around the Storage Block
MichaelMattsson

HPE CSI Driver for Kubernetes 2.5.0: Improved stateful workload resilience and robustness

All the best lessons are learned from customers running workloads in production. With HPE CSI Driver for Kubernetes 2.5.0, HPE is rounding up some of the hardest lessons learned from supporting and running containerized applications at scale. We’re also rolling up a few more features and enhancements to functionality and supportability.

Node monitoring and filesystem health checks

Inherent in the challenge of running networked storage is the fact that data paths may vanish and then come back, and nodes can be abruptly restarted or become isolated from the Kubernetes cluster. “Split brain” is a term loosely used in the industry to illustrate how data corruption can occur when two hosts access the same device without either knowing of the other. It’s important to highlight that host fencing, a method to prevent split brain, has been implemented since day one, version 1.0.0 of the HPE CSI Driver, to protect your data. However, in the real world, you’ll find more facets to how failures surface and how collateral damage may prevent workloads from running reliably and resiliently.

A Node Monitor has been added to the CSI node driver to improve runtime device management. In the event of hard or intermittent host issues, workloads may be rescheduled by Kubernetes without the host knowing. A monitor thread wakes up on an interval and either validates the devices against the Kubernetes control plane, if the control plane is reachable, or inspects the devices to determine whether they have been fenced. A cleanup procedure then safely removes devices and mount points to ensure the node is healthy when it becomes available to the cluster again.
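To make the reconciliation idea concrete, here is a minimal Python sketch of a single monitor pass. It is not the driver's implementation: the function name find_stale_volumes, the volume names, and the choice to model node state as plain sets are all assumptions made for illustration.

```python
from typing import Optional, Set


def find_stale_volumes(
    local_devices: Set[str],
    expected_by_control_plane: Optional[Set[str]],
    fenced_devices: Set[str],
) -> Set[str]:
    """Return the volumes on this node that a cleanup pass should remove.

    expected_by_control_plane is None when the control plane is unreachable.
    """
    if expected_by_control_plane is not None:
        # Control plane reachable: anything attached locally that the
        # cluster no longer expects on this node is stale.
        return local_devices - expected_by_control_plane
    # Control plane unreachable: fall back to inspecting the devices and
    # treat only those found to be fenced as stale.
    return local_devices & fenced_devices


# Hypothetical volume names, for illustration only.
local = {"pvc-a", "pvc-b", "pvc-c"}
stale_online = find_stale_volumes(local, {"pvc-a"}, set())
stale_isolated = find_stale_volumes(local, None, {"pvc-c"})
print(sorted(stale_online))
print(sorted(stale_isolated))
```

The point of the two branches is that an isolated node cannot ask the cluster what it should be running, so the state of the devices themselves is the only evidence left to act on.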

The CSI node driver is now also capable of issuing fsck (filesystem check) commands against dirty filesystems, or attempting fsck in the event of a mount failure. System administrators may also opt in to automatic repair attempts using the “fsRepair” StorageClass parameter. Learn more about the “fsRepair” parameter in the StorageClass documentation.
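As a rough illustration, the Python sketch below builds a StorageClass manifest that opts in to repair and prints it as JSON, which kubectl accepts in place of YAML. The StorageClass name is hypothetical, the value "true" is an assumption about how the parameter is enabled, and the backend-specific parameters a real StorageClass needs (such as Secret references) are left out; the StorageClass documentation remains the authority.

```python
import json

# Assumptions: the name is hypothetical, and the backend-specific
# parameters a working StorageClass requires are intentionally omitted.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "hpe-repairable"},
    "provisioner": "csi.hpe.com",
    "parameters": {
        # StorageClass parameter values are always strings.
        # "true" is assumed here to be the value that enables repair.
        "fsRepair": "true",
    },
}

manifest = json.dumps(storage_class, indent=2)
print(manifest)
```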

Compute and memory limits

In resource-constrained environments, such as the edge or when the CSI driver is running in a Kubernetes cluster that is billed per unit, it’s important to know exactly how many resources a given component is allowed to consume in terms of compute and memory. With HPE CSI Driver 2.5.0, all the containers deployed by the HPE CSI Driver for Kubernetes Helm Chart have been decorated with default resource requests and limits. These resource requests and limits are configurable at deploy time should the driver need additional or reduced resources. The default supplied parameters are well tested and recommended for general-purpose workloads where no excessive churn is expected. See the Helm Chart for details.
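When overriding the defaults, a quick sanity check is that no request exceeds its limit, since Kubernetes rejects such a container spec. The Python sketch below shows that check for the standard requests/limits structure. The helper names and the example values are hypothetical and are not the chart's defaults, and only the "m", "Mi" and "Gi" suffixes are handled.

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_mebibytes(quantity: str) -> float:
    """Convert a memory quantity with a Mi or Gi suffix to mebibytes."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return float(quantity[:-2])
    raise ValueError(f"unsupported memory suffix in {quantity!r}")


def requests_fit_limits(resources: dict) -> bool:
    """Return True when both CPU and memory requests are within their limits."""
    req, lim = resources["requests"], resources["limits"]
    return (
        cpu_millicores(req["cpu"]) <= cpu_millicores(lim["cpu"])
        and memory_mebibytes(req["memory"]) <= memory_mebibytes(lim["memory"])
    )


# Hypothetical values for illustration, not the chart's defaults.
example = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"cpu": "2", "memory": "1Gi"},
}
ok = requests_fit_limits(example)
print(ok)
```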

The NFS Server Provisioner has been further refined with additional StorageClass parameters that control resource requests and limits, allowing customers to better govern utilization for ReadWriteMany filesystem Persistent Volume Claims. These parameters have well-tested defaults but should be revisited if a large cache or more threading is needed for a particular workload. In addition, users may compartmentalize dedicated NFS nodes with a per-StorageClass node selector. For example, workload A may only provision NFS servers on nodes labelled A and workload B may only utilize nodes labelled B. This allows for better segmentation in large clusters. Please see the documentation for more information about the nfsNodeSelector.
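The segmentation in the workload A and workload B example follows ordinary Kubernetes label matching. The Python sketch below models only that general idea: the label key example.com/nfs-pool, the node names, and the eligible_nodes helper are hypothetical, and the exact value format expected by nfsNodeSelector is described in the documentation rather than here.

```python
from typing import Dict, List


def eligible_nodes(
    nodes: Dict[str, Dict[str, str]], selector: Dict[str, str]
) -> List[str]:
    """Return the names of nodes whose labels contain every selector pair."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical cluster: two nodes set aside for workload A, one for B,
# and one unlabelled node that hosts no NFS servers at all.
nodes = {
    "node-1": {"example.com/nfs-pool": "a"},
    "node-2": {"example.com/nfs-pool": "a"},
    "node-3": {"example.com/nfs-pool": "b"},
    "node-4": {},
}

pool_a = eligible_nodes(nodes, {"example.com/nfs-pool": "a"})
pool_b = eligible_nodes(nodes, {"example.com/nfs-pool": "b"})
print(pool_a)
print(pool_b)
```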

Expanded ecosystem support and CSI features

This HPE CSI Driver release introduces official support for upstream Kubernetes 1.30 and sunsets support for 1.26. Partner solutions that embed Kubernetes 1.26 or older, such as Red Hat OpenShift, are handled separately and will still be supported as per the compatibility and support table. OpenShift 4.16 certification is underway and should be available in the next couple of weeks. Full support for Ubuntu 24.04 has also been added to the list of supported host operating systems.


Compatibility and Support

This release introduces support for basic CSI topology. It allows Kubernetes administrators to dictate which nodes have access to a particular storage backend through a StorageClass, by supplying topology keys on both the nodes and the StorageClass. Workloads that consume a certain StorageClass do not need to be aware of which nodes have access to which backend, as that has already been described. Prior to this feature, all the specifics had to be known by the user deploying the workload, which in reality isn’t very practical.

CSI Topology

Learn more about CSI topology with the HPE CSI Driver on SCOD.
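On the StorageClass side, topology is expressed with the standard Kubernetes allowedTopologies field. The Python sketch below builds such a manifest and prints it as JSON. The topology key example.com/storage-zone, the zone value, and the StorageClass name are hypothetical; the keys the driver actually recognizes are documented on SCOD, and backend-specific parameters are omitted.

```python
import json

# Hypothetical topology key; nodes with access to the backend would
# carry a matching label.
TOPOLOGY_KEY = "example.com/storage-zone"


def topology_storage_class(name: str, zones: list) -> dict:
    """Build a StorageClass restricted to nodes in the given zones."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "csi.hpe.com",
        # Delay binding until a pod is scheduled, so that the scheduler
        # can take the topology constraint into account.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowedTopologies": [
            {"matchLabelExpressions": [{"key": TOPOLOGY_KEY, "values": zones}]}
        ],
    }


sc = topology_storage_class("hpe-zone-a", ["zone-a"])
print(json.dumps(sc, indent=2))
```

A workload then only references the StorageClass by name in its Persistent Volume Claim and never needs to mention the zone.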

Adjacent to the HPE CSI Driver for Kubernetes are the HPE Storage Array Exporter for Prometheus and the HPE CSI Info Metrics Provider for Prometheus. The recently released version 1.0.3 of those deliverables introduces support for HPE Alletra Storage MP Block, which allows customers to build metric-rich dashboards and create insights around their infrastructure consumption and utilization.

Security and supportability

All container images have been updated with the latest upstream versions, which contain several CVE (Common Vulnerabilities and Exposures) reductions. In addition, from a security perspective, management of iSCSI CHAP credentials has been improved: the credentials are no longer supplied in plain text with the Helm Chart and are instead managed by a Secret. This has implications for existing installations using iSCSI CHAP, and there are upgrade considerations on SCOD to review before attempting an install of 2.5.0.
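For readers who have not built a Secret by hand, the Python sketch below shows the general shape of an Opaque Kubernetes Secret. Everything specific in it is an assumption: the Secret name, the namespace, the key names chapUser and chapPassword, and the placeholder values. The key names and password rules the driver actually expects are covered in the upgrade considerations on SCOD.

```python
import base64
import json


def opaque_secret(name: str, namespace: str, values: dict) -> dict:
    """Build an Opaque Secret manifest with base64-encoded data values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        # The Kubernetes API expects values under "data" to be base64 encoded.
        # This is an encoding, not encryption.
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


# Hypothetical names, keys and placeholder credentials.
secret = opaque_secret(
    "example-chap-secret",
    "hpe-storage",
    {"chapUser": "example-user", "chapPassword": "example-password"},
)
print(json.dumps(secret, indent=2))
```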

The way container images are referenced in the rendered Helm Chart has been improved to allow customers to selectively replace an individual image without rebuilding the entire chart. This is a quality-of-life improvement for customers running the HPE CSI Driver in air-gapped environments, as each image can be replaced using its entire URL, not just the registry hostname, which was the only supported way to use a different registry before 2.5.0. This also helps HPE technical support staff quickly provide debugging images to troubleshoot customers’ environments. The image names used in the “images” parameters can be found in the sample values.yaml file, and the instructions for deploying to air-gapped environments have been updated.
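The selective replacement can be pictured as a merge of a small override map onto the chart's defaults. The Python sketch below models that idea only: the image keys, registry hostnames, and tags are hypothetical placeholders, and the real key names are the ones listed in the sample values.yaml file.

```python
def override_images(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with only the listed images replaced.

    Unknown keys are rejected so that a typo does not silently leave
    the default image in place.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown image keys: {sorted(unknown)}")
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Hypothetical image keys and URLs, for illustration only.
defaults = {
    "nodeDriver": "registry.example.com/vendor/node-driver:v1",
    "controllerDriver": "registry.example.com/vendor/controller-driver:v1",
}
images = override_images(
    defaults,
    {"nodeDriver": "mirror.internal.example/debug/node-driver:v1-debug"},
)
print(images)
```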

OpenShift virtualization and KubeVirt enhancements

The uptick in virtualization on Kubernetes is not going unnoticed by the team, as customers are clamoring to get projects started. The KubeVirt experience with the HPE CSI Driver has been incrementally improved in each recent release. It is now possible to clone ReadWriteOnce claims into ReadWriteMany claims, with the access mode transformation occurring as expected. There was also an issue with the Alletra Storage MP for Block family of products where the “MultiWriter” attribute didn’t persist as expected, and that has been addressed as well. All KubeVirt users are encouraged to upgrade the HPE CSI Driver at the earliest opportunity.

Users of OpenShift Virtualization 4.15.11 and later now get the correct StorageProfile applied for the csi.hpe.com provisioner prefix, which makes for a much better out-of-the-box experience. Users of versions prior to 4.15.11 must update the StorageProfile for the default StorageClass as per HPE’s documentation on SCOD.

The Red Hat certified HPE CSI Operator for Kubernetes now relies on the OpenShift branded version of the Operator SDK which helps customers running OpenShift in air-gapped environments. The improved container image management mentioned in the previous section of this blog also makes this operationally feasible.

Deploy today

This is a major enhancement release with multiple contributors, both internal and external. Customers are encouraged to upgrade as soon as possible to take advantage of the enhanced resilience and robustness, and to ensure stateful workloads run as reliably as they possibly can.

Around The Storage Block is the primary source for news around HPE CSI Driver for Kubernetes. Stay tuned and don’t miss a beat! If you have questions, comments, or concerns about the HPE CSI Driver or just want to hang out with the folks at HPE close to this project, you can sign up for the HPE Developer Community Slack and join the #kubernetes and #alletra channels. See you there!


Storage Experts
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage

About the Author

MichaelMattsson

Data & Storage Nerd, Containers, DevOps, IT Automation