Workload modernization with KubeVirt: The next-generation virtualization solution

HPE_Experts

Introducing an alternative for virtualization modernization: KubeVirt, implemented with HPE Services expert guidance for mission-critical applications.

In today's rapidly evolving IT landscape, enterprises often struggle with the complexities of managing and modernizing their virtualization platforms while ensuring stability, performance, and scalability of their mission-critical applications.

In a previous article, we explained how the Migration Toolkit for Virtualization (MTV) can streamline the migration of workloads to Red Hat OpenShift Virtualization, enabling customers to efficiently move their existing virtual machines (VMs) from a legacy platform such as VMware vSphere into a cloud-native environment like OpenShift Virtualization.

In this blog, we will explore live migration, the process of transferring a running VM from one hypervisor to another within the same OpenShift cluster without requiring a shutdown or downtime window.

Although MTV and live migration cover similar ground, they tackle different problems. The Red Hat Migration Toolkit for Virtualization assists in migrating VMs from a third-party virtualization cluster into OpenShift Virtualization. Live migration, by contrast, moves VMs that are already running in an OpenShift Virtualization cluster between nodes of that same cluster with zero downtime, which is a must-have feature for mission-critical workloads.

To understand how live migration works in OpenShift, it is imperative to get familiar with some basic terminology, starting with reconciliation loops.

Reconciliation loop and live migration in KubeVirt

A reconciliation loop is the process by which KubeVirt ensures that the actual state of the infrastructure matches the desired state. KubeVirt controllers continuously compare the two, and whenever they find a discrepancy, the reconciliation loop triggers the actions needed to reach the desired state.
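The idea can be illustrated with a toy model. The sketch below is not KubeVirt code: the `VmState` type, the action strings, and the `run_loop` helper are all hypothetical names chosen for illustration. It only shows the general pattern of comparing desired and actual state and acting on the difference until the two match.

```python
from dataclasses import dataclass


@dataclass
class VmState:
    """Simplified, hypothetical view of a VM: its node and whether it runs."""
    node: str
    running: bool


def reconcile(desired: VmState, actual: VmState) -> list[str]:
    """Return the actions needed to move the actual state toward the desired one."""
    actions = []
    if desired.running and not actual.running:
        actions.append("start")
    if not desired.running and actual.running:
        actions.append("stop")
    if desired.running and desired.node != actual.node:
        actions.append(f"migrate:{actual.node}->{desired.node}")
    return actions


def apply(actual: VmState, action: str) -> VmState:
    """Pretend to carry out one action and return the newly observed state."""
    if action == "start":
        return VmState(actual.node, True)
    if action == "stop":
        return VmState(actual.node, False)
    if action.startswith("migrate:"):
        target = action.split("->", 1)[1]
        return VmState(target, actual.running)
    return actual


def run_loop(desired: VmState, actual: VmState, max_rounds: int = 5) -> VmState:
    """Reconcile repeatedly until no discrepancy is left (or rounds run out)."""
    for _ in range(max_rounds):
        actions = reconcile(desired, actual)
        if not actions:
            break
        for action in actions:
            actual = apply(actual, action)
    return actual
```

In this model, a VM that should run on `worker-2` but is observed on `worker-1` yields a single migrate action, and the loop stops as soon as the observed state equals the desired one.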

During the live migration process, the reconciliation loop helps maintain consistency and ensures that the VM state is correctly transferred to and synchronized on the new node. The migration does not affect the running status or configuration of the VM. Any concurrent sessions that are consuming services from that VM should continue to operate normally, and end users shouldn’t notice any anomaly.

While live migration is designed to be seamless, failures can still take place due to network interruptions or resource constraints on the target node. The reconciliation loop can detect these failures and take corrective actions like rolling back the VM to its original node and continuing to provide the service. It will also try to reinitiate the live migration process if it deems that it is now safe to do so.

In a large Kubernetes cluster with multiple worker nodes, the VM's underlying persistent volume needs to reside on shared storage that is accessible from both the source and the target node. Live migration is a complex process involving multiple dependencies. It is important to factor in the constraints on the compute, storage, and network resources of the Kubernetes cluster when performing high-volume live migrations.

Some of the limitations and requirements that we have in a live migration exercise are as follows:

VMs using a PersistentVolumeClaim (PVC) must have a shared ReadWriteMany (RWX) access mode to be migrated live.
Live migration is not allowed with a pod network binding of bridge interface type.
Live migration requires ports 49152 and 49153 to be available in the virt-launcher pod. If these ports are explicitly specified in the masquerade interface, live migration will not function.
Live migration requires the virt-launcher pod's primary network interface to have the same name on both source and target pods.
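The first three requirements above lend themselves to a simple pre-flight check. The sketch below works on a simplified, hypothetical dictionary describing a VM (the keys `pvcs`, `interfaces`, `binding`, `network`, and `ports` are assumptions made for this example, not the KubeVirt schema), and returns a list of reasons why a live migration would be refused. The fourth requirement, matching interface names on the source and target pods, cannot be checked from the VM description alone and is left out.

```python
# Ports that the list above says must stay free in the virt-launcher pod.
MIGRATION_PORTS = {49152, 49153}


def live_migration_blockers(vm: dict) -> list[str]:
    """Return human-readable reasons why this (simplified) VM cannot migrate live."""
    blockers = []

    # Requirement 1: every PVC must offer the shared ReadWriteMany access mode.
    for pvc in vm.get("pvcs", []):
        if "ReadWriteMany" not in pvc.get("accessModes", []):
            blockers.append(f"PVC {pvc['name']} is not ReadWriteMany")

    for iface in vm.get("interfaces", []):
        binding = iface.get("binding")
        # Requirement 2: no bridge binding on the pod network.
        if iface.get("network") == "pod" and binding == "bridge":
            blockers.append(
                f"interface {iface['name']} uses bridge binding on the pod network"
            )
        # Requirement 3: migration ports must not be claimed by a masquerade interface.
        if binding == "masquerade":
            used = MIGRATION_PORTS & set(iface.get("ports", []))
            if used:
                blockers.append(
                    f"interface {iface['name']} reserves migration ports {sorted(used)}"
                )
    return blockers
```

An empty list means none of the three checked limitations applies; any entry names the specific resource that would need to change before the migration is attempted.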

Keeping these requirements and limitations in mind helps ensure a smooth migration without complications.

As part of the value add that Hewlett Packard Enterprise provides to OpenShift Virtualization customers, we take very seriously the proactive monitoring of the hypervisors and controllers that form part of the cluster and how they interact with KubeVirt in real time. The next section articulates how we achieve that.

Hardware monitoring for OpenShift Virtualization: the HPE way

HPE servers include HPE iLO, which can be leveraged on each worker node that is part of the OpenShift Virtualization cluster. HPE iLO provides a REST API for integration, which can be used to fetch health metrics and, when configured properly, display them on an OpenShift dashboard.
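As a rough illustration of that integration, the sketch below turns a health payload into a line of Prometheus text exposition format. It assumes the payload follows the Redfish ComputerSystem layout, where the overall health is reported under `Status.Health` as `OK`, `Warning`, or `Critical`. The metric name `hpe_ilo_system_health`, the numeric mapping, and the function name are hypothetical choices for this example; fetching the payload from iLO (endpoint, authentication, TLS) is deliberately left out.

```python
# Hypothetical mapping from Redfish health strings to a numeric gauge value.
HEALTH_TO_VALUE = {"OK": 0, "Warning": 1, "Critical": 2}


def health_metric(node: str, system: dict) -> str:
    """Render one Prometheus text-format line from a Redfish-style system payload."""
    status = system.get("Status") or {}
    health = status.get("Health")
    # Unknown or missing health values are reported as -1 rather than guessed.
    value = HEALTH_TO_VALUE.get(health, -1)
    return f'hpe_ilo_system_health{{node="{node}"}} {value}'


if __name__ == "__main__":
    sample = {"Status": {"Health": "Warning", "State": "Enabled"}}
    print(health_metric("worker-1", sample))
```

A small exporter built along these lines would expose one such line per worker node, which Prometheus can scrape and a dashboard can then alert on.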

In the technical validation for live migration carried out at Hewlett Packard Labs, the primary focus was to ensure the availability of the running VMs and to understand the behavior of KubeVirt during the migration. Additionally, we wanted to combine HPE infrastructure management solutions with the OpenShift native tools to obtain a holistic view of both the hardware and the application workloads running on the infrastructure.

We used Prometheus and Grafana, along with the HPE storage array exporter for Prometheus and the HPE OneView integration with Prometheus, to monitor the infrastructure during live migration. This setup helps assess resource usage during migration and identify potential bottlenecks before proceeding. Figure 1 is a screenshot of a storage array dashboard displayed in an OpenShift Virtualization dashboard (powered by Grafana).

Figure 1. Custom HPE Storage dashboard integrated with OpenShift monitoring

HPE OneView is our single-pane-of-glass infrastructure management solution, designed to monitor and administer servers. HPE OneView can be integrated with OpenShift through the HPE OneView exporter for Prometheus, the metrics engine at the heart of OpenShift monitoring. That way, it is possible to create an OpenShift monitoring dashboard that displays server metrics collected by the HPE OneView exporter.

Figure 2 is a screenshot of Grafana (at the heart of OpenShift monitoring) with an HPE OneView dashboard showing a bird’s eye view of the health of the cluster nodes.

Figure 2. Custom HPE OneView dashboard integrated with OpenShift monitoring

HPE InfoSight is another example of an AI-driven, cloud-based infrastructure management and monitoring platform that provides deep insights into hardware performance and overall health. While HPE InfoSight primarily focuses on monitoring HPE servers and storage, it is possible to use it alongside OpenShift to provide a holistic health view of the platform.

Monitoring at this level of detail allows us to measure the performance of live migration on our infrastructure in depth, which is the topic of the next section.

Performance of the live migration

As the number of VMs on a host increases, live migration slows down because the migrations compete for the same underlying resources. Fortunately, several tuning parameters can be adjusted to meet even the most demanding migration deadlines.

The current parameters available are:

bandwidthPerMigration
completionTimeoutPerGiB
parallelMigrationsPerCluster
parallelOutboundMigrationsPerNode
progressTimeout
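To reason about how these settings interact, a back-of-the-envelope calculation can help before touching the cluster. The sketch below is a simplification under stated assumptions: it treats completionTimeoutPerGiB as seconds allowed per GiB of VM memory, treats bandwidthPerMigration as a cap in MiB per second, and ignores memory pages that are dirtied and re-sent during the copy. The function names and all input values are illustrative, not recommended settings or product defaults.

```python
import math


def completion_budget_seconds(memory_gib: float, completion_timeout_per_gib: float) -> float:
    """Time allowed for one migration before it is considered timed out."""
    return memory_gib * completion_timeout_per_gib


def min_transfer_seconds(memory_gib: float, bandwidth_mib_per_s: float) -> float:
    """Lower bound on copy time for one pass over memory at the bandwidth cap."""
    return memory_gib * 1024 / bandwidth_mib_per_s


def migration_waves(vm_count: int, parallel_per_cluster: int,
                    parallel_outbound_per_node: int, source_nodes: int) -> int:
    """Number of sequential batches needed when concurrency is capped."""
    concurrent = min(parallel_per_cluster, parallel_outbound_per_node * source_nodes)
    return math.ceil(vm_count / concurrent)


if __name__ == "__main__":
    # Illustrative numbers only: an 8 GiB VM, 64 MiB/s cap, 800 s per GiB.
    print(min_transfer_seconds(8, 64), completion_budget_seconds(8, 800))
    # Ten VMs leaving a single node, at most 2 outbound per node, 5 per cluster.
    print(migration_waves(10, 5, 2, 1))
```

If the lower bound on transfer time approaches the completion budget, either the bandwidth cap or the timeout is too tight for that VM size; the wave count multiplied by the per-VM time gives a first estimate of the length of a bulk migration window.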

Adjusting these parameters while watching the OpenShift dashboards (including the exporters for the HPE tools) lets us arrive at the optimal settings for a migration drill. This is a task that HPE Services gladly performs as part of a live migration project.

To close this blog, we summarize the lessons learned from the technical validation of OpenShift Virtualization on HPE infrastructure, which was conducted in collaboration with Red Hat.

Lessons learned from the joint technical validation with Red Hat

The following lessons were learned from the technical validation exercise:

Zero-downtime VM live migration in OpenShift is achievable, but we must exercise prudence

VMs were successfully migrated from one OpenShift node to another, including bulk migrations of up to 10 VMs simultaneously. The migration was driven by a YAML file, “VirtualMachineMigration.yaml”, containing the manifests that run the migration. Live migration should be evaluated carefully before use, and the exercise should be postponed if patches or upgrades are being applied to the OpenShift cluster at the time.
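The contents of that file are not reproduced here, so the following is only a sketch of how such a bulk manifest could be generated. It assumes the KubeVirt `VirtualMachineInstanceMigration` resource (API version `kubevirt.io/v1`) with the VM instance named in `spec.vmiName`; the VM names, the namespace, and the `migrate-` naming scheme are hypothetical. The output is JSON documents separated by `---`, which YAML parsers accept because JSON is a subset of YAML.

```python
import json


def migration_manifest(vmi_name: str, namespace: str) -> dict:
    """Build one migration request for a running VM instance."""
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachineInstanceMigration",
        "metadata": {
            "name": f"migrate-{vmi_name}",  # hypothetical naming scheme
            "namespace": namespace,
        },
        "spec": {"vmiName": vmi_name},
    }


def bulk_manifests(vmi_names: list[str], namespace: str) -> str:
    """Join one manifest per VM into a single multi-document file body."""
    docs = [json.dumps(migration_manifest(name, namespace), indent=2)
            for name in vmi_names]
    return "\n---\n".join(docs)


if __name__ == "__main__":
    # Ten hypothetical VMs, matching the size of the bulk test described above.
    names = [f"vm-{i:02d}" for i in range(1, 11)]
    print(bulk_manifests(names, "demo"))
```

Generating the file from a list of VM names keeps a bulk drill repeatable and makes it easy to adjust the batch size to the parallelism limits configured on the cluster.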

Figure 3 is a screenshot that shows the 10 VMs moved across workers 1, 2, and 3 as specified in the YAML manifest.

Figure 3. VM allocation by hypervisor in the cluster

Downtime will take place if self-node remediation is triggered

Node failure scenarios were also tested using self node remediation (SNR) and fence agents remediation (FAR). When an OpenShift node was forced into a not ready state, the VM was successfully moved to another node. However, in such scenarios, the VM was restarted during the move, so the workload experienced downtime. The total time for node remediation was 6 minutes with SNR and 8 minutes with FAR, both using a 60-second unhealthy state setting in the YAML configuration.

The live migration setup and configuration are flexible, and they need to be reviewed for optimal performance based on the objectives of the setup. Customers will find this information useful when considering the move of mission-critical workloads, because it shows the need to plan for a downtime window as a contingency in case of unforeseen failures.

When it comes to high availability (HA) capabilities, both VMware and Red Hat hypervisors meet the expectations of mission-critical workloads

In terms of HA for VMs upon node failure, both platforms are able to fail over the workloads to another available hypervisor in the cluster. The downtime of the application during the failover depends on the size of the VM. Testing with a VM of similar size (100 MB) on both platforms produced similar results for the vSphere and OpenShift Virtualization clusters, with both platforms failing over the workload in less than 8 minutes.

OpenShift HA setup is flexible and can be utilized for certain VMware alternative scenarios. However, planning and evaluation should be done before considering OpenShift HA for different application workloads. HPE Services can advise on the best action path to follow based on the workloads that will run in the virtualization platform.

How HPE Services can assist customers in their virtualization journey with Kubernetes

HPE Services can help customers to seamlessly migrate enterprise workloads from virtualized environments to a KubeVirt-based platform. This can be achieved through comprehensive migration planning, strategic design, implementation, and post-migration operations based on validated best practices from Red Hat and HPE Services. It is always important to have contingency plans in place to address unforeseen circumstances during the migration.

HPE Cloud Native Computing Services – Container Adoption includes VM live migration services that enable seamless VM live migration within the OpenShift Virtualization ecosystem, allowing organizations to move running VMs across nodes without downtime. By leveraging OpenShift’s built-in live migration capabilities, businesses can help ensure HA, optimize resource utilization, and maintain workload continuity. Our approach reduces complexity and disruption, enabling enterprises to efficiently manage VMs on OpenShift while benefiting from a unified platform that supports both VMs and containerized workloads.

By adopting OpenShift Virtualization, organizations can achieve greater flexibility, streamline operations, and position themselves for future growth in a cloud-native environment.

To learn more, read our HPE Cloud Native Computing Services – Container Adoption solution brief.

IT infrastructure modernization

Learn more about HPE Services.

Meet the Author:

Ramanathan Senthil Kumar, Chief Solution Architect, HPE

Senthil Kumar is a Chief Solution Architect and has been with HPE since 2022. He has extensive expertise in private and public cloud solutions and containerization platforms. He began his career in 2011, focusing on Linux and VMware technologies. Over the years, he has contributed to technology organizations such as Intel, Wipro, Hexaware, Singtel, DBS, and Govtech, with experience in private and public cloud, VMC, AWS, K8s, OpenShift, CI/CD pipelines, monitoring, and logging.