Greetings all,
We have a SimpliVity system running on VMware vSphere.
We recently had an outage and I powered off all VMs manually, including the vCLS VMs, instead of migrating the vCLS VMs to the last host to be powered off. After the hosts powered back on, manually powering the vCLS VMs back on did not re-enable DRS, and the cluster shows this error:
"vSphere Cluster Service VM is required to maintain the health of vSphere Cluster Services. Power state and resource of this VM is managed by vSphere Cluster Services."
I read in some documentation that manually deleting the vCLS VMs should allow them to be redeployed on their own. I tried this on one of them to see the effect, but the VM was deleted from the host and now shows as "orphaned" while still appearing in the inventory, so I thought better of doing the same to the other two vCLS VMs. I am also getting a "Failed to relocate vCLS" event for the orphaned vCLS VM that was manually deleted from the host.
I am unable to perform any meaningful actions on the orphaned VM. I have tried disabling vCLS via Retreat Mode to force all the vCLS VMs to be redeployed, to no avail. It is essentially a ghost artifact in the vCenter web client, as I cannot locate it on the datastores on the hosts themselves. The remaining two vCLS VMs are powered on, but DRS is still unhealthy. I would really appreciate any insight the community can provide on remediation. Thanks.
vSphere Cluster Services (vCLS) relies on dedicated virtual machines (vCLS VMs) to function. vCenter Server automatically manages the power state and the resources allocated to vCLS VMs, so manually manipulating vCLS VMs is not a recommended practice.
Unfortunately, the manual deletion seems to have caused the "Failed to relocate vCLS" issue. You need to remove the "ghost" vCLS VM from the inventory.
Focus on cleaning up the orphaned VM; don't touch the remaining healthy vCLS VMs, and make sure they keep operating normally. Don't attempt to relocate or power on the orphaned vCLS VM, as it is in a broken state and won't function properly. Since you mentioned the two remaining vCLS VMs are live, it is important to check their health as well. Once this is done, the error message about power state and resource management should disappear and DRS should function normally.
There is no way to disable vCLS on a vSphere cluster and still have vSphere DRS remain functional on that cluster.
Disabling vCLS on a cluster can be done with Retreat Mode, but this will impact some of the cluster services, such as DRS, for that cluster. While Retreat Mode is active, the VMs in the cluster are not load-balanced and will not be migrated to different hosts if the host running a particular VM runs out of resources.
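If you need to toggle Retreat Mode again, the advanced setting key on vCenter Server is config.vcls.clusters.<domain-id>.enabled (see KB 80472), where <domain-id> is your cluster's "domain-cNNN" managed object ID. Below is a minimal pyVmomi sketch for looking up that ID; the hostname and credentials are placeholders, and the script only reads information, it changes nothing.

```python
# Minimal pyVmomi sketch (assumptions: pyvmomi installed, placeholder hostname
# and credentials). Lists clusters and their "domain-cNNN" IDs, which go into
# the Retreat Mode advanced setting key: config.vcls.clusters.<domain-id>.enabled
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE  # lab convenience only; validate certs in production

si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        # cluster._moId is the "domain-cNNN" value used in the setting name
        print(cluster.name, cluster._moId)
    view.DestroyView()
finally:
    Disconnect(si)
```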
Here are some additional resources about vCLS that you might find helpful:
https://kb.vmware.com/s/article/80472
https://kb.vmware.com/s/article/91890
https://kb.vmware.com/s/article/91891
To investigate further and resolve the issue, I recommend reaching out to HPE SimpliVity support as they can assist in troubleshooting and addressing this problem.
Hope this works for you.
Regards,
Sanika.
Thank you for the info @Sanika ,
Unfortunately this whole ordeal started because I manually shut down the vCLS VMs during the outage, which in retrospect I realize was a mistake on my part. I had to manually power them back on, but that did not get rid of the vSphere Cluster Service VM message, which is why I tried to remove one of them manually in the hope that it would trigger a respawn, to no avail. I suspect the other manually powered-on vCLS VMs I haven't touched are not healthy either: they show as powered on, but under Monitor > Tasks and Events > Tasks, only a power-off task displays and no power-on task, which suggests they only came up because I powered them on myself. However, I do see a "bandwidth usage is normal" informational update under Events. Could the KB article below be the source of the issue? Have you ever seen this situation, and do you have any ideas on the best way to resolve it?
Could the below be the case and, if so, how do I rectify it? --> https://kb.vmware.com/s/article/79892
Scenarios with a resolution where vCLS VMs power on may fail
Hi @learninnlivin ,
I have reviewed the document that you provided. The cause and workaround do look similar to your scenario. The workaround, as mentioned earlier, would be to clean the orphaned vCLS VM out of the inventory.
As for the other manually powered-on vCLS VMs that are up but that you suspect aren't healthy, I would check whether there are enough free resources in the cluster, as a resource shortage can cause this issue. But I don't think we can judge their health and availability from the details provided above alone.
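On the free-resources point, a quick sanity check is to compare each host's consumed CPU and memory against its capacity; the vCLS VMs are tiny (roughly 1 vCPU and 128 MB each), so a power-on failure usually points elsewhere, but it is worth ruling out. A rough pyVmomi sketch, again with placeholder hostname and credentials:

```python
# Rough per-host capacity check (assumptions: pyvmomi installed, placeholder
# credentials). Read-only: prints consumed vs. total CPU and memory per host.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        hw, qs = host.summary.hardware, host.summary.quickStats
        cpu_total_mhz = hw.cpuMhz * hw.numCpuCores
        mem_total_mb = hw.memorySize // (1024 * 1024)
        print(f"{host.name}: CPU {qs.overallCpuUsage}/{cpu_total_mhz} MHz, "
              f"RAM {qs.overallMemoryUsage}/{mem_total_mb} MB")
    view.DestroyView()
finally:
    Disconnect(si)
```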
However, I would still recommend opening a support case with HPE Support, where they will gather all the necessary details and assist you accordingly.
Regards,
Sanika.
Hi @Sanika ,
Thank you for the reply. Unfortunately I can't take this to HPE support, as our support contract lapsed recently. That is why I am trying my luck here.
I definitely understand what you are saying about cleaning up the orphaned VM. Unfortunately, I am at a loss as to how to do that, since it was deleted from the host but still shows up in the vCenter web client with "(orphaned)" after its name. That's what makes this odd. I have also tried Retreat Mode, but that did not redeploy the vCLS VMs. How can I clear the orphaned vCLS VM from the client? Could a VCSA restart do the trick? Is there something else I can pursue?
Thanks
@learninnlivin To remove the orphaned VM from inventory, simply right-click on it in vCenter and choose "Remove from Inventory".
VMware will automatically spawn a new vCLS VM when one gets deleted. Also, when vCLS VMs are shut down, vCenter will automatically power them back on. Shutting down the vCLS VMs is OK if you have to put a host into Maintenance Mode, for example.
You can't do anything with that orphaned VM, so as Sanika mentioned, your best bet is to remove it from inventory and let VMware create a new one.
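If the right-click option is ever unavailable, the same operation is exposed in the API as UnregisterVM(), which removes the VM from inventory without touching any files on disk. A hedged pyVmomi sketch (placeholder hostname, credentials, and VM name; point it only at the orphaned vCLS VM):

```python
# Remove an orphaned VM from inventory via the API (equivalent of the
# "Remove from Inventory" right-click). Assumptions: pyvmomi installed,
# placeholder credentials, and ORPHANED_NAME matches your orphaned vCLS VM.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ORPHANED_NAME = "vCLS (1)"  # hypothetical name; change to your orphaned VM

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.name == ORPHANED_NAME and vm.runtime.connectionState == "orphaned":
            vm.UnregisterVM()  # inventory removal only; no files are deleted
            print(f"Unregistered {vm.name}")
    view.DestroyView()
finally:
    Disconnect(si)
```

It is generally safer to put the cluster into Retreat Mode first, do the cleanup, and then disable Retreat Mode so vCenter redeploys fresh vCLS VMs.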
Hi there @gustenar ,
Unfortunately that option is off the table, as every action of significance is greyed out. And there's no trace of anything else related to vCLS that I could take action on (like a .vmx file or something).
Thank you for the help @Sanika and @gustenar .
I was able to resolve this myself after some more digging and following the error messages further, using the sites below as guidance. Thank you both for your efforts and help!
https://buildingtents.com/2023/06/06/vmware-eam-failing-and-not-allowing-upgrades/
https://kb.vmware.com/s/article/2112577#update_extension_certificate_on_vcenter_server_appliance
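For anyone who lands here with the same symptoms: in my case the trail led to the ESX Agent Manager (EAM), which is the vCenter service that deploys the vCLS VMs, and its extension certificate. Before diving into the certificate steps in the second KB, a quick way to confirm the EAM extension is registered with vCenter is a small pyVmomi check like the one below (placeholder hostname and credentials, read-only):

```python
# Quick check that the ESX Agent Manager (EAM) extension is registered with
# vCenter, since vCLS deployment goes through EAM. Assumptions: pyvmomi
# installed, placeholder hostname and credentials.
import ssl
from pyVim.connect import SmartConnect, Disconnect

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    ext = si.RetrieveContent().extensionManager.FindExtension("com.vmware.vim.eam")
    if ext is None:
        print("EAM extension is not registered")
    else:
        # lastHeartbeatTime may be empty for some extensions
        print("EAM extension version:", ext.version,
              "last heartbeat:", ext.lastHeartbeatTime)
finally:
    Disconnect(si)
```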
That's good to know!
Happy to help in any way I can.
Regards,
Sanika.