HPE Morpheus VM Essentials

HCI cluster (Ceph) VMs start on multiple hosts simultaneously

 
dya
Frequent Advisor


I have confirmed that the same VM can end up running on multiple hosts at the same time.

This looks like a serious problem. Is it a bug?

I'm not sure whether it always happens, but I have confirmed it in the following cases.

- After putting a host into maintenance mode, migrating the guest to another host, and then rebooting the host.

- After causing a kernel panic on a host to test HA, confirming that the guest had started on another host, and then booting the host that had panicked.

I believe the VM disks are on RBD. Since the guest's file system is not a cluster-aware file system, I'm worried that it will be corrupted if two instances of the VM write to the same image.

*I apologize if this is hard to understand, as it is machine translated.

7 REPLIES
DiegoDelgado
HPE Pro

Re: HCI cluster (Ceph) VMs start on multiple hosts simultaneously

I haven’t seen this behavior, but can you please confirm the following?
- What VM Essentials version are you using?
- Did you configure Ceph when deploying the cluster from the manager?
- How is your network configured? Are your management and storage ports bonded?


I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
dya
Frequent Advisor

Re: HCI cluster (Ceph) VMs start on multiple hosts simultaneously

Thank you.

- The version is "8.0.3".
- When creating the cluster, I selected "HPE VM 1.1 HCI Ceph Cluster on Existing Ubuntu 22.04" in the layout.
- When creating the cluster, I only entered "Management Net Interface" for the network, and did not enter "Storage Net Interface", "Compute Net Interface", or "Overlay Net Interface".
- When checking /etc/ceph/ceph.conf, the cluster network and public network are the same.
  - cluster network = xxx.xxx.xxx.xxx/24
  - public network = xxx.xxx.xxx.xxx/24
- After creating the cluster, I manually created an OVS bridge for the VM, which looks like this:
# ovs-vsctl show
19c43c1b-c751-4d2e-9e4f-957667fdfce1
    Bridge mgmt
        fail_mode: standalone
        Port enp1s0
            Interface enp1s0
        Port mgmt
            Interface mgmt
                type: internal
    Bridge br-vm
        Port br-vm
            Interface br-vm
                type: internal
        Port vnet3
            Interface vnet3
        Port enp8s0
            Interface enp8s0
    ovs_version: "2.17.9"
#

dya
Frequent Advisor

Re: HCI cluster (Ceph) VMs start on multiple hosts simultaneously

- After causing a kernel panic on a host to test HA, confirming that the guest had started on another host, and then booting the host that had panicked.

The case above occurred again.

After confirming with virsh list that the same VM was running on multiple hosts, I ran the following command.

# rbd status hpevm_1-disk-0 -p mvm-volumes
Watchers:
        watcher=xxx.xxx.xxx.133:0/684952890 client.2474309 cookie=129397164085856
        watcher=xxx.xxx.xxx.132:0/2851887314 client.2475676 cookie=133929696760976
#

The cluster has three nodes: xxx.xxx.xxx.131, xxx.xxx.xxx.132, and xxx.xxx.xxx.133. In this test, xxx.xxx.xxx.131 was the host on which I triggered the kernel panic.
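
The watcher check above can be scripted. This is only a minimal sketch, assuming the `rbd status` output keeps the format shown in this post (one `watcher=` line per client). The function name `count_watchers` and the embedded sample text are illustrative and not part of VM Essentials or Ceph. More than one watcher on an image that is not meant to be shared suggests the disk is open on more than one host.

```shell
#!/bin/sh
# Count "watcher=" lines in `rbd status` output read from stdin.
count_watchers() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -c 'watcher=' || true
}

# Sample copied from the output in this post (addresses are masked in the original).
sample='Watchers:
        watcher=xxx.xxx.xxx.133:0/684952890 client.2474309 cookie=129397164085856
        watcher=xxx.xxx.xxx.132:0/2851887314 client.2475676 cookie=133929696760976'

n=$(printf '%s\n' "$sample" | count_watchers)
if [ "$n" -gt 1 ]; then
    echo "WARNING: $n watchers on this image"
fi
```

On a live host the same function could be fed directly, for example `rbd status hpevm_1-disk-0 -p mvm-volumes | count_watchers`.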

dya
Frequent Advisor

Re: HCI cluster (Ceph) VMs start on multiple hosts simultaneously

Additional information: the contents of mvm-hb while the VM is running on two hosts simultaneously.

My guess is that the clue to the cause lies not in Ceph but in VME, which decides which host a VM starts on, but I have not been able to narrow it down.

# ls -l /var/morpheus/kvm/images/mvm-hb/vme-ceph2
total 19
-rwxrwxr-- 1 morpheus-node morpheus-node 9306 Mar 24 21:22 ceph-rocky1.xml
-rwxrwxr-- 1 morpheus-node morpheus-node 8597 Mar 24 21:30 ceph-ubuntu.xml
-rwxrwxr-- 1 morpheus-node morpheus-node 283 Mar 24 21:34 hb.properties
root@vme-ceph1:~# ls -l /var/morpheus/kvm/images/mvm-hb/vme-ceph3
total 10
-rwxrwxr-- 1 morpheus-node morpheus-node 8932 Mar 24 21:33 ceph-ubuntu.xml
-rwxrwxr-- 1 morpheus-node morpheus-node 254 Mar 24 21:34 hb.properties
#

YA1007
Occasional Advisor

Re: HCI cluster (Ceph) VMs start on multiple hosts simultaneously

## Different account, but I am the same person as the original poster

Additional information collected while the VM is running on two hosts simultaneously.

・virsh list
→ The same VM is running on two hosts.

・ls -l /var/morpheus/kvm/images/mvm-hb/vme-cephx
→ The XML file of the affected VM is present in the directory of each host that shows it as running in virsh list.

・rbd status mvm-volumes/<image>
→ Same as I reported before: the image is being accessed by two clients.

・rbd lock list mvm-volumes/<image>
→ Newly confirmed. The client holding the exclusive lock changes every few seconds. Watching the "Address" column, the addresses of the two hosts running the VM appear alternately.
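
The alternating lock owner described above can be observed by sampling `rbd lock list` repeatedly and counting distinct holder addresses. The following is a rough sketch, not a supported tool. It assumes that each lock row in the output starts with `client.` and that the address is the last column; the helper names (`lock_owner`, `distinct_owners`), the `IMAGE` variable, and the 10.0.0.x addresses in the demo are hypothetical.

```shell
#!/bin/sh
# Print the host part of each lock holder's address from `rbd lock list` output (stdin).
# Assumption: lock rows start with "client." and the address is the last column.
lock_owner() {
    awk '$1 ~ /^client\./ { sub(/:.*/, "", $NF); print $NF }'
}

# Count distinct non-empty lines (one address per line) read from stdin.
distinct_owners() {
    sort -u | awk 'NF { n++ } END { print n + 0 }'
}

# Offline demo with made-up samples: three polls, two different holders.
printf '10.0.0.133\n10.0.0.132\n10.0.0.133\n' | distinct_owners

# Sampling loop for a live cluster (commented out; set IMAGE to the image name first):
# for i in 1 2 3 4 5 6 7 8 9 10; do
#     rbd lock list "mvm-volumes/$IMAGE" | lock_owner
#     sleep 2
# done | distinct_owners
```

A result greater than 1 over a short sampling window would match the symptom reported here, where two clients keep taking the exclusive lock from each other.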

 

YA1007
Occasional Advisor

Re: HCI cluster (Ceph) VMs start on multiple hosts simultaneously

I have a Ceph cluster with 3 hosts, and this issue seems to occur in the following cases.

1) During an HA test, when one machine has been in a failed state for several minutes and its OS is then rebooted

2) When one machine has been stopped for several minutes

I don't know the details yet, but in both cases the 3-host Ceph cluster is operating with only 2 machines, which is the bare minimum required to maintain a majority (quorum).

The issue is highly reproducible. I would like to know whether this is a bug or whether it can be solved by changing the settings.
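
The "bare minimum" above follows from the strict-majority rule for quorum: with n voting members, floor(n/2) + 1 must be up. A small sketch of that arithmetic is below; it assumes one Ceph monitor per host, which this thread does not state, and the function names are illustrative.

```shell
#!/bin/sh
# Smallest strict majority of n voting members: floor(n/2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# How many members can be lost while a strict majority remains.
failures_tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

echo "members=3 quorum=$(quorum_size 3) tolerated_failures=$(failures_tolerated 3)"
```

For 3 members the quorum is 2 and only 1 failure is tolerated, so with one host down the cluster has no remaining margin.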

dya
Frequent Advisor

Re: HCI cluster (Ceph) VMs start on multiple hosts simultaneously

Has anyone been able to identify the cause of this?