yahoo-eng-team team mailing list archive

[Bug 2109457] [NEW] nova compute service won't start in a hyper-converged environment when there is an existing kvm unit on a nova compute node

 

Public bug reported:

When deploying a hyper-converged OpenStack 2024.1 cloud with Juju, the
nova-compute service won't start if a control plane service unit is also
deployed on the nova compute node as a KVM VM. It starts normally if the
control plane unit is deployed as an LXD container instead.

The reason is that nova-compute checks whether any VMs already exist on the host [1].
If Juju starts a VM on that host (e.g. when there is a "to kvm:X" placement), the VM is managed by Juju rather than nova, so nova-compute doesn't recognise it and fails the sanity check in [1].

A workaround is to temporarily bypass the check so that nova-compute can
start for the first time, and then restore the original file:

$ sudo cp -p /usr/lib/python3/dist-packages/nova/compute/manager.py /usr/lib/python3/dist-packages/nova/compute/manager.py.bak
$ sudo sed -i 's|if len(instances_on_hv) > 0|if len(instances_on_hv) > 999|g' /usr/lib/python3/dist-packages/nova/compute/manager.py
$ sudo systemctl restart nova-compute
$ sudo mv -f /usr/lib/python3/dist-packages/nova/compute/manager.py.bak /usr/lib/python3/dist-packages/nova/compute/manager.py
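
One caveat with the sed step: `sed -i` exits successfully even when the pattern matches nothing, so on a nova build where that line reads differently the file would be left unchanged and nova-compute would fail to start again without any hint as to why. A minimal sketch of the same substitution that fails loudly instead is shown here. The function name `bypass_check` is hypothetical; only the two strings are taken from the sed command above.

```python
# Same substitution as the sed command above, but with a guard:
# raise an error if the expected line is not present in the source.
OLD = "if len(instances_on_hv) > 0"
NEW = "if len(instances_on_hv) > 999"


def bypass_check(source: str) -> str:
    """Return `source` with the check threshold raised (hypothetical helper)."""
    if OLD not in source:
        raise ValueError("expected pattern not found; nothing was patched")
    return source.replace(OLD, NEW)


if __name__ == "__main__":
    sample = "        if len(instances_on_hv) > 0:\n"
    print(bypass_check(sample), end="")
```

This only transforms text held in memory. Reading and writing manager.py, and restoring the backup after the restart, would still follow the steps above.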

Note that this check only runs for new nova-compute services. Once the
nova-compute service has a record in the DB, the check does not run
again, so there is no need to reapply this workaround on existing nova
compute nodes afterwards.
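
The behaviour described in this report can be modelled roughly as follows. This is a simplified sketch, not nova's actual code (see [1] for that). `RefuseToStart`, `init_host` and the sample values are hypothetical names chosen for illustration; only `instances_on_hv` and the `> 0` comparison come from the workaround above.

```python
class RefuseToStart(Exception):
    """Hypothetical stand-in for the error that stops the service."""


def init_host(service_record, instances_on_hv):
    """Model of the first-start check, under the assumptions stated above.

    service_record:  the DB record of this compute service, or None if
                     the service is starting for the first time.
    instances_on_hv: identifiers of the VMs the hypervisor reports.
    """
    if service_record is not None:
        # Existing service: the check is skipped entirely.
        return "started"
    if len(instances_on_hv) > 0:
        # New service, but the hypervisor already has VMs on it.
        raise RefuseToStart("hypervisor has existing instances")
    return "started"


if __name__ == "__main__":
    juju_vm = ["hypothetical-juju-kvm-domain"]
    try:
        init_host(None, juju_vm)  # first start with a Juju KVM unit present
    except RefuseToStart as exc:
        print("refused:", exc)
    # Once a service record exists, the same VM no longer blocks startup.
    print(init_host({"host": "compute-1"}, juju_vm))
```

In this model a VM that Juju created is indistinguishable from a leftover nova instance, because the check only counts what the hypervisor reports. That matches the failure seen with "to kvm:X" placements, and would also explain why LXD containers do not trigger it if they are not reported by the hypervisor.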

Please check whether the code in [1] needs to be revised so that it
works in a hyper-converged environment.

[1]
https://github.com/openstack/nova/blob/stable/2024.1/nova/compute/manager.py#L1574

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2109457

Title:
  nova compute service won't start in a hyper-converged environment when
  there is an existing kvm unit on a nova compute node

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2109457/+subscriptions