yahoo-eng-team team mailing list archive

[Bug 1952745] [NEW] Evacuated instances should be completed when ComputeHostNotFound

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Konrad Cempura <1952745@xxxxxxxxxxxxxxxxxx>
Date: Tue, 30 Nov 2021 11:59:29 -0000
Reply-to: Bug 1952745 <1952745@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx
Public bug reported:

Scenario 1:
- remove the compute host physically by formatting its disk
- evacuate the VMs from the removed compute host
- remove the orphaned resource provider for the removed compute host
- add a new compute host with the same name as the removed one
- migrate the evacuated VMs to the new compute host

Expected result
===============
VMs are working correctly.

Actual result
=============
Definitions of VMs are removed from libvirt.


Scenario 2:
- remove the compute host physically by formatting its disk
- evacuate the VMs from the removed compute host
- remove the orphaned resource provider for the removed compute host
- add a new compute host with the same name as the removed one
- restart nova_compute


Expected result
===============
Evacuations are completed on first run of nova_compute.

Actual result
=============
Evacuations are completed only after a restart.


Scenario 3:
- remove the compute host physically by formatting its disk
- evacuate the VMs from the removed compute host
- add a new compute host with the same name as the removed one, but in capital letters
- migrate the evacuated VMs to the new compute host

Expected result
===============
VMs are working correctly on new compute.

Actual result
=============
Definitions of VMs are removed from libvirt.


Environment
===========
1. OpenStack Train
Commit-Id: 4cf72ea6bfc58d33da894f248184c08c36055884
Also occurs on master.

2. Libvirt + KVM
libvirtd (libvirt) 4.5.0


Proposed solution
=================

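Evacuations should still be completed when ComputeHostNotFound is raised
for the source node. Only the allocation cleanup, which needs the source
node's UUID, should be skipped.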


Proposed patch
==============

diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index eaedc0238f..df56430aa6 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -752,16 +752,16 @@ class ComputeManager(manager.Manager):
                         context, self.host, migration.source_node).uuid
                     compute_nodes[migration.source_node] = cn_uuid
                 except exception.ComputeHostNotFound:
-                    LOG.error("Failed to clean allocation of evacuated "
-                              "instance as the source node %s is not found",
-                              migration.source_node, instance=instance)
-                    continue
-            cn_uuid = compute_nodes[migration.source_node]
+                    LOG.warning("Failed to clean allocation of evacuated "
+                                "instance as the source node %s is not found",
+                                migration.source_node, instance=instance)
+
+            cn_uuid = compute_nodes.get(migration.source_node)

             # If the instance was deleted in the interim, assume its
             # allocations were properly cleaned up (either by its hosting
             # compute service or the API).
-            if (not instance.deleted and
+            if (cn_uuid and not instance.deleted and
                     not self.reportclient.
                         remove_provider_tree_from_instance_allocation(
                             context, instance.uuid, cn_uuid)):
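
The control-flow change in the patch can be sketched in isolation. This is a minimal, self-contained illustration and not nova code: ComputeHostNotFound here is a local stand-in for the nova exception, and lookup_node_uuid, complete_evacuations, remove_allocation and the dict-based migration records are hypothetical names made up for the example. It also assumes that the code following the allocation cleanup is what marks the migration as completed, which the diff does not show.

```python
import logging

LOG = logging.getLogger(__name__)


class ComputeHostNotFound(Exception):
    """Local stand-in for nova.exception.ComputeHostNotFound."""


def lookup_node_uuid(known_nodes, host, nodename):
    """Hypothetical lookup: raise when the source node record is gone."""
    try:
        return known_nodes[(host, nodename)]
    except KeyError:
        raise ComputeHostNotFound(nodename) from None


def complete_evacuations(migrations, known_nodes, host, remove_allocation):
    """Process evacuation records the way the patched loop would."""
    compute_nodes = {}  # cache: source node name -> compute node UUID
    for migration in migrations:
        source_node = migration["source_node"]
        if source_node not in compute_nodes:
            try:
                compute_nodes[source_node] = lookup_node_uuid(
                    known_nodes, host, source_node)
            except ComputeHostNotFound:
                # Patched behaviour: warn and fall through, no "continue".
                LOG.warning("Source node %s is not found", source_node)

        # .get() yields None instead of raising KeyError for a missing node.
        cn_uuid = compute_nodes.get(source_node)

        # The allocation cleanup only runs when the node UUID is known.
        if cn_uuid and not migration["instance_deleted"]:
            if not remove_allocation(migration["instance_uuid"], cn_uuid):
                LOG.error("Failed to clean allocation on %s", cn_uuid)

        # Assumed follow-up step: reached even when the node was not found.
        migration["status"] = "completed"
    return migrations


if __name__ == "__main__":
    records = [
        {"source_node": "gone", "instance_uuid": "i-1",
         "instance_deleted": False},
        {"source_node": "node1", "instance_uuid": "i-2",
         "instance_deleted": False},
    ]
    nodes = {("host1", "node1"): "uuid-1"}
    result = complete_evacuations(records, nodes, "host1",
                                  lambda inst, cn: True)
    print([m["status"] for m in result])
```

With the original "continue" in the except branch, the record for the missing node would never reach the final status update, which matches the symptom in Scenario 2.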

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1952745

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1952745/+subscriptions