← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1794996] Re: _destroy_evacuated_instances fails and kills n-cpu startup if lazy-loading flavor on a deleted instance

 

Reviewed:  https://review.openstack.org/606122
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=05cd8d128211adbbfb3cf5d626034ccd0f75a452
Submitter: Zuul
Branch:    master

commit 05cd8d128211adbbfb3cf5d626034ccd0f75a452
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Fri Sep 28 11:18:14 2018 -0400

    Fix InstanceNotFound during _destroy_evacuated_instances
    
    The _destroy_evacuated_instances method on compute
    startup tries to cleanup guests on the hypervisor and
    allocations held against that compute node resource
    provider by evacuated instances, but doesn't take into
    account that those evacuated instances could have been
    deleted in the meantime which leads to a lazy-load
    InstanceNotFound error that kills the startup of the
    compute service.
    
    This change does two things in the _destroy_evacuated_instances
    method:
    
    1. Loads the evacuated instances with a read_deleted='yes'
       context when calling _get_instances_on_driver(). This
       should be fine since _get_instances_on_driver() is already
       returning deleted instances anyway (InstanceList.get_by_filters
       defaults to read deleted instances unless the filters tell
       it otherwise - which we don't in this case). This is needed
       so that things like driver.destroy() don't raise
       InstanceNotFound while lazy-loading fields on the instance.
    
    2. Skips the call to remove_allocation_from_compute() if the
       evacuated instance is already deleted. If the instance is
       already deleted, its allocations should have been cleaned
       up by its hosting compute service (or the API).
    
    The functional regression test is updated to show the bug is
    now fixed.
    
    Change-Id: I1f4b3540dd453650f94333b36d7504ba164192f7
    Closes-Bug: #1794996


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1794996

Title:
  _destroy_evacuated_instances fails and kills n-cpu startup if lazy-
  loading flavor on a deleted instance

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged

Bug description:
  Seen here:

  http://logs.openstack.org/00/604400/5/check/nova-live-
  migration/6aa7a4b/logs/subnode-2/screen-n-cpu.txt.gz#_Sep_26_23_53_06_475005

  Sep 26 23:53:06.475005 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: DEBUG nova.objects.instance [None req-0b213f62-b666-4a02-b465-7f36d24e2fbf None None] Lazy-loading 'flavor' on Instance uuid 1d4eaa12-1477-4528-a06b-52e5672c6c61 {{(pid=31767) obj_load_attr /opt/stack/new/nova/nova/objects/instance.py:1109}}
  Sep 26 23:53:06.547498 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service [None req-0b213f62-b666-4a02-b465-7f36d24e2fbf None None] Error starting thread.: InstanceNotFound_Remote: Instance 1d4eaa12-1477-4528-a06b-52e5672c6c61 could not be found.
  Sep 26 23:53:06.547786 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: Traceback (most recent call last):
  Sep 26 23:53:06.548012 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/opt/stack/new/nova/nova/conductor/manager.py", line 126, in _object_dispatch
  Sep 26 23:53:06.548241 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     return getattr(target, method)(*args, **kwargs)
  Sep 26 23:53:06.548456 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
  Sep 26 23:53:06.548670 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     result = fn(cls, context, *args, **kwargs)
  Sep 26 23:53:06.548881 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/opt/stack/new/nova/nova/objects/instance.py", line 503, in get_by_uuid
  Sep 26 23:53:06.549100 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     use_slave=use_slave)
  Sep 26 23:53:06.549319 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 210, in wrapper
  Sep 26 23:53:06.549530 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     return f(*args, **kwargs)
  Sep 26 23:53:06.549743 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/opt/stack/new/nova/nova/objects/instance.py", line 495, in _db_instance_get_by_uuid
  Sep 26 23:53:06.549991 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     columns_to_join=columns_to_join)
  Sep 26 23:53:06.550272 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/opt/stack/new/nova/nova/db/api.py", line 758, in instance_get_by_uuid
  Sep 26 23:53:06.550515 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     return IMPL.instance_get_by_uuid(context, uuid, columns_to_join)
  Sep 26 23:53:06.551049 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 168, in wrapper
  Sep 26 23:53:06.551268 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     return f(*args, **kwargs)
  Sep 26 23:53:06.551480 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 255, in wrapped
  Sep 26 23:53:06.551696 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     return f(context, *args, **kwargs)
  Sep 26 23:53:06.551908 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 1843, in instance_get_by_uuid
  Sep 26 23:53:06.552125 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     columns_to_join=columns_to_join)
  Sep 26 23:53:06.552344 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:   File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 1852, in _instance_get_by_uuid
  Sep 26 23:53:06.552561 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]:     raise exception.InstanceNotFound(instance_id=uuid)
  Sep 26 23:53:06.552772 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: InstanceNotFound: Instance 1d4eaa12-1477-4528-a06b-52e5672c6c61 could not be found.
  Sep 26 23:53:06.552989 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service Traceback (most recent call last):
  Sep 26 23:53:06.553210 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/usr/local/lib/python2.7/dist-packages/oslo_service/service.py", line 796, in run_service
  Sep 26 23:53:06.553420 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     service.start()
  Sep 26 23:53:06.553628 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/opt/stack/new/nova/nova/service.py", line 162, in start
  Sep 26 23:53:06.553878 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     self.manager.init_host()
  Sep 26 23:53:06.554108 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/opt/stack/new/nova/nova/compute/manager.py", line 1204, in init_host
  Sep 26 23:53:06.554318 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     evacuated_instances = self._destroy_evacuated_instances(context)
  Sep 26 23:53:06.554527 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/opt/stack/new/nova/nova/compute/manager.py", line 710, in _destroy_evacuated_instances
  Sep 26 23:53:06.554741 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     context, instance, cn_uuid, self.reportclient):
  Sep 26 23:53:06.554950 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/opt/stack/new/nova/nova/scheduler/utils.py", line 998, in remove_allocation_from_compute
  Sep 26 23:53:06.555159 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     flavor = instance.flavor
  Sep 26 23:53:06.555383 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 67, in getter
  Sep 26 23:53:06.555665 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     self.obj_load_attr(name)
  Sep 26 23:53:06.555876 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/opt/stack/new/nova/nova/objects/instance.py", line 1139, in obj_load_attr
  Sep 26 23:53:06.556086 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     self._load_flavor()
  Sep 26 23:53:06.556883 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/opt/stack/new/nova/nova/objects/instance.py", line 961, in _load_flavor
  Sep 26 23:53:06.557104 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     expected_attrs=['flavor'])
  Sep 26 23:53:06.557321 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
  Sep 26 23:53:06.557531 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     args, kwargs)
  Sep 26 23:53:06.557741 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 241, in object_class_action_versions
  Sep 26 23:53:06.557979 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     args=args, kwargs=kwargs)
  Sep 26 23:53:06.558191 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 179, in call
  Sep 26 23:53:06.558409 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     retry=self.retry)
  Sep 26 23:53:06.558638 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 133, in _send
  Sep 26 23:53:06.558853 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     retry=retry)
  Sep 26 23:53:06.559062 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send
  Sep 26 23:53:06.559271 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     call_monitor_timeout, retry=retry)
  Sep 26 23:53:06.559485 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 575, in _send
  Sep 26 23:53:06.559707 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service     raise result
  Sep 26 23:53:06.559918 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service InstanceNotFound_Remote: Instance 1d4eaa12-1477-4528-a06b-52e5672c6c61 could not be found.

  This is a test for evacuate where we take down nova-compute on one
  host and evacuate instances from it. Then those instances are deleted
  and then the compute service is re-enabled and restarted. On restart,
  nova-compute is trying to cleanup allocations in placement for the
  evacuated instances, and to do that it tries to lazy-load the flavor
  from the deleted instance which fails because we're not using
  read_deleted='yes' on the context.

  This is similar to bug 1745977 which was fixed with change:

  https://review.openstack.org/#/q/Ide6cc5bb1fce2c9aea9fa3efdf940e8308cd9ed0

  But that only handled loading of generic attributes, in that case
  system_metadata.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1794996/+subscriptions


References