yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #76080
[Bug 1794996] Re: _destroy_evacuated_instances fails and kills n-cpu startup if lazy-loading flavor on a deleted instance
Reviewed: https://review.openstack.org/606122
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=05cd8d128211adbbfb3cf5d626034ccd0f75a452
Submitter: Zuul
Branch: master
commit 05cd8d128211adbbfb3cf5d626034ccd0f75a452
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Fri Sep 28 11:18:14 2018 -0400
Fix InstanceNotFound during _destroy_evacuated_instances
The _destroy_evacuated_instances method on compute
startup tries to cleanup guests on the hypervisor and
allocations held against that compute node resource
provider by evacuated instances, but doesn't take into
account that those evacuated instances could have been
deleted in the meantime which leads to a lazy-load
InstanceNotFound error that kills the startup of the
compute service.
This change does two things in the _destroy_evacuated_instances
method:
1. Loads the evacuated instances with a read_deleted='yes'
context when calling _get_instances_on_driver(). This
should be fine since _get_instances_on_driver() is already
returning deleted instances anyway (InstanceList.get_by_filters
defaults to read deleted instances unless the filters tell
it otherwise - which we don't in this case). This is needed
so that things like driver.destroy() don't raise
InstanceNotFound while lazy-loading fields on the instance.
2. Skips the call to remove_allocation_from_compute() if the
evacuated instance is already deleted. If the instance is
already deleted, its allocations should have been cleaned
up by its hosting compute service (or the API).
The functional regression test is updated to show the bug is
now fixed.
Change-Id: I1f4b3540dd453650f94333b36d7504ba164192f7
Closes-Bug: #1794996
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1794996
Title:
_destroy_evacuated_instances fails and kills n-cpu startup if lazy-
loading flavor on a deleted instance
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
Triaged
Status in OpenStack Compute (nova) queens series:
Triaged
Status in OpenStack Compute (nova) rocky series:
Triaged
Bug description:
Seen here:
http://logs.openstack.org/00/604400/5/check/nova-live-
migration/6aa7a4b/logs/subnode-2/screen-n-cpu.txt.gz#_Sep_26_23_53_06_475005
Sep 26 23:53:06.475005 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: DEBUG nova.objects.instance [None req-0b213f62-b666-4a02-b465-7f36d24e2fbf None None] Lazy-loading 'flavor' on Instance uuid 1d4eaa12-1477-4528-a06b-52e5672c6c61 {{(pid=31767) obj_load_attr /opt/stack/new/nova/nova/objects/instance.py:1109}}
Sep 26 23:53:06.547498 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service [None req-0b213f62-b666-4a02-b465-7f36d24e2fbf None None] Error starting thread.: InstanceNotFound_Remote: Instance 1d4eaa12-1477-4528-a06b-52e5672c6c61 could not be found.
Sep 26 23:53:06.547786 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: Traceback (most recent call last):
Sep 26 23:53:06.548012 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/conductor/manager.py", line 126, in _object_dispatch
Sep 26 23:53:06.548241 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return getattr(target, method)(*args, **kwargs)
Sep 26 23:53:06.548456 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
Sep 26 23:53:06.548670 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: result = fn(cls, context, *args, **kwargs)
Sep 26 23:53:06.548881 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/objects/instance.py", line 503, in get_by_uuid
Sep 26 23:53:06.549100 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: use_slave=use_slave)
Sep 26 23:53:06.549319 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 210, in wrapper
Sep 26 23:53:06.549530 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return f(*args, **kwargs)
Sep 26 23:53:06.549743 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/objects/instance.py", line 495, in _db_instance_get_by_uuid
Sep 26 23:53:06.549991 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: columns_to_join=columns_to_join)
Sep 26 23:53:06.550272 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/api.py", line 758, in instance_get_by_uuid
Sep 26 23:53:06.550515 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return IMPL.instance_get_by_uuid(context, uuid, columns_to_join)
Sep 26 23:53:06.551049 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 168, in wrapper
Sep 26 23:53:06.551268 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return f(*args, **kwargs)
Sep 26 23:53:06.551480 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 255, in wrapped
Sep 26 23:53:06.551696 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: return f(context, *args, **kwargs)
Sep 26 23:53:06.551908 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 1843, in instance_get_by_uuid
Sep 26 23:53:06.552125 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: columns_to_join=columns_to_join)
Sep 26 23:53:06.552344 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 1852, in _instance_get_by_uuid
Sep 26 23:53:06.552561 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: raise exception.InstanceNotFound(instance_id=uuid)
Sep 26 23:53:06.552772 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: InstanceNotFound: Instance 1d4eaa12-1477-4528-a06b-52e5672c6c61 could not be found.
Sep 26 23:53:06.552989 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service Traceback (most recent call last):
Sep 26 23:53:06.553210 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_service/service.py", line 796, in run_service
Sep 26 23:53:06.553420 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service service.start()
Sep 26 23:53:06.553628 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/service.py", line 162, in start
Sep 26 23:53:06.553878 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service self.manager.init_host()
Sep 26 23:53:06.554108 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/compute/manager.py", line 1204, in init_host
Sep 26 23:53:06.554318 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service evacuated_instances = self._destroy_evacuated_instances(context)
Sep 26 23:53:06.554527 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/compute/manager.py", line 710, in _destroy_evacuated_instances
Sep 26 23:53:06.554741 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service context, instance, cn_uuid, self.reportclient):
Sep 26 23:53:06.554950 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/scheduler/utils.py", line 998, in remove_allocation_from_compute
Sep 26 23:53:06.555159 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service flavor = instance.flavor
Sep 26 23:53:06.555383 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 67, in getter
Sep 26 23:53:06.555665 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service self.obj_load_attr(name)
Sep 26 23:53:06.555876 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/objects/instance.py", line 1139, in obj_load_attr
Sep 26 23:53:06.556086 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service self._load_flavor()
Sep 26 23:53:06.556883 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/objects/instance.py", line 961, in _load_flavor
Sep 26 23:53:06.557104 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service expected_attrs=['flavor'])
Sep 26 23:53:06.557321 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
Sep 26 23:53:06.557531 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service args, kwargs)
Sep 26 23:53:06.557741 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 241, in object_class_action_versions
Sep 26 23:53:06.557979 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service args=args, kwargs=kwargs)
Sep 26 23:53:06.558191 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 179, in call
Sep 26 23:53:06.558409 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service retry=self.retry)
Sep 26 23:53:06.558638 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 133, in _send
Sep 26 23:53:06.558853 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service retry=retry)
Sep 26 23:53:06.559062 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send
Sep 26 23:53:06.559271 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service call_monitor_timeout, retry=retry)
Sep 26 23:53:06.559485 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 575, in _send
Sep 26 23:53:06.559707 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service raise result
Sep 26 23:53:06.559918 ubuntu-xenial-rax-ord-0002347061 nova-compute[31767]: ERROR oslo_service.service InstanceNotFound_Remote: Instance 1d4eaa12-1477-4528-a06b-52e5672c6c61 could not be found.
This is a test for evacuate where we take down nova-compute on one
host and evacuate instances from it. Then those instances are deleted
and then the compute service is re-enabled and restarted. On restart,
nova-compute is trying to cleanup allocations in placement for the
evacuated instances, and to do that it tries to lazy-load the flavor
from the deleted instance which fails because we're not using
read_deleted='yes' on the context.
This is similar to bug 1745977 which was fixed with change:
https://review.openstack.org/#/q/Ide6cc5bb1fce2c9aea9fa3efdf940e8308cd9ed0
But that only handled loading of generic attributes, in that case
system_metadata.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1794996/+subscriptions
References