yahoo-eng-team team mailing list archive
Message #68613
[Bug 1721652] Re: Evacuate cleanup fails at _delete_allocation_for_moved_instance
Reviewed: https://review.openstack.org/510938
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9252ffdacf262008bc41409d4fb574ec472dc913
Submitter: Zuul
Branch: master
commit 9252ffdacf262008bc41409d4fb574ec472dc913
Author: Balazs Gibizer <balazs.gibizer@xxxxxxxxxxxx>
Date: Thu Oct 12 16:07:28 2017 +0200
fix cleaning up evacuated instances
When bug 1709902 was fixed in I0df401a7c91f012fdb25cb0e6b344ca51de8c309
the fix assumed that when _destroy_evacuated_instances() is called
during the init of the nova-compute service, the resource tracker
already knows the compute node ids associated with the given compute
host. This is not true, and therefore _destroy_evacuated_instances fails
with an exception and does not clean up the evacuated instance.
The resource tracker's compute_nodes dict is only initialized during the
first update_available_resource call, which happens in
pre_start_hook, whereas _destroy_evacuated_instances is called from
init_host, which runs before pre_start_hook.
The _destroy_evacuated_instances call uses the
_delete_allocation_for_moved_instance that relies on the resource
tracker's compute_nodes dict.
This patch inlines _delete_allocation_for_moved_instance in
_destroy_evacuated_instances and queries the db for the compute node
uuid. Because ironic uses a 1:M host:node setup, we cannot query the db
only once for the node uuid, as different instances might be on different nodes.
Change-Id: I35749374ff09b0e98064c75ff9c33dad577579c6
Closes-Bug: #1721652
Related-Bug: #1709902
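The approach described in the commit message above can be illustrated with a minimal sketch. This is not the code from the patch: get_node_uuid and delete_allocation are hypothetical callables standing in for the db lookup and the placement call, and the (instance uuid, source node) pairs are a simplification of the migration records.

```python
def destroy_evacuated_instances(migrations, get_node_uuid, delete_allocation):
    """Clean up allocations without relying on the tracker's cached nodes.

    migrations: iterable of (instance_uuid, source_node) pairs (simplified).
    get_node_uuid: hypothetical stand-in for the db query by node name.
    delete_allocation: hypothetical stand-in for the placement call.
    """
    cleaned = []
    for instance_uuid, source_node in migrations:
        # Look the node uuid up per instance: with ironic one host can
        # manage many nodes, so the answer may differ between instances.
        cn_uuid = get_node_uuid(source_node)
        delete_allocation(instance_uuid, cn_uuid)
        cleaned.append((instance_uuid, cn_uuid))
    return cleaned
```

The lookup sits inside the loop on purpose, which is the point made about the 1:M host:node setup.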
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1721652
Title:
Evacuate cleanup fails at _delete_allocation_for_moved_instance
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
In Progress
Bug description:
Description
===========
After an evacuation, when nova-compute is restarted on the source host, the clean up of the old instance on the source host fails. The traceback in nova-compute.log ends with:
2017-10-04 05:32:18.725 5575 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 679, in _destroy_evacuated_instances
2017-10-04 05:32:18.725 5575 ERROR oslo_service.service instance, migration.source_node)
2017-10-04 05:32:18.725 5575 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 1216, in delete_allocation_for_evacuated_instance
2017-10-04 05:32:18.725 5575 ERROR oslo_service.service instance, node, 'evacuated', node_type)
2017-10-04 05:32:18.725 5575 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 1227, in _delete_allocation_for_moved_instance
2017-10-04 05:32:18.725 5575 ERROR oslo_service.service cn_uuid = self.compute_nodes[node].uuid
2017-10-04 05:32:18.725 5575 ERROR oslo_service.service KeyError: u'<SOURCE_HOST_NAME>'
2017-10-04 05:32:18.725 5575 ERROR oslo_service.service
Steps to reproduce
==================
Deploy instance on Host A.
Shut down Host A.
Evacuate instance to Host B.
Turn back on Host A.
Wait for cleanup of old instance allocation to occur
Expected result
===============
Clean up of old instance from Host A is successful
Actual result
=============
Cleanup of the old instance appears to work, but there is a traceback in the log and the allocation is not cleaned up.
Environment
===========
(pike)nova-compute/now 10:16.0.0-201710030907
Additional Info:
================
Problem seems to come from this change: https://github.com/openstack/nova/commit/0de806684f5d670dd5f961f7adf212961da3ed87 at:
rt = self._get_resource_tracker()
rt.delete_allocation_for_evacuated_instance
That is called very early in the init_host flow to clean up the allocations. The problem is that at this point in the startup the resource tracker's self.compute_nodes dict is still empty. That makes delete_allocation_for_evacuated_instance blow up with a KeyError at:
cn_uuid = self.compute_nodes[node].uuid
The resource tracker's self.compute_nodes is actually populated later in the startup process via update_available_resource() -> _update_available_resource() -> _init_compute_node(). It isn't populated when the tracker is first created, which appears to be the assumption made by the referenced commit.
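The ordering problem can be reproduced in isolation with a toy model. The sketch below is not nova code: ResourceTrackerSketch and start_service are hypothetical names, and the tracker is reduced to a dict that starts empty and is filled only by the later update call.

```python
class ResourceTrackerSketch:
    """Toy stand-in for the resource tracker; not nova's real class."""

    def __init__(self):
        # Empty until the first update_available_resource() call.
        self.compute_nodes = {}

    def update_available_resource(self, nodename, uuid):
        self.compute_nodes[nodename] = uuid

    def delete_allocation_for_evacuated_instance(self, nodename):
        # Raises KeyError if the node has not been registered yet.
        return self.compute_nodes[nodename]


def start_service(rt, nodename):
    """Mimic the startup order: init_host first, pre_start_hook second."""
    events = []
    try:
        rt.delete_allocation_for_evacuated_instance(nodename)  # init_host
        events.append("cleanup ok")
    except KeyError as exc:
        events.append("cleanup failed: KeyError %r" % exc.args[0])
    rt.update_available_resource(nodename, "uuid-1")  # pre_start_hook
    events.append("tracker populated")
    return events
```

Calling start_service(ResourceTrackerSketch(), "host-a") records the failed cleanup before the tracker is populated, mirroring the traceback above.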
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1721652/+subscriptions