yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #67095
[Bug 1713783] Re: After failed evacuation the recovered source compute tries to delete the instance
It looks like the _destroy_evacuated_instances method in the compute
manager has always filtered migrations on the 'accepted' status since
originally this code was just meant to cleanup local resources once an
evacuation from the source host has started, which is fine. The problem
is with removing the source node allocations if the evacuation failed,
but if it failed in the conductor, we can fix that here:
https://review.openstack.org/#/c/499237/
If it failed in the destination compute service, the migration status
should be set to 'failed' and the migration filter in
_destroy_evacuated_instances would filter it out.
** Also affects: nova/ocata
Importance: Undecided
Status: New
** Also affects: nova/pike
Importance: Undecided
Status: New
** Changed in: nova
Importance: Undecided => High
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713783
Title:
After failed evacuation the recovered source compute tries to delete
the instance
Status in OpenStack Compute (nova):
Triaged
Status in OpenStack Compute (nova) newton series:
Triaged
Status in OpenStack Compute (nova) ocata series:
Triaged
Status in OpenStack Compute (nova) pike series:
Triaged
Bug description:
Description
===========
In case of a failed evacuation attempt the status of the migration is 'accepted' instead of 'failed' so when source compute is recovered the compute manager tries to delete the instance from the source host. However a secondary fault prevents deleting the allocation in placement so the actual deletion of the instance fails as well.
Steps to reproduce
==================
The following functional test reproduces the bug: https://review.openstack.org/#/c/498482/
What it does: initiate evacuation when no valid host is available and evacuation fails, but nova manager still tries to delete the instance.
Logs:
2017-08-29 19:11:15,751 ERROR [oslo_messaging.rpc.server] Exception during message handling
NoValidHost: No valid host was found. There are not enough hosts available.
2017-08-29 19:11:16,103 INFO [nova.tests.functional.test_servers] Running periodic for compute1 (host1)
2017-08-29 19:11:16,115 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/aggregates" status: 200 len: 18 microversion: 1.1
2017-08-29 19:11:16,120 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/inventories" status: 200 len: 401 microversion: 1.0
2017-08-29 19:11:16,131 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/allocations" status: 200 len: 152 microversion: 1.0
2017-08-29 19:11:16,138 INFO [nova.compute.resource_tracker] Final resource view: name=host1 phys_ram=8192MB used_ram=1024MB phys_disk=1028GB used_disk=1GB total_vcpus=10 used_vcpus=1 pci_stats=[]
2017-08-29 19:11:16,146 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/aggregates" status: 200 len: 18 microversion: 1.1
2017-08-29 19:11:16,151 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/inventories" status: 200 len: 401 microversion: 1.0
2017-08-29 19:11:16,152 INFO [nova.tests.functional.test_servers] Running periodic for compute2 (host2)
2017-08-29 19:11:16,163 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/aggregates" status: 200 len: 18 microversion: 1.1
2017-08-29 19:11:16,168 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/inventories" status: 200 len: 401 microversion: 1.0
2017-08-29 19:11:16,176 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/allocations" status: 200 len: 54 microversion: 1.0
2017-08-29 19:11:16,184 INFO [nova.compute.resource_tracker] Final resource view: name=host2 phys_ram=8192MB used_ram=512MB phys_disk=1028GB used_disk=0GB total_vcpus=10 used_vcpus=0 pci_stats=[]
2017-08-29 19:11:16,192 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/aggregates" status: 200 len: 18 microversion: 1.1
2017-08-29 19:11:16,197 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/inventories" status: 200 len: 401 microversion: 1.0
2017-08-29 19:11:16,198 INFO [nova.tests.functional.test_servers] Finished with periodics
2017-08-29 19:11:16,255 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/servers/5058200c-478e-4449-88c1-906fdd572662" status: 200 len: 1875 microversion: 2.53 time: 0.056198
2017-08-29 19:11:16,262 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/os-migrations" status: 200 len: 373 microversion: 2.53 time: 0.004618
2017-08-29 19:11:16,280 INFO [nova.api.openstack.requestlog] 127.0.0.1 "PUT /v2.1/6f70656e737461636b20342065766572/os-services/c269bc74-4720-4de4-a6e5-889080b892a0" status: 200 len: 245 microversion: 2.53 time: 0.016442
2017-08-29 19:11:16,281 INFO [nova.service] Starting compute node (version 16.0.0)
2017-08-29 19:11:16,296 INFO [nova.compute.manager] Deleting instance as it has been evacuated from this host
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1713783/+subscriptions
References