← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1713783] Re: After failed evacuation the recovered source compute tries to delete the instance

 

Reviewed:  https://review.openstack.org/499237
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a8ebf5f1aac080854704e27146e8c98b053c6224
Submitter: Jenkins
Branch:    master

commit a8ebf5f1aac080854704e27146e8c98b053c6224
Author: Előd Illés <elod.illes@xxxxxxxxxxxx>
Date:   Wed Aug 30 16:54:36 2017 +0200

    Set error state after failed evacuation
    
    When evacuation fails with NoValidHost, the migration status remains
    'accepted' instead of 'error'. This causes problem in case the compute
    service starts up again and looks for evacuations with status 'accepted',
    as it then removes the local instances for those evacuations even though
    the instance was never actually evacuated to another host.
    
    Change-Id: I06d78c744fa75ae5f34c5cfa76bc3c9460767b84
    Closes-Bug: #1713783


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713783

Title:
  After failed evacuation the recovered source compute tries to delete
  the instance

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) newton series:
  Triaged
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Security Advisory:
  Won't Fix

Bug description:
  Description
  ===========
  In case of a failed evacuation attempt the status of the migration is 'accepted' instead of 'failed' so when source compute is recovered the compute manager tries to delete the instance from the source host. However a secondary fault prevents deleting the allocation in placement so the actual deletion of the instance fails as well.

  Steps to reproduce
  ==================
  The following functional test reproduces the bug: https://review.openstack.org/#/c/498482/
  What it does: initiate evacuation when no valid host is available and evacuation fails, but nova manager still tries to delete the instance.
  Logs:

      2017-08-29 19:11:15,751 ERROR [oslo_messaging.rpc.server] Exception during message handling
      NoValidHost: No valid host was found. There are not enough hosts available.
      2017-08-29 19:11:16,103 INFO [nova.tests.functional.test_servers] Running periodic for compute1 (host1)
      2017-08-29 19:11:16,115 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/aggregates" status: 200 len: 18 microversion: 1.1
      2017-08-29 19:11:16,120 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/inventories" status: 200 len: 401 microversion: 1.0
      2017-08-29 19:11:16,131 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/allocations" status: 200 len: 152 microversion: 1.0
      2017-08-29 19:11:16,138 INFO [nova.compute.resource_tracker] Final resource view: name=host1 phys_ram=8192MB used_ram=1024MB phys_disk=1028GB used_disk=1GB total_vcpus=10 used_vcpus=1 pci_stats=[]
      2017-08-29 19:11:16,146 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/aggregates" status: 200 len: 18 microversion: 1.1
      2017-08-29 19:11:16,151 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/4e8e23ff-0c52-4cf7-8356-d9fa88536316/inventories" status: 200 len: 401 microversion: 1.0
      2017-08-29 19:11:16,152 INFO [nova.tests.functional.test_servers] Running periodic for compute2 (host2)
      2017-08-29 19:11:16,163 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/aggregates" status: 200 len: 18 microversion: 1.1
      2017-08-29 19:11:16,168 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/inventories" status: 200 len: 401 microversion: 1.0
      2017-08-29 19:11:16,176 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/allocations" status: 200 len: 54 microversion: 1.0
      2017-08-29 19:11:16,184 INFO [nova.compute.resource_tracker] Final resource view: name=host2 phys_ram=8192MB used_ram=512MB phys_disk=1028GB used_disk=0GB total_vcpus=10 used_vcpus=0 pci_stats=[]
      2017-08-29 19:11:16,192 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/aggregates" status: 200 len: 18 microversion: 1.1
      2017-08-29 19:11:16,197 INFO [nova.api.openstack.placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/531b1ce8-def1-455d-95b3-4140665d956f/inventories" status: 200 len: 401 microversion: 1.0
      2017-08-29 19:11:16,198 INFO [nova.tests.functional.test_servers] Finished with periodics
      2017-08-29 19:11:16,255 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/servers/5058200c-478e-4449-88c1-906fdd572662" status: 200 len: 1875 microversion: 2.53 time: 0.056198
      2017-08-29 19:11:16,262 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/os-migrations" status: 200 len: 373 microversion: 2.53 time: 0.004618
      2017-08-29 19:11:16,280 INFO [nova.api.openstack.requestlog] 127.0.0.1 "PUT /v2.1/6f70656e737461636b20342065766572/os-services/c269bc74-4720-4de4-a6e5-889080b892a0" status: 200 len: 245 microversion: 2.53 time: 0.016442
      2017-08-29 19:11:16,281 INFO [nova.service] Starting compute node (version 16.0.0)
      2017-08-29 19:11:16,296 INFO [nova.compute.manager] Deleting instance as it has been evacuated from this host

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1713783/+subscriptions


References