yahoo-eng-team team mailing list archive

[Bug 1763183] Related fix merged to nova (master)

 

Reviewed:  https://review.openstack.org/560626
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a80ac96362c8fafba1bfe71244b52ba2f082c86e
Submitter: Zuul
Branch:    master

commit a80ac96362c8fafba1bfe71244b52ba2f082c86e
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Wed Apr 11 16:00:59 2018 -0400

    Add functional test for deleting a compute service
    
    This adds a functional test which asserts the things
    related to bug 1756179 where deleting a compute service
    does not also delete the related host mapping or resource
    provider resources.
    
    Also related to bug 1763183 in that it should not be
    possible to delete a compute service that has instances
    running on it since that will mess up resource tracking
    in Placement.
    
    Change-Id: I519c5abfe24b154998f481c8a86db239a75d4729
    Related-Bug: #1756179
    Related-Bug: #1763183


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1763183

Title:
  DELETE /os-services/{service_id} does not block for hosted instances

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Confirmed

Bug description:
  This came up while reviewing the fix for bug 1756179:

  https://review.openstack.org/#/c/554920/6/nova/api/openstack/compute/services.py@226

  Full IRC conversation is here:

  http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-04-11.log.html#t2018-04-11T20:32:13

  The summary is that it's possible to delete a compute service and its
  associated compute node record even if that compute node has instances
  on it.

  Before placement, this wasn't a huge problem because you could
  evacuate the instances to another host or, if you brought the host
  back up, it would recreate the service and compute node and the
  resource tracker would "heal" itself by finding instances running on
  that host and node combo:

  https://github.com/openstack/nova/blob/2c5da2212c3fa3e589c4af171486a2097fd8c54e/nova/compute/resource_tracker.py#L714

  The problem started once we required placement and began creating
  allocations in the scheduler in Pike: those allocations are against
  the compute_nodes.uuid for the compute node resource provider. If the
  service and its related compute node record are deleted, restarting
  the service will create a new service and compute node record with a
  new UUID, which will result in a new resource provider in placement.
  The instances running on that host will still have allocations
  against the now orphaned resource provider, and the new resource
  provider will report incorrect consumption, so scheduling will also
  be affected.
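
  To make the orphaning concrete, here is a toy model of the situation.
  It is only an illustration: ToyPlacement and its methods are
  hypothetical names, not the placement API, and the only thing it
  models is that allocations are keyed by the provider (compute node)
  UUID.

```python
import uuid


class ToyPlacement:
    """Hypothetical stand-in for placement: allocations keyed by provider UUID."""

    def __init__(self):
        # provider_uuid -> {instance_id: vcpus}
        self.allocations = {}

    def allocate(self, provider_uuid, instance_id, vcpus):
        self.allocations.setdefault(provider_uuid, {})[instance_id] = vcpus

    def usage(self, provider_uuid):
        return sum(self.allocations.get(provider_uuid, {}).values())


placement = ToyPlacement()

# The scheduler allocates against the original compute node UUID.
old_node_uuid = str(uuid.uuid4())
placement.allocate(old_node_uuid, "instance-1", 4)
placement.allocate(old_node_uuid, "instance-2", 2)

# The service and compute node are deleted, then the service restarts:
# a new compute node record means a new UUID and a new resource provider.
new_node_uuid = str(uuid.uuid4())

# The instances are still running, but their allocations sit on the old provider.
orphaned_usage = placement.usage(old_node_uuid)
reported_usage = placement.usage(new_node_uuid)
```

  In this model the new provider reports no usage even though two
  instances are still running on the host, which is the incorrect
  consumption described above.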

  So we should block deleting a compute service (and its node) if that
  host (node) has instances on it. The check belongs here:

  https://github.com/openstack/nova/blob/2c5da2212c3fa3e589c4af171486a2097fd8c54e/nova/api/openstack/compute/services.py#L213
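
  A minimal sketch of the kind of guard being proposed, under stated
  assumptions: the function, the exception and the in-memory dicts below
  are hypothetical and are not nova code. A real fix would look up
  instances through nova's own objects and choose an appropriate API
  error response.

```python
class ComputeServiceInUse(Exception):
    """Hypothetical error: the service's host still has instances."""


def delete_compute_service(services, instances_by_host, service_id):
    """Delete a service record only if its host has no instances."""
    service = services[service_id]
    hosted = instances_by_host.get(service["host"], [])
    if hosted:
        # Refuse the delete so allocations are not orphaned.
        raise ComputeServiceInUse(
            "host %s still has %d instance(s)" % (service["host"], len(hosted))
        )
    del services[service_id]


# Hypothetical data: compute1 hosts an instance, compute2 is empty.
services = {1: {"host": "compute1"}, 2: {"host": "compute2"}}
instances_by_host = {"compute1": ["instance-1"]}

try:
    delete_compute_service(services, instances_by_host, 1)
    blocked = False
except ComputeServiceInUse:
    blocked = True

# Deleting the service on the empty host goes through.
delete_compute_service(services, instances_by_host, 2)
```

  The point of the sketch is the ordering: the instance check happens
  before any record is removed, so a blocked delete leaves the service
  and its compute node untouched.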

  This problem goes back to Pike. Ocata is OK in that the resource
  tracker on Ocata computes will "heal" allocations during the
  update_available_resource periodic task (and when the compute service
  starts up), and in Ocata the FilterScheduler does not create
  allocations in Placement.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1763183/+subscriptions

