yahoo-eng-team team mailing list archive
Message #72482
[Bug 1763183] Related fix merged to nova (master)
Reviewed: https://review.openstack.org/560626
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a80ac96362c8fafba1bfe71244b52ba2f082c86e
Submitter: Zuul
Branch: master
commit a80ac96362c8fafba1bfe71244b52ba2f082c86e
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Wed Apr 11 16:00:59 2018 -0400
Add functional test for deleting a compute service
This adds a functional test which asserts the things
related to bug 1756179 where deleting a compute service
does not also delete the related host mapping or resource
provider resources.
Also related to bug 1763183 in that it should not be
possible to delete a compute service that has instances
running on it since that will mess up resource tracking
in Placement.
Change-Id: I519c5abfe24b154998f481c8a86db239a75d4729
Related-Bug: #1756179
Related-Bug: #1763183
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1763183
Title:
DELETE /os-services/{service_id} does not block for hosted instances
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
Confirmed
Status in OpenStack Compute (nova) queens series:
Confirmed
Bug description:
This came up while reviewing the fix for bug 1756179:
https://review.openstack.org/#/c/554920/6/nova/api/openstack/compute/services.py@226
Full IRC conversation is here:
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-04-11.log.html#t2018-04-11T20:32:13
The summary is that it's possible to delete a compute service and its
associated compute node record even if that compute node has instances
on it.
Before placement, this wasn't a huge problem because you could
evacuate the instances to another host or, if you brought the host back
up, it would recreate the service and compute node and the resource
tracker would "heal" itself by finding instances running on that host
and node combo:
https://github.com/openstack/nova/blob/2c5da2212c3fa3e589c4af171486a2097fd8c54e/nova/compute/resource_tracker.py#L714
The problem is that after we started requiring placement, and creating
allocations in the scheduler in Pike, those allocations are against
the compute_nodes.uuid for the compute node resource provider. If the
service and its related compute node record are deleted, restarting
the service will create a new service and compute node record with a
new UUID, which will result in a new resource provider in placement,
and the instances running on that host will still have allocations
against the now orphaned resource provider. The new resource provider
will report incorrect consumption, so scheduling will also be affected.
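To make the failure mode concrete, here is a minimal toy model of the
sequence described above. It is not nova or placement code: ToyPlacement,
start_compute and the "host1"/"instance-a" names are made up for
illustration, and only VCPU is tracked.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyPlacement:
    """Hypothetical stand-in for placement: providers plus allocations."""
    providers: set = field(default_factory=set)
    # consumer (instance) id -> (resource provider UUID, VCPUs allocated)
    allocations: dict = field(default_factory=dict)

    def usage(self, provider_uuid):
        """Sum the VCPUs allocated against one resource provider."""
        return sum(vcpus for rp, vcpus in self.allocations.values()
                   if rp == provider_uuid)


def start_compute(placement, compute_nodes, host):
    """Create the compute node record and its provider if missing."""
    if host not in compute_nodes:
        compute_nodes[host] = str(uuid.uuid4())
        placement.providers.add(compute_nodes[host])
    return compute_nodes[host]


placement = ToyPlacement()
compute_nodes = {}  # host name -> compute node UUID

# The service starts and the scheduler allocates against its provider.
old_rp = start_compute(placement, compute_nodes, "host1")
placement.allocations["instance-a"] = (old_rp, 2)

# The service and compute node record are deleted while the instance is
# still running; the provider and its allocations are left behind.
del compute_nodes["host1"]

# Restarting the service creates a new record with a new UUID.
new_rp = start_compute(placement, compute_nodes, "host1")

print("provider changed:", new_rp != old_rp)
print("usage on orphaned provider:", placement.usage(old_rp))
print("usage on new provider:", placement.usage(new_rp))
```

The new provider shows no usage even though the instance is still
running on the host, which is why scheduling decisions are affected.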
So we should block deleting a compute service (and its node) here:
https://github.com/openstack/nova/blob/2c5da2212c3fa3e589c4af171486a2097fd8c54e/nova/api/openstack/compute/services.py#L213
if that host (node) has instances on it.
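A rough sketch of the kind of guard being proposed, using plain
dictionaries rather than nova objects. The function name, the
ServiceHasInstances exception and the data layout are all assumptions
made for illustration; they are not the actual change to services.py.

```python
class ServiceHasInstances(Exception):
    """Hypothetical error; an API layer could map it to an error response."""


def delete_compute_service(services, instances_by_host, service_id):
    """Delete a service record only if its host has no instances."""
    host = services[service_id]
    hosted = instances_by_host.get(host, [])
    if hosted:
        raise ServiceHasInstances(
            f"compute service {service_id} on {host} still hosts "
            f"{len(hosted)} instance(s)")
    del services[service_id]


services = {1: "host1", 2: "host2"}  # service id -> host name
instances_by_host = {"host1": ["instance-a"]}

# host2 is empty, so its service can be deleted.
delete_compute_service(services, instances_by_host, 2)

# host1 still has an instance, so the delete is refused.
try:
    delete_compute_service(services, instances_by_host, 1)
    blocked = False
except ServiceHasInstances as exc:
    blocked = True
    print("refused:", exc)
```

The check runs before anything is deleted, so a refused request leaves
both the service and its compute node record untouched.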
This problem goes back to Pike. Ocata is OK in that the resource
tracker on Ocata computes will "heal" allocations during the
update_available_resource periodic task (and when the compute service
starts up), and in Ocata the FilterScheduler does not create
allocations in Placement.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1763183/+subscriptions