yahoo-eng-team team mailing list archive
Message #72294
[Bug 1763183] [NEW] DELETE /os-services/{service_id} does not block for hosted instances
Public bug reported:
This came up while reviewing the fix for bug 1756179:
https://review.openstack.org/#/c/554920/6/nova/api/openstack/compute/services.py@226
Full IRC conversation is here:
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-04-11.log.html#t2018-04-11T20:32:13
The summary is that it's possible to delete a compute service and its
associated compute node record even if that compute node has instances
on it.
Before placement, this wasn't a huge problem: you could evacuate the
instances to another host, or, if you brought the host back up, it would
recreate the service and compute node, and the resource tracker would
"heal" itself by finding instances running on that host and node combo:
https://github.com/openstack/nova/blob/2c5da2212c3fa3e589c4af171486a2097fd8c54e/nova/compute/resource_tracker.py#L714
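To illustrate why that pre-placement "heal" was enough, here is a minimal Python sketch under stated assumptions. Instance, ComputeNode and heal_usage are hypothetical toy names, not nova's resource tracker code. The point is only that usage is recomputed from the instances matching the (host, node) pair, so a recreated record with a new UUID ends up with the same usage as the deleted one.

```python
# Toy model of pre-placement resource tracking. All names here are
# hypothetical; this is not nova's ResourceTracker code.
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Instance:
    name: str
    host: str
    node: str
    vcpus: int
    memory_mb: int


@dataclass
class ComputeNode:
    host: str
    hypervisor_hostname: str
    # A fresh UUID is generated each time the record is (re)created.
    uuid: str = field(default_factory=lambda: str(uuid4()))


def heal_usage(node, instances):
    """Recompute usage from the instances on this host/node combo.

    Nothing is looked up by node.uuid, so a recreated record with a new
    UUID gets the same usage as the deleted one.
    """
    usage = {"VCPU": 0, "MEMORY_MB": 0}
    for inst in instances:
        if inst.host == node.host and inst.node == node.hypervisor_hostname:
            usage["VCPU"] += inst.vcpus
            usage["MEMORY_MB"] += inst.memory_mb
    return usage


instances = [
    Instance("vm1", "cmp1", "cmp1.example.org", 2, 2048),
    Instance("vm2", "cmp1", "cmp1.example.org", 4, 4096),
    Instance("vm3", "cmp2", "cmp2.example.org", 1, 512),
]
old_node = ComputeNode("cmp1", "cmp1.example.org")
# Service deleted, host brought back up: a new record with a new UUID.
new_node = ComputeNode("cmp1", "cmp1.example.org")
print(old_node.uuid != new_node.uuid)  # True
print(heal_usage(new_node, instances))  # {'VCPU': 6, 'MEMORY_MB': 6144}
```

Because the usage is derived rather than stored against the node UUID, losing the old record costs nothing once the host reports in again.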
The problem is that since Pike, when we started requiring placement and
creating allocations in the scheduler, those allocations are made against
the compute_nodes.uuid of the compute node resource provider. If the
service and its related compute node record are deleted, restarting the
service will create a new service and compute node record with a new
UUID. That results in a new resource provider in placement, while the
instances running on that host still have allocations against the now
orphaned resource provider. The new resource provider will report
incorrect consumption, so scheduling will also be affected.
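A toy model of this failure mode, assuming an in-memory stand-in for placement. ToyPlacement and its methods are hypothetical, not the placement REST API or nova's report client. Allocations are keyed by provider UUID, so once the compute node record is recreated with a new UUID, the old provider keeps the instance's consumption and the new one reports none.

```python
# Toy in-memory stand-in for placement (hypothetical; not the placement
# API or its client). Allocations are keyed by resource provider UUID,
# which for a compute node is compute_nodes.uuid.
from collections import defaultdict
from uuid import uuid4


class ToyPlacement:
    def __init__(self):
        self.providers = {}    # rp_uuid -> inventory
        self.allocations = {}  # consumer (instance) -> {rp_uuid: resources}

    def create_provider(self, rp_uuid, inventory):
        self.providers[rp_uuid] = dict(inventory)

    def put_allocations(self, consumer, allocs):
        # Replaces whatever the consumer had before.
        self.allocations[consumer] = {rp: dict(r) for rp, r in allocs.items()}

    def usage(self, rp_uuid):
        # Sum every consumer's allocation against this one provider.
        total = defaultdict(int)
        for allocs in self.allocations.values():
            for rc, amount in allocs.get(rp_uuid, {}).items():
                total[rc] += amount
        return dict(total)


placement = ToyPlacement()
inventory = {"VCPU": 8, "MEMORY_MB": 16384}

# Pike: the scheduler allocates against the node's provider.
old_rp = str(uuid4())
placement.create_provider(old_rp, inventory)
placement.put_allocations("vm1", {old_rp: {"VCPU": 2, "MEMORY_MB": 2048}})

# DELETE /os-services/{service_id}, then restart nova-compute: a new
# compute node record with a new UUID, hence a new provider.
new_rp = str(uuid4())
placement.create_provider(new_rp, inventory)

print(placement.usage(old_rp))  # {'VCPU': 2, 'MEMORY_MB': 2048} (orphaned)
print(placement.usage(new_rp))  # {} although vm1 still runs on that host
```

The scheduler would see the new provider as empty and could overcommit the host.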
So we should block deleting a compute service (and its node) here:
https://github.com/openstack/nova/blob/2c5da2212c3fa3e589c4af171486a2097fd8c54e/nova/api/openstack/compute/services.py#L213
if that host (node) has instances on it.
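A minimal sketch of the proposed check, assuming simplified in-memory data. Service, Conflict and delete_service are hypothetical names, not nova's services API code, and surfacing the block as a 409 Conflict is an assumption about how the refusal would reach callers.

```python
# Sketch of the proposed guard. Service, Conflict and delete_service are
# hypothetical stand-ins, not nova's API code.
from dataclasses import dataclass


class Conflict(Exception):
    """Stand-in for an HTTP 409 Conflict response (assumed)."""


@dataclass
class Service:
    id: int
    host: str
    binary: str


def delete_service(services, instances_by_host, service_id):
    """Delete a service unless it is a compute service hosting instances."""
    service = services[service_id]
    if service.binary == "nova-compute":
        hosted = instances_by_host.get(service.host, [])
        if hosted:
            # Deleting now would orphan the compute node resource provider
            # that the hosted instances' allocations point at.
            raise Conflict(
                f"Unable to delete compute service on {service.host}: "
                f"it is hosting {len(hosted)} instance(s). "
                "Migrate or delete the instances first."
            )
    del services[service_id]


services = {
    1: Service(1, "cmp1", "nova-compute"),
    2: Service(2, "cmp2", "nova-compute"),
}
instances_by_host = {"cmp1": ["vm1", "vm2"]}

delete_service(services, instances_by_host, 2)  # empty host: allowed
try:
    delete_service(services, instances_by_host, 1)
except Conflict as exc:
    print(exc)
print(sorted(services))  # [1]
```

Checking only nova-compute services keeps the guard from affecting other binaries, which have no compute node record.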
This problem goes back to Pike. Ocata is not affected: the resource
tracker on Ocata computes "heals" allocations during the
update_available_resource periodic task (and when the compute service
starts up), and in Ocata the FilterScheduler does not create allocations
in Placement.
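For contrast, a sketch of why Ocata recovers, assuming (as with placement's PUT /allocations/{consumer_uuid}) that writing a consumer's allocations replaces its existing ones. heal_allocations is a hypothetical name standing in for what the periodic task does: it rewrites each local instance's allocation against the compute node's current UUID.

```python
# Sketch of Ocata-style allocation healing (hypothetical names). The
# compute's periodic task rewrites each local instance's allocation
# against its own, current provider UUID.
from uuid import uuid4


def heal_allocations(allocations, node_uuid, instances_on_node):
    """Point every instance on this node at node_uuid.

    Assumes writing a consumer's allocations replaces any existing ones.
    """
    for inst_uuid, resources in instances_on_node.items():
        allocations[inst_uuid] = {node_uuid: dict(resources)}
    return allocations


old_rp, new_rp = str(uuid4()), str(uuid4())
allocations = {"vm1": {old_rp: {"VCPU": 2, "MEMORY_MB": 2048}}}

# After the service is recreated with a new compute node UUID, the next
# periodic run moves vm1's allocation to the new provider.
heal_allocations(allocations, new_rp, {"vm1": {"VCPU": 2, "MEMORY_MB": 2048}})
print(allocations["vm1"] == {new_rp: {"VCPU": 2, "MEMORY_MB": 2048}})  # True
```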
** Affects: nova
Importance: High
Assignee: Matt Riedemann (mriedem)
Status: Triaged
** Affects: nova/pike
Importance: Undecided
Status: New
** Affects: nova/queens
Importance: Undecided
Status: New
** Tags: api placement
** Also affects: nova/pike
Importance: Undecided
Status: New
** Also affects: nova/queens
Importance: Undecided
Status: New
** Changed in: nova
Assignee: (unassigned) => Matt Riedemann (mriedem)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1763183
Title:
DELETE /os-services/{service_id} does not block for hosted instances
Status in OpenStack Compute (nova):
Triaged
Status in OpenStack Compute (nova) pike series:
New
Status in OpenStack Compute (nova) queens series:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1763183/+subscriptions