yahoo-eng-team team mailing list archive
Message #66193
[Bug 1707071] [NEW] Compute nodes will fight over allocations during migration
Public bug reported:
As far back as Ocata, compute nodes that manage allocations will end up
overwriting allocations from other compute nodes when doing a migration.
This stems from the fact that the Resource Tracker was designed to
manage accounting per compute node, while placement does its accounting
per instance. When we try to create/update/delete allocations for
instances on compute nodes from the existing resource tracker code
paths, we end up deleting allocations that apply to other compute nodes
in the process.
For example, when an instance A is running against compute1, there is an
allocation for its resources against that node. When migrating that
instance to compute2, the target compute (or scheduler) may create
allocations for instance A against compute2, which overwrite those for
compute1. Then, compute1's periodic healing task runs, and deletes the
allocation for instance A against compute2, replacing it with one for
compute1. When migration completes, compute2 heals again and overwrites
the allocation with one for the new home of the instance. Then, compute1
may delete the allocation it thinks it owns, followed finally by another heal
on compute2. While this is going on, the scheduler (via placement) does
not have a consistent view of resources to make proper decisions.
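The sequence above can be sketched with a toy model. This is not nova or placement code: the `allocations` dict and the `heal` function are hypothetical stand-ins, and the only assumption carried over from the description is that each write replaces the whole allocation record for an instance.

```python
# Toy model (hypothetical, not nova code): allocations are keyed by instance,
# and every write replaces the whole record for that instance.
allocations = {}  # instance id -> {compute node name: resources}


def heal(node, instance, resources):
    """Stand-in for a write from one node's resource tracker code path."""
    # Replaces the record wholesale, dropping any other node's entry.
    allocations[instance] = {node: dict(resources)}


res = {"VCPU": 2, "MEMORY_MB": 4096}
history = []

heal("compute1", "A", res)  # instance A is running on compute1
history.append(sorted(allocations["A"]))

heal("compute2", "A", res)  # migration starts, target claims the instance
history.append(sorted(allocations["A"]))

heal("compute1", "A", res)  # compute1 heals and wipes the compute2 entry
history.append(sorted(allocations["A"]))

heal("compute2", "A", res)  # migration completes, compute2 heals again
history.append(sorted(allocations["A"]))

for step, nodes in enumerate(history, start=1):
    print(step, nodes)
```

In this model the record never lists both nodes at once, so whichever node wrote last appears to hold the instance and the other appears free.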
In order to fix this, we need a combination of changes:
1. There should be allocations against both compute nodes for an instance during a migration
2. Compute nodes should respect the double claim, and not delete allocations for instances they used to own, if the allocation has no resources for their resource provider
3. Compute nodes should not delete allocations for instances unless they own the instance _and_ the instance is in DELETED/SHELVED_OFFLOADED state
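A minimal sketch of how the three rules might combine, assuming allocations for one instance are held as a mapping from compute node to resources. The names `claim`, `may_delete`, and `DELETABLE_STATES` are hypothetical and do not come from nova; the state strings are taken as written in the list above.

```python
# Hypothetical sketch of the three proposed rules, not the actual nova fix.
DELETABLE_STATES = {"DELETED", "SHELVED_OFFLOADED"}


def claim(allocations, node, resources):
    """Rule 1: add or update only this node's entry and keep the others."""
    updated = dict(allocations)
    updated[node] = dict(resources)
    return updated


def may_delete(node, instance_host, instance_state, allocations):
    """Rules 2 and 3: decide whether `node` may delete the allocation."""
    if node not in allocations:
        # Rule 2: the allocation holds nothing for this node's provider.
        return False
    # Rule 3: the node must own the instance, and the instance must be gone.
    return instance_host == node and instance_state in DELETABLE_STATES


res = {"VCPU": 2, "MEMORY_MB": 4096}
allocs = claim({}, "compute1", res)      # instance A on compute1
allocs = claim(allocs, "compute2", res)  # migration: both nodes hold a claim

print(sorted(allocs))
print(may_delete("compute1", "compute2", "ACTIVE", allocs))
print(may_delete("compute2", "compute2", "DELETED", allocs))
```

With these checks, the old host never removes the allocation once the instance has moved, and the new host removes it only when the instance is deleted or shelved offloaded.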
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1707071
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1707071/+subscriptions