yahoo-eng-team team mailing list archive
Message #80394
[Bug 1848343] [NEW] MigrationTask rollback can leak allocations for a deleted server
Public bug reported:
This came up in the cross-cell resize review:
https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495
And I was able to recreate with a functional test here:
https://review.opendev.org/#/c/688832/
That test does a cross-cell cold migration, but looking at the code:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461
We can hit the same issue for a same-cell resize/cold migrate once we
have swapped the allocations, so that the migration consumer holds the
source node allocations and the instance holds allocations on the target
node (created by the scheduler):
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328
If something fails between that point and the cast to prep_resize, the
task rolls back and reverts the allocations, so the target node
allocations are dropped and the source node allocations are moved back
to the instance:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91
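
To make the swap and the rollback above concrete, here is a minimal
sketch using a plain dict as a stand-in for placement. It is not nova
code: the consumer names ("instance-1", "migration-1"), the migrate()
helper and the simplified move_allocations() signature are all
assumptions made for illustration.

```python
import copy

# Toy stand-in for placement: consumer -> {resource provider: resources}.
allocations = {
    "instance-1": {"source-node": {"VCPU": 2, "MEMORY_MB": 4096}},
}


def move_allocations(allocs, from_consumer, to_consumer):
    """Move everything held by from_consumer over to to_consumer."""
    allocs[to_consumer] = allocs.pop(from_consumer, {})


def migrate(allocs, fail_before_cast=False):
    # 1. Swap: the migration consumer takes over the source node allocations.
    move_allocations(allocs, "instance-1", "migration-1")
    # 2. The scheduler claims the target node on behalf of the instance.
    allocs["instance-1"] = {"target-node": {"VCPU": 2, "MEMORY_MB": 4096}}
    try:
        if fail_before_cast:
            raise RuntimeError("failure before the cast to prep_resize")
    except RuntimeError:
        # Rollback: drop the target allocations, hand the source ones back.
        allocs.pop("instance-1", None)
        move_allocations(allocs, "migration-1", "instance-1")
    return allocs


succeeded = migrate(copy.deepcopy(allocations))
rolled_back = migrate(copy.deepcopy(allocations), fail_before_cast=True)
```

In the success case the migration consumer holds the source node and the
instance holds the target node; after a rollback the dict is back to its
starting state.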
Furthermore, if the instance has been deleted by the time we swap the
allocations back, the move_allocations method will recreate the
allocations on the source node for the now-deleted instance, since we
don't assert consumer generations during the swap:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886
This leaks allocations on the source node, since the instance that
holds them has been deleted.
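
The leak can be illustrated with the same kind of toy model. This is
only a sketch under stated assumptions: the dicts, the ConsumerGone
exception, move_allocations_checked() and the generation value 1 are
hypothetical, and the guarded variant is one possible way to assert a
consumer generation, not necessarily how nova handles it. It also
assumes that deleting the server removed the instance's allocations and
consumer record while the migration consumer still held the source node.

```python
class ConsumerGone(Exception):
    """Raised when the target consumer no longer matches what we expected."""


def move_allocations(allocs, from_consumer, to_consumer):
    """Unchecked move: writes to to_consumer whether or not it still exists."""
    allocs[to_consumer] = allocs.pop(from_consumer, {})


def move_allocations_checked(allocs, generations, from_consumer, to_consumer,
                             expected_generation):
    """Guarded move: refuse if to_consumer's generation has changed."""
    if generations.get(to_consumer) != expected_generation:
        raise ConsumerGone(to_consumer)
    allocs[to_consumer] = allocs.pop(from_consumer, {})


def state_after_delete():
    # The server was deleted after the swap: its allocations and consumer
    # record are gone, but the migration still holds the source node.
    allocs = {"migration-1": {"source-node": {"VCPU": 2}}}
    generations = {"migration-1": 1}
    return allocs, generations


# Unchecked rollback recreates allocations for the deleted server.
leaky, _ = state_after_delete()
move_allocations(leaky, "migration-1", "instance-1")

# Checked rollback sees that the instance consumer no longer has the
# generation recorded earlier, so the caller can clean up instead.
safe, gens = state_after_delete()
try:
    move_allocations_checked(safe, gens, "migration-1", "instance-1",
                             expected_generation=1)
except ConsumerGone:
    safe.pop("migration-1", None)
```

With the unchecked move, "instance-1" ends up holding the source node
again even though the server no longer exists; with the check, nothing
is left behind.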
** Affects: nova
Importance: Undecided
Status: Triaged
** Tags: cold-migrate placement resize
** Changed in: nova
Status: New => Triaged
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848343
Title:
MigrationTask rollback can leak allocations for a deleted server
Status in OpenStack Compute (nova):
Triaged
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848343/+subscriptions