yahoo-eng-team mailing list archive, Message #80396
[Bug 1848343] Re: MigrationTask rollback can leak allocations for a deleted server
** Also affects: nova/queens
Importance: Undecided
Status: New
** Also affects: nova/train
Importance: Undecided
Status: New
** Also affects: nova/stein
Importance: Undecided
Status: New
** Also affects: nova/rocky
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848343
Title:
MigrationTask rollback can leak allocations for a deleted server
Status in OpenStack Compute (nova):
Triaged
Status in OpenStack Compute (nova) queens series:
New
Status in OpenStack Compute (nova) rocky series:
New
Status in OpenStack Compute (nova) stein series:
New
Status in OpenStack Compute (nova) train series:
New
Bug description:
This came up in the cross-cell resize review:
https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495
And I was able to recreate with a functional test here:
https://review.opendev.org/#/c/688832/
That test does a cross-cell cold migration, but judging from the
code:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461
We can hit the same issue for a same-cell resize or cold migration
once the allocations have been swapped, so that the migration consumer
holds the source node allocations and the instance holds the
allocations on the target node (created by the scheduler):
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328
If something fails between that swap and the cast to prep_resize, the
task will roll back and revert the allocations: the target node
allocations are dropped and the source node allocations are moved back
to the instance:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91
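The swap and its rollback can be sketched with a toy in-memory model.
This is only an illustration, not nova's code: the dict layout and the
helper names (swap_to_migration, claim_target, rollback) are
hypothetical.

```python
# Toy model, NOT nova code: consumer -> {resource provider: resources}.
# All helper names below are hypothetical and exist only for illustration.


def swap_to_migration(allocs, instance, migration):
    """Hand the instance's source node allocations to the migration consumer."""
    allocs[migration] = allocs.pop(instance)


def claim_target(allocs, instance, target, resources):
    """Stand-in for the scheduler creating target node allocations."""
    allocs[instance] = {target: dict(resources)}


def rollback(allocs, instance, migration):
    """Drop the target allocations and move the source ones back."""
    allocs.pop(instance, None)
    allocs[instance] = allocs.pop(migration)


allocs = {"instance": {"source-node": {"VCPU": 2}}}
swap_to_migration(allocs, "instance", "migration")
claim_target(allocs, "instance", "target-node", {"VCPU": 2})
# Something fails before the cast to prep_resize, so the task rolls back.
rollback(allocs, "instance", "migration")
```

After the rollback the instance once again holds only its source node
allocations and the migration consumer holds nothing.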
Furthermore, if the instance has been deleted by the time we perform
that revert swap, the move_allocations method will recreate the
allocations on the source node for the now-deleted instance, since we
don't assert consumer generations during the swap:
https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886
This leaks allocations on the source node, because the instance that
holds them no longer exists.
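A minimal sketch of the race, assuming a toy stand-in for placement
that tracks a generation per consumer. ToyPlacement,
ConsumerGenerationConflict and the expected_target_generation
parameter are hypothetical names, not the real placement or nova API,
and cleaning up the migration consumer on conflict is just one
possible way to handle it.

```python
class ConsumerGenerationConflict(Exception):
    """Raised when the caller's view of a consumer is stale."""


_UNCHECKED = object()  # sentinel: caller does not assert a generation


class ToyPlacement:
    """Hypothetical in-memory stand-in for placement, for illustration only."""

    def __init__(self):
        self.allocations = {}  # consumer -> {provider: resources}
        self.generations = {}  # consumer -> int, absent if unknown

    def put(self, consumer, allocs):
        self.allocations[consumer] = allocs
        self.generations[consumer] = self.generations.get(consumer, 0) + 1

    def delete(self, consumer):
        self.allocations.pop(consumer, None)
        self.generations.pop(consumer, None)

    def move(self, source, target, expected_target_generation=_UNCHECKED):
        if expected_target_generation is not _UNCHECKED:
            if self.generations.get(target) != expected_target_generation:
                raise ConsumerGenerationConflict(target)
        self.put(target, self.allocations.pop(source))
        self.generations.pop(source, None)


def run(check_generation):
    p = ToyPlacement()
    p.put("instance", {"source-node": {"VCPU": 2}})
    # Swap: the migration holds the source, the instance holds the target.
    p.move("instance", "migration")
    p.put("instance", {"target-node": {"VCPU": 2}})
    seen = p.generations["instance"]
    # The server is deleted before the task rolls back.
    p.delete("instance")
    kwargs = {"expected_target_generation": seen} if check_generation else {}
    try:
        p.move("migration", "instance", **kwargs)
    except ConsumerGenerationConflict:
        # Instance is gone: drop the migration's allocations instead.
        p.delete("migration")
    return p.allocations


leaked = run(check_generation=False)
clean = run(check_generation=True)
```

Without the check, `leaked` still contains source node allocations for
the deleted instance; with the check, the stale view is detected and
`clean` ends up empty.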
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848343/+subscriptions