yahoo-eng-team team mailing list archive

[Bug 1848343] [NEW] MigrationTask rollback can leak allocations for a deleted server

 

Public bug reported:

This came up in the cross-cell resize review:

https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495

And I was able to recreate with a functional test here:

https://review.opendev.org/#/c/688832/

That test does a cross-cell cold migration, but looking at the code:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461

We can also hit the issue for a same-cell resize/cold migrate once we
have swapped the allocations, so that the source node allocations are
held by the migration consumer and the instance holds allocations on the
target node (created by the scheduler):

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328

If something fails between that swap and the cast to prep_resize, the
task will roll back and revert the allocations, so the target node
allocations are dropped and the source node allocations are moved back
to the instance:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91

Furthermore, if the instance has already been deleted by the time we
perform that revert, the move_allocations method will recreate the
allocations on the source node for the now-deleted instance, since we
don't assert consumer generations during the swap:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886

This leaks allocations against the source node, because the instance
that holds them has been deleted and nothing will clean them up.
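
To make the sequence above concrete, here is a minimal sketch of the
race using a toy in-memory model. Everything in it is hypothetical and
for illustration only: ToyPlacement, its put() method, the simplified
move_allocations() helper and the simulate() driver are not nova or
placement code, and the generation semantics are a simplification. It
only shows why an unchecked move can resurrect allocations for a deleted
consumer, and how asserting a previously observed consumer generation
would turn that into a detectable conflict.

```python
class ConsumerGenerationConflict(Exception):
    """Raised when the caller's view of a consumer is stale."""


_UNCHECKED = object()  # sentinel: do not assert any generation


class ToyPlacement:
    """In-memory stand-in for placement. NOT the real API."""

    def __init__(self):
        # consumer -> {resource provider: {resource class: amount}}
        self.allocations = {}
        # consumer -> generation; in this toy model a consumer with no
        # allocations is forgotten entirely (assumption)
        self.generations = {}

    def put(self, consumer, allocs, generation=_UNCHECKED):
        """Replace a consumer's allocations, optionally asserting the
        generation the caller last observed (None means "must not exist").
        """
        if generation is not _UNCHECKED:
            if generation != self.generations.get(consumer):
                raise ConsumerGenerationConflict(consumer)
        if allocs:
            self.allocations[consumer] = allocs
            self.generations[consumer] = self.generations.get(consumer, -1) + 1
        else:
            self.allocations.pop(consumer, None)
            self.generations.pop(consumer, None)


def move_allocations(placement, source, target, expected_target_gen=_UNCHECKED):
    """Toy version: give all of source's allocations to target."""
    allocs = placement.allocations.get(source, {})
    placement.put(target, allocs, generation=expected_target_gen)
    placement.put(source, {})


def simulate(assert_generation):
    p = ToyPlacement()
    p.put("instance", {"source-node": {"VCPU": 2}})

    # Swap: the migration consumer takes the source node allocations and
    # the scheduler claims the target node for the instance.
    move_allocations(p, "instance", "migration")
    p.put("instance", {"target-node": {"VCPU": 2}})
    known_gen = p.generations["instance"]  # what the task last observed

    # The server is deleted concurrently; its allocations are removed.
    p.put("instance", {})

    # Something fails before the cast to prep_resize, so the task rolls back.
    expected = known_gen if assert_generation else _UNCHECKED
    try:
        move_allocations(p, "migration", "instance", expected_target_gen=expected)
    except ConsumerGenerationConflict:
        # Toy choice (assumption): drop the migration-held allocations
        # instead of resurrecting them for a deleted instance.
        p.put("migration", {})
    return p.allocations


print(simulate(assert_generation=False))
print(simulate(assert_generation=True))
```

Without the assertion, the first call ends with the deleted instance
holding source-node allocations again, which is the leak described
above. With the assertion, the stale generation is rejected and the
rollback gets a chance to clean up instead. What the real rollback
should do on such a conflict is outside the scope of this sketch.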

** Affects: nova
     Importance: Undecided
         Status: Triaged


** Tags: cold-migrate placement resize

** Changed in: nova
       Status: New => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848343
