
yahoo-eng-team team mailing list archive

[Bug 1848343] Re: MigrationTask rollback can leak allocations for a deleted server

 

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/train
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848343

Title:
  MigrationTask rollback can leak allocations for a deleted server

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  New

Bug description:
  This came up in the cross-cell resize review:

  https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495

  And I was able to recreate with a functional test here:

  https://review.opendev.org/#/c/688832/

  That test exercises a cross-cell cold migration, but judging by the
  code:

  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461

  We can hit the same issue for a same-cell resize or cold migration once
  we have swapped the allocations, so that the source node allocations
  are held by the migration consumer and the instance holds allocations
  on the target node (created by the scheduler):

  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328

  If something fails between that swap and the cast to prep_resize, the
  task will roll back and revert the allocations, so the target node
  allocations are dropped and the source node allocations are moved back
  to the instance:

  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91

  Furthermore, if the instance has already been deleted by the time we
  perform that revert, the move_allocations method will recreate the
  allocations on the source node for the now-deleted instance, since we
  don't assert consumer generations during the swap:

  https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886

  This leaks allocations against the source node, because the instance
  that is recorded as consuming them no longer exists.
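
  The sequence above can be sketched with a toy in-memory model. This is
  only an illustration under stated assumptions: FakePlacement,
  revert_swap, run_scenario and the "expected generation" argument are
  hypothetical names invented here, not nova's or placement's real API,
  and the generation handling is deliberately simplified. It shows the
  revert recreating the deleted instance's allocations when no consumer
  generation is asserted, and a conflict being raised when one is.

```python
# Toy model only: these classes and functions are hypothetical and do not
# mirror nova's SchedulerReportClient or the real placement API.

UNCHECKED = object()  # sentinel meaning "do not assert a consumer generation"


class ConsumerConflict(Exception):
    """The caller's view of the consumer generation is stale."""


class FakePlacement:
    def __init__(self):
        # consumer id -> {"generation": int, "allocations": {provider: resources}}
        self.consumers = {}

    def generation(self, consumer):
        rec = self.consumers.get(consumer)
        return rec["generation"] if rec else None

    def allocations(self, consumer):
        rec = self.consumers.get(consumer)
        return dict(rec["allocations"]) if rec else {}

    def put(self, consumer, allocations, expected_generation=UNCHECKED):
        """Replace a consumer's allocations; empty allocations remove it."""
        current = self.generation(consumer)
        if expected_generation is not UNCHECKED and expected_generation != current:
            raise ConsumerConflict(consumer)
        if not allocations:
            self.consumers.pop(consumer, None)
            return
        self.consumers[consumer] = {
            "generation": 0 if current is None else current + 1,
            "allocations": dict(allocations),
        }


def revert_swap(placement, migration, instance, seen_generation=UNCHECKED):
    """Move the migration-held (source node) allocations back to the instance."""
    source_allocs = placement.allocations(migration)
    # Write the instance first so a conflict leaves the migration untouched.
    placement.put(instance, source_allocs, seen_generation)
    placement.put(migration, {})


def run_scenario(assert_generation):
    p = FakePlacement()
    p.put("instance", {"source-node": {"VCPU": 1}})
    # Swap: the migration takes the source allocations and the scheduler
    # gives the instance allocations on the target node.
    p.put("migration", p.allocations("instance"))
    p.put("instance", {"target-node": {"VCPU": 1}})
    seen = p.generation("instance")  # what the task last observed
    # The server is deleted concurrently; its allocations are removed.
    p.put("instance", {})
    try:
        revert_swap(p, "migration", "instance",
                    seen if assert_generation else UNCHECKED)
    except ConsumerConflict:
        return p, True
    return p, False
```

  In this sketch, run_scenario(False) ends with the deleted instance
  holding source node allocations again, while run_scenario(True) stops
  at a conflict and leaves the migration's allocations in place. What the
  caller should then do with those is a separate decision that the sketch
  does not attempt to model.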

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848343/+subscriptions

