
[Bug 1712411] [NEW] Allocations may not be removed from dest node during failed migrations

 

Public bug reported:

This could also be true for cold migrate/resize/unshelve, but I'm
specifically looking at live migration here.

As of this change in Pike:

https://review.openstack.org/#/c/491012/

Once all computes are upgraded, the resource tracker will no longer
"heal" allocations in Placement for its local node, i.e. it will no
longer create allocations for an instance that is running on the node,
or remove allocations for an instance that is no longer on the node.
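
For illustration, the healing behavior being removed amounted to
roughly this (a minimal sketch; the function and placement client
method names here are hypothetical, not the actual resource tracker
code):

    # Hypothetical sketch of the pre-Pike "healing" done by the
    # resource tracker's periodic task; all names are illustrative.
    def heal_allocations(placement, node_rp_uuid, instances_on_node,
                         allocations_by_consumer):
        local = {inst['uuid']: inst for inst in instances_on_node}
        for uuid, inst in local.items():
            if uuid not in allocations_by_consumer:
                # Instance is on this node but has no allocation in
                # Placement: create one.
                placement.put_allocations(
                    uuid, node_rp_uuid,
                    {'VCPU': inst['vcpus'],
                     'MEMORY_MB': inst['memory_mb']})
        for consumer_uuid in allocations_by_consumer:
            if consumer_uuid not in local:
                # Allocation against this node but the instance is not
                # here anymore: remove it.
                placement.delete_allocations(consumer_uuid)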

During live migration, conductor will call the scheduler to select a
host, and the scheduler will also claim resources against that dest
node in Placement:

https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/conductor/tasks/live_migrate.py#L181

https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/scheduler/filter_scheduler.py#L287

https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/scheduler/client/report.py#L147
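
Roughly, that claim is a PUT of the instance's allocations against the
chosen destination node. A simplified sketch of the Placement call
(payload shape and the exact microversion are glossed over here):

    # Simplified sketch of the scheduler-side claim against Placement;
    # the microversion and payload details are illustrative.
    import requests

    def claim_resources(placement_url, token, consumer_uuid,
                        dest_rp_uuid, resources, project_id, user_id):
        payload = {
            'allocations': [
                # e.g. resources = {'VCPU': 2, 'MEMORY_MB': 2048}
                {'resource_provider': {'uuid': dest_rp_uuid},
                 'resources': resources},
            ],
            'project_id': project_id,
            'user_id': user_id,
        }
        resp = requests.put(
            '%s/allocations/%s' % (placement_url, consumer_uuid),
            json=payload,
            headers={'x-auth-token': token,
                     'OpenStack-API-Version': 'placement 1.10'})
        # 204 means the claim succeeded; 409 means insufficient
        # capacity or a concurrent update, and the scheduler moves on.
        return resp.status_code == 204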

The problem during live migration is that once the scheduler picks a
host, conductor performs some additional checks:

https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/conductor/tasks/live_migrate.py#L194

These checks can fail, in which case conductor retries the scheduler
to get another host, until one is found that passes the pre-migration
checks or the retries are exhausted.
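
Schematically the loop looks like this (an illustrative sketch of the
flow in live_migrate.py, not the actual code):

    # Illustrative sketch of the conductor-side retry loop; the leak
    # is the claim made for a host whose checks then fail.
    class NoValidHost(Exception):
        pass

    def find_destination(scheduler, check_can_live_migrate, instance,
                         max_retries=3):
        for _ in range(max_retries):
            # select_destinations() also claims resources for the
            # instance against the chosen host in Placement.
            host = scheduler.select_destinations(instance)
            try:
                check_can_live_migrate(host, instance)
                return host
            except Exception:
                # BUG: the allocation just claimed against 'host' is
                # not removed before retrying, so it leaks.
                continue
        raise NoValidHost('exhausted scheduling retries')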

The problem is that the allocation created in Placement against a
destination node which then failed a pre-migration check is never
cleaned up, since the update_available_resource periodic task in the
compute manager no longer removes it (again, once all computes are
upgraded to Pike). This leaves the destination node with resources
claimed against it which aren't actually consumed on the node.

We could roll back the allocation in conductor on a failure, or we
could put some other kind of periodic cleanup task in the compute
service which looks for failed migrations whose migration record names
that node as the destination, and removes any leaked allocations
against that node for the given instance.
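
For the first option, the rollback could look something like this (a
hedged sketch using hypothetical placement client helpers; the point
is to drop the failed destination provider from the instance's
allocations while keeping the source node's):

    # Hypothetical sketch of rolling back a failed claim in conductor;
    # the client helper names are made up for illustration.
    def rollback_dest_allocation(placement, instance_uuid,
                                 dest_rp_uuid):
        allocs = placement.get_allocations_for_consumer(instance_uuid)
        # Keep allocations against other providers (e.g. the source
        # node); drop the one claimed against the failed destination.
        remaining = [a for a in allocs
                     if a['resource_provider']['uuid'] != dest_rp_uuid]
        if remaining:
            placement.put_allocations(instance_uuid, remaining)
        else:
            placement.delete_allocations(instance_uuid)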

** Affects: nova
     Importance: High
         Status: Triaged


** Tags: live-migration placement

** Changed in: nova
       Status: New => Triaged

** Changed in: nova
   Importance: Undecided => High
