
yahoo-eng-team team mailing list archive

[Bug 1718512] [NEW] migration fails if instance build failed on destination host

 

Public bug reported:

(OpenStack Nova, commit d8b30c3772, per OSA-14.2.7)

If an instance build fails on a hypervisor, the "retry" field of the
instance's request spec is populated with the host the build was
attempted on and the number of times the build was retried. This field
remains populated for the lifetime of the instance.

If a live migration of the same instance is later requested, the
conductor loads this request spec and passes it on to the scheduler. The
scheduler then fails the migration request in RetryFilter, because the
target host is already recorded as having failed (albeit for the build,
not for a migration).
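
The rejection described above can be illustrated with a small,
self-contained model. This is only a sketch of the behaviour as reported
here, not Nova's actual code: the Retry and RequestSpec classes, the
retry_filter_passes function, and the host names are all hypothetical
stand-ins.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Retry:
    """Hypothetical stand-in for the "retry" field of a request spec."""
    num_attempts: int = 0
    hosts: List[str] = field(default_factory=list)


@dataclass
class RequestSpec:
    """Hypothetical, heavily simplified request spec."""
    instance_uuid: str
    retry: Optional[Retry] = None


def retry_filter_passes(host: str, spec: RequestSpec) -> bool:
    """Reject any host already recorded in the retry field."""
    if spec.retry is None:
        # No retry information recorded: every host passes.
        return True
    return host not in spec.retry.hosts


# The build once failed on compute1 and compute2, and the retry field
# was never cleared afterwards.
spec = RequestSpec("inst-1", Retry(num_attempts=2,
                                   hosts=["compute1", "compute2"]))

# A later migration that can only target those two hosts finds no
# valid destination, even though the record is about the build.
candidates = ["compute1", "compute2"]
survivors = [h for h in candidates if retry_filter_passes(h, spec)]
```

In this model the filter cannot tell a stale build failure from a recent
migration failure, which is the behaviour reported in this bug.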

With the help of mriedem and melwitt in #openstack-nova, we determined
that migration retries are handled separately from build retries.
mriedem suggested a patch that ignores the retry field of the instance's
request spec during migrations. With this patch applied, the previously
failing migration succeeded.
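
The idea behind the suggested patch can be sketched roughly as follows.
This is an assumption-laden illustration, not the actual patch: the
classes, spec_for_migration, and select_destinations are hypothetical
names invented for this example, and the real change may be structured
differently.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Retry:
    """Hypothetical stand-in for the "retry" field of a request spec."""
    num_attempts: int = 0
    hosts: List[str] = field(default_factory=list)


@dataclass
class RequestSpec:
    """Hypothetical, heavily simplified request spec."""
    instance_uuid: str
    retry: Optional[Retry] = None


def spec_for_migration(spec: RequestSpec) -> RequestSpec:
    """Return a copy of the spec with build-time retry info dropped."""
    migration_spec = copy.deepcopy(spec)
    migration_spec.retry = None  # ignore the build's failures/retries
    return migration_spec


def select_destinations(candidates: List[str],
                        spec: RequestSpec) -> List[str]:
    """Toy scheduler: exclude hosts listed in the retry field."""
    excluded = spec.retry.hosts if spec.retry else []
    return [h for h in candidates if h not in excluded]


stored = RequestSpec("inst-1", Retry(2, ["compute1", "compute2"]))
candidates = ["compute1", "compute2"]

# Without the change, the stale build record excludes both hosts.
before = select_destinations(candidates, stored)
# With the retry field ignored, both hosts are candidates again.
after = select_destinations(candidates, spec_for_migration(stored))
```

Working on a copy leaves the stored request spec untouched in this
sketch; whether the real patch copies or mutates the spec is not stated
in this report.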

Note that a migration to such a host may still fail. However, there is
sufficient reason to ignore the build's failures/retries during a
migration, as discussed on IRC:

12:55 < mriedem> it does stand to reason that if this instance failed to build originally on those 2 hosts, that live migrating it there might fail too...but we don't know why it originally failed, could have been a resource claim issue at the time
12:58 < melwitt> yeah, often it's a failed claim. and also what if that compute host is eventually replaced over the lifetime of the cluster, making it a fresh candidate for several instances that might still avoid it because they once failed to build there back when it was a different machine

** Affects: nova
     Importance: Undecided
     Assignee: Matt Riedemann (mriedem)
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1718512

Title:
  migration fails if instance build failed on destination host

Status in OpenStack Compute (nova):
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1718512/+subscriptions

