yahoo-eng-team team mailing list archive

[Bug 1480441] [NEW] Live migration doesn't retry on migration pre-check failure

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: "Chris St. Pierre" <1480441@xxxxxxxxxxxxxxxxxx>
Date: Fri, 31 Jul 2015 19:50:43 -0000
Reply-to: Bug 1480441 <1480441@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

When live migrating an instance, Nova is supposed to retry some
(configurable) number of times. However, it only retries if the host
compatibility and migration pre-checks raise nova.exception.Invalid:

https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L167-L174

If, for instance, a destination hypervisor has run out of disk space, it
will not raise an Invalid subclass, but rather MigrationPreCheckError,
which causes the retry loop to short-circuit. Nova should instead retry
as long as either Invalid or MigrationPreCheckError is raised.
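
The proposed behavior can be sketched roughly as follows. This is only an
illustration, not the actual Nova code: the exception classes are local
stand-ins for nova.exception.Invalid and
nova.exception.MigrationPreCheckError, and find_destination, check_host,
max_retries, retry_on and the host names are all hypothetical.

```python
class Invalid(Exception):
    """Stand-in for nova.exception.Invalid (assumption, not the real class)."""


class MigrationPreCheckError(Exception):
    """Stand-in for nova.exception.MigrationPreCheckError.

    Deliberately NOT a subclass of Invalid, as described in this report.
    """


class MaxRetriesExceeded(Exception):
    """Hypothetical error for when no candidate host passes the checks."""


def find_destination(candidates, check_host, max_retries,
                     retry_on=(Invalid, MigrationPreCheckError)):
    """Return the first candidate host that passes check_host.

    A host whose check raises one of the retry_on exceptions is skipped
    and the next candidate is tried, up to max_retries retries after the
    first attempt. Any other exception propagates immediately.
    """
    attempted = []
    for host in candidates:
        if len(attempted) > max_retries:
            break
        attempted.append(host)
        try:
            check_host(host)
        except retry_on:
            # Unsuitable host: move on to the next candidate.
            continue
        return host
    raise MaxRetriesExceeded("no valid host among %s" % attempted)


def check_host(host):
    """Hypothetical pre-check: one host is out of disk space."""
    if host == "full-disk-host":
        raise MigrationPreCheckError("Disk of instance is too large")


# The scheduler happens to return the exhausted host first.
hosts = ["full-disk-host", "good-host"]

# Retrying on both exception types skips the exhausted host.
print(find_destination(hosts, check_host, max_retries=3))

# Retrying on Invalid alone mirrors the reported behavior: the
# MigrationPreCheckError escapes the loop on the first host.
try:
    find_destination(hosts, check_host, max_retries=3, retry_on=(Invalid,))
except MigrationPreCheckError as exc:
    print("migration aborted: %s" % exc)
```

The second call shows why the failure depends on ordering: if the good
host came first in the list, the pre-check error would never be raised.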

This can be tricky to reproduce because it only occurs if a host raises
MigrationPreCheckError before a valid host is found, so it's dependent
upon the order in which the scheduler supplies possible destinations to
the conductor. In theory, though, it can be reproduced by bringing up a
number of hypervisors, exhausting the disk on one -- ideally the one
that the scheduler will return first -- and then attempting a live
migration. It will fail with something like:

$ nova live-migration --block-migrate stpierre-test-1
ERROR (BadRequest): Migration pre-check error: Unable to migrate f44296dd-ffa6-4ec0-8256-c311d025d46c: Disk of instance is too large(available on destination host:-38654705664 < need:1073741824) (HTTP 400) (Request-ID: req-9951691a-c63c-4888-bec5-30a072dfe727)

This happens even when there are valid hosts to migrate to.

** Affects: nova
     Importance: Undecided
     Assignee: Chris St. Pierre (stpierre)
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1480441

Title:
  Live migration doesn't retry on migration pre-check failure

Status in OpenStack Compute (nova):
  In Progress


Follow ups

[Bug 1480441] Re: Live migration doesn't retry on migration pre-check failure
From: Thierry Carrez, 2015-09-03