yahoo-eng-team team mailing list archive

[Bug 1284719] [NEW] buggy live migration rollback when using shared storage

 

Public bug reported:

I'm running the current Icehouse code in devstack.  I was looking at the
code and noticed something suspicious.

It looks like if a live migration of a shared-storage instance fails and
is rolled back, we could end up with messed-up networking on the
destination host.

When setting up a live migration we unconditionally run
ComputeManager.pre_live_migration() on the destination host to do
various things including setting up networks on the host.

If something goes wrong with the live migration, then in
ComputeManager._rollback_live_migration() we only call
self.compute_rpcapi.rollback_live_migration_at_destination() if we're
doing block migration, or a volume-backed migration that isn't on shared
storage.

However, looking at
ComputeManager.rollback_live_migration_at_destination(), I see it
cleaning up networking as well as block devices.  If we never call that
cleanup code, then the network setup that was done in
pre_live_migration() won't get rolled back on the destination host.
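
To make the suspected gap concrete, here is a minimal toy model of the
flow described above.  It is not the nova code: FakeDestination,
rollback_live_migration() and simulate_failed_migration() are
hypothetical names made up for illustration, and the condition is only
my reading of the rollback check as described in this report.

```python
class FakeDestination:
    """Hypothetical stand-in for the destination compute host."""

    def __init__(self):
        self.networks_set_up = False

    def pre_live_migration(self):
        # Per the report: runs unconditionally and sets up networking.
        self.networks_set_up = True

    def rollback_live_migration_at_destination(self):
        # Per the report: cleans up networking (and block devices).
        self.networks_set_up = False


def rollback_live_migration(dest, block_migration, is_volume_backed,
                            is_shared_storage):
    # Assumed shape of the check: destination cleanup is only requested
    # for block migration, or volume-backed migration off shared storage.
    if block_migration or (is_volume_backed and not is_shared_storage):
        dest.rollback_live_migration_at_destination()


def simulate_failed_migration(block_migration, is_volume_backed,
                              is_shared_storage):
    """Return True if networking is left behind on the destination."""
    dest = FakeDestination()
    dest.pre_live_migration()
    # Pretend the migration failed here and the source rolls back.
    rollback_live_migration(dest, block_migration, is_volume_backed,
                            is_shared_storage)
    return dest.networks_set_up


# Shared storage, no block migration: the cleanup call is skipped.
leaked = simulate_failed_migration(False, False, True)
# Block migration: the cleanup call is made.
cleaned = simulate_failed_migration(True, False, False)
```

In this model the shared-storage case leaves networks_set_up set on the
destination, which is the leftover state the report is worried about.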

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1284719

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1284719/+subscriptions

