yahoo-eng-team team mailing list archive

[Bug 1757292] [NEW] port binding 'migrating_to' attribute not cleaned up on failed live migration if using local shared disk

 

Public bug reported:

This code was added back in Newton:
https://review.openstack.org/#/c/275073/

That plumbs a 'migrating_to' attribute into the port binding profile
during live migration. It's needed on the neutron side to live migrate
an instance with floating IPs using DVR.

When live migration completes, either successfully or due to failure,
the migrating_to attribute should be cleaned up. This happens via the
setup_networks_on_host() method in the network API
(nova.network.neutronv2.api.API).
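
As a rough illustration of what that cleanup amounts to, here is a
minimal sketch that models a port as a plain dict and drops the
'migrating_to' key from its binding profile. The helper name
clear_migrating_to and the sample port are hypothetical, made up for
this example; this is not the actual setup_networks_on_host() code.

```python
# Hypothetical helper, not nova's actual implementation. It models the
# port as a plain dict keyed the way neutron exposes the binding profile.
MIGRATING_ATTR = "migrating_to"


def clear_migrating_to(port):
    """Return a copy of the port with 'migrating_to' removed from its profile."""
    # Copy the profile so the caller's port dict is left untouched.
    profile = dict(port.get("binding:profile") or {})
    profile.pop(MIGRATING_ATTR, None)
    updated = dict(port)
    updated["binding:profile"] = profile
    return updated


# Sample port as it might look while a live migration is in flight.
port = {"id": "port-1", "binding:profile": {"migrating_to": "dest-host"}}
cleaned = clear_migrating_to(port)
```

In the real flow the updated profile would be sent back to neutron as a
port update; the sketch only shows the dict manipulation.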

The problem is that on a failed live migration, that cleanup only
happens if the instance is not using shared local disk storage because
of this do_cleanup flag:

https://github.com/openstack/nova/blob/3fd863d8bf2fa1fc09acd08d976689462cffd2e3/nova/compute/manager.py#L6540

https://github.com/openstack/nova/blob/3fd863d8bf2fa1fc09acd08d976689462cffd2e3/nova/compute/manager.py#L6506

https://github.com/openstack/nova/blob/3fd863d8bf2fa1fc09acd08d976689462cffd2e3/nova/compute/manager.py#L6185
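
To make the gating concrete, here is a simplified model of the behavior
described above. The function rollback_live_migration and its
uses_shared_storage parameter are hypothetical names chosen for the
sketch, and it assumes do_cleanup is simply the negation of "the
instance uses shared storage"; the real logic in
nova/compute/manager.py is more involved.

```python
# Simplified, hypothetical model of the failed-migration rollback path.
# It only illustrates how a do_cleanup flag can leave 'migrating_to' behind.
def rollback_live_migration(binding_profile, uses_shared_storage):
    """Return the binding profile as it would look after a failed migration."""
    profile = dict(binding_profile)
    # Assumption for this sketch: cleanup is skipped on shared storage.
    do_cleanup = not uses_shared_storage
    if do_cleanup:
        # The destination-side rollback runs, and the 'migrating_to'
        # attribute is cleared as part of it.
        profile.pop("migrating_to", None)
    # When do_cleanup is False nothing touches the profile, so the stale
    # 'migrating_to' entry survives the failed migration.
    return profile


before = {"migrating_to": "dest-host"}
local_disk = rollback_live_migration(before, uses_shared_storage=False)
shared_disk = rollback_live_migration(before, uses_shared_storage=True)
```

With these inputs, local_disk comes back without the 'migrating_to'
key, while shared_disk still carries it, which is the leak this bug
describes.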

This is based purely on code inspection, since I don't have a multinode
DVR setup with the rbd imagebackend for the libvirt driver to test this
out (we could create a CI job to do all that if we wanted to). But it
seems pretty obvious that the
ComputeManager.rollback_live_migration_at_destination_host code was
primarily meant to clean up the disks created on the destination host
during pre_live_migration, along with anything set up on the physical
destination host for nova-network. It doesn't take into account this
'migrating_to' scenario, which can be cleaned up from the source host.

Having said all this, this code has been in nova since Newton and the
DVR migrating_to changes have been in neutron since Mitaka, and no one
has reported this problem. So either it's not widely used, or it doesn't
cause much of a problem if we don't clean up the migrating_to entry in
the binding profile on a failed live migration, although I'd think
neutron should clean up the floating IP router gateway that DVR creates
on the dest host.

** Affects: nova
     Importance: Low
         Status: Triaged


** Tags: dvr live-migration neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1757292

