yahoo-eng-team team mailing list archive
Message #72136
[Bug 1757292] Re: port binding 'migrating_to' attribute not cleaned up on failed live migration if using local shared disk
Reviewed: https://review.openstack.org/555481
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bb8ba2cf568fda4c5c59352296b758705869fb2f
Submitter: Zuul
Branch: master
commit bb8ba2cf568fda4c5c59352296b758705869fb2f
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Thu Mar 22 17:38:00 2018 -0400
Teardown networking when rolling back live migration even if shared disk
Change I2c86989ab7c6593bf346611cde8c043116d55bc5 way back in Essex
added the "setup_networks_on_host" network API calls to the migration
flows, including rollback_live_migration_at_destination. The initial
implementation of that method for Quantum (Neutron) was a no-op.
Change Ib1cc44bf9d01baf4d1f1d26c2a368a5ca7c6ab68 in Newton added the
Neutron implementation for the setup_networks_on_host method in order
to track the destination host being migrated to for instances that
have floating IPs with DVR.
When rolling back from a live migration failure on the destination host,
the "migrating_to" attribute in the port binding profile, added in
pre_live_migration() on the destination compute, is cleared.
However, that only happens in rollback_live_migration_at_destination,
which is only called if the instance is not on shared storage (examples
of shared storage are libvirt with the rbd image backend or with NFS).
That's controlled via the "do_cleanup" flag returned from
_live_migration_cleanup_flags().
If the live migration is happening over shared storage and fails, then
rollback_live_migration_at_destination isn't called, which means
setup_networks_on_host isn't called, which means the "migrating_to"
attribute in the port binding profile isn't cleaned up.
This change simply adds the cleanup in _rollback_live_migration in the
event that neutron is being used and we're live migrating over shared
storage so rollback_live_migration_at_destination isn't called.
Change-Id: I658e0a749e842163ed74f82c975bcaf19f9f7f07
Closes-Bug: #1757292
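The rollback behaviour described in the commit message above can be
sketched as a toy model. This is not nova's code: the names mirror the
commit message, but FakeNetworkAPI, cleanup_flags(), the with_fix
switch, and all signatures are hypothetical simplifications made up for
illustration.

```python
# Toy model only. Names echo the commit message; bodies and signatures
# are assumptions, not nova's real implementation.

class FakeNetworkAPI:
    """Hypothetical stand-in that tracks one port's binding profile."""

    def __init__(self):
        self.binding_profile = {}

    def setup_networks_on_host(self, host, teardown=False):
        # Without teardown, record the destination; with it, clear the entry.
        if teardown:
            self.binding_profile.pop("migrating_to", None)
        else:
            self.binding_profile["migrating_to"] = host


def cleanup_flags(shared_storage):
    """Simplified stand-in for _live_migration_cleanup_flags()."""
    do_cleanup = not shared_storage
    return do_cleanup


def rollback_live_migration(network_api, dest, shared_storage, with_fix):
    """Model the source-side rollback, before and after the change."""
    if cleanup_flags(shared_storage):
        # Models rollback_live_migration_at_destination, which only runs
        # when the instance is not on shared storage.
        network_api.setup_networks_on_host(dest, teardown=True)
    elif with_fix:
        # Models the cleanup added to _rollback_live_migration.
        network_api.setup_networks_on_host(dest, teardown=True)


def run(shared_storage, with_fix):
    api = FakeNetworkAPI()
    api.setup_networks_on_host("dest-host")  # models pre_live_migration()
    rollback_live_migration(api, "dest-host", shared_storage, with_fix)
    return api.binding_profile
```

In this model, run(shared_storage=True, with_fix=False) leaves the
"migrating_to" entry behind, while every other combination clears it.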
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1757292
Title:
port binding 'migrating_to' attribute not cleaned up on failed live
migration if using local shared disk
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
Confirmed
Status in OpenStack Compute (nova) queens series:
Confirmed
Bug description:
This code was added back in Newton:
https://review.openstack.org/#/c/275073/
That plumbs a 'migrating_to' attribute in the port binding profile
during live migration. It's needed on the neutron side for live
migrating an instance with floating IPs using DVR.
When live migration completes, either successfully or due to failure,
the migrating_to attribute should be cleaned up. This happens via the
setup_networks_on_host() method in the network API
(nova.network.neutronv2.api.API).
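As a minimal sketch of that lifecycle, assuming a port represented as a
plain dict with a "binding:profile" key: the helper names, host names,
and port ID below are hypothetical, and the real work is done by
setup_networks_on_host() through the Neutron API rather than on local
dicts.

```python
import copy


def with_migrating_to(port, dest_host):
    """Return a copy of the port with migrating_to set in its profile."""
    updated = copy.deepcopy(port)
    updated.setdefault("binding:profile", {})["migrating_to"] = dest_host
    return updated


def without_migrating_to(port):
    """Return a copy of the port with migrating_to removed, if present."""
    updated = copy.deepcopy(port)
    updated.get("binding:profile", {}).pop("migrating_to", None)
    return updated


# Hypothetical port as it might look before live migration starts.
port = {
    "id": "port-1",
    "binding:host_id": "source-host",
    "binding:profile": {},
}

# During live migration the destination host is recorded in the profile.
during = with_migrating_to(port, "dest-host")

# On completion, whether success or failure, the entry should be gone.
after = without_migrating_to(during)
```

The bug is the failure case in which the second step never runs, so the
port is left looking like "during" instead of "after".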
The problem is that on a failed live migration, that cleanup only
happens if the instance is not using shared local disk storage because
of this do_cleanup flag:
https://github.com/openstack/nova/blob/3fd863d8bf2fa1fc09acd08d976689462cffd2e3/nova/compute/manager.py#L6540
https://github.com/openstack/nova/blob/3fd863d8bf2fa1fc09acd08d976689462cffd2e3/nova/compute/manager.py#L6506
https://github.com/openstack/nova/blob/3fd863d8bf2fa1fc09acd08d976689462cffd2e3/nova/compute/manager.py#L6185
This is based purely on code inspection since I don't have a multinode
DVR setup with the rbd imagebackend for the libvirt driver to test
this out (we could create a CI job to do all that if we wanted to).
But it seems pretty obvious that the
ComputeManager.rollback_live_migration_at_destination code was
primarily meant for cleanup of the disks created on the
destination host during pre_live_migration, and also for anything
set up on the physical destination host for nova-network, and doesn't
take into account this 'migrating_to' scenario, which can be cleaned up
from the source host.
Having said all this, this code has been in nova since Newton and the
DVR migrating_to changes have been in neutron since Mitaka, and no one
has reported this problem, so it's either not widely used or it
doesn't cause much of a problem if we don't clean up the migrating_to
entry in the binding profile on failed live migration, although I'd
think neutron should clean up the floating IP router gateway that DVR
creates on the dest host.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1757292/+subscriptions