yahoo-eng-team team mailing list archive

[Bug 1580880] [NEW] [RFE] Distributed Portbinding for all port types

 

Public bug reported:

Summary
=======

Today only DVR ports can be bound to multiple hosts. But binding a port to
multiple hosts also makes sense for a compute port during live migration.
For a certain period of time the port could be bound to the source and the
target host at the same time (although only one binding is actually in
use). The information about both bindings needs to be accessible to Nova
via the REST API.

Use Cases
=========

* Instance stuck in error state when port binding fails

    In the live migration process, port binding is triggered by Nova after
    the migration has already succeeded. If port binding fails, the
    instance is stuck in error state. If port binding for the target node
    were done in pre_live_migration, the migration could be aborted on a
    binding failure and the instance would still be active on the migration
    source host. But we cannot simply do that, as some ToR mechanism
    drivers would shut down the source port as soon as the binding is
    updated, although the instance is still active on the source. If we
    could bind a compute port to both hosts, such drivers could keep the
    source port open and already process the target port in parallel.

* Live migration between hosts running different L2 agents

    Another use case is live migration between hosts that run different L2
    agents. This requires that Nova update the instance definition before
    the migration is executed (in the libvirt case, update the domain.xml
    with the target interface definition).

    A specialized variant of this use case is the migration from an agent
    using one firewall driver to an agent using another (e.g. from the OVS
    hybrid firewall driver to the new OVS conntrack-based firewall driver).

* Live migration with the MacVTap agent when different physnet mappings
are used

    The third use case is live migration with the MacVTap agent. Today it
    has some restrictions with live migration in certain scenarios [3]. It
    requires an update of the instance definition (the libvirt domain.xml)
    before the migration starts.

    For updating the definition in time, a port binding for the migration
    target node is required even before the migration starts. Following the
    argumentation above, this again requires a compute port bound to
    multiple hosts.

Proposed Change
===============

* A refactoring of the database is required to make a normal port a
special case of a distributed port. This has been planned for a long time
but was never finished; the effort is tracked via bug [1]. The existing
patches still need to be rebased to get this going again. A rough sketch
of the resulting model follows after this list.

* REST API changes are required to expose the bindings. To avoid
overloading the port API, a new sub-resource "bindings" could be created
(e.g. /ports/{port-id}/bindings) that holds the list of all bindings of a
port. CREATE/DELETE/UPDATE must be supported. No UUID would be required
for this resource, as its identifier would be the host_id. Example calls
are sketched below.
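
As a purely illustrative sketch of the database direction (table and
column names below are hypothetical, not the actual Neutron schema), the
binding rows would be keyed by (port_id, host) instead of port_id alone,
so that a "normal" port becomes a distributed port with exactly one
binding row:

    # Hypothetical sketch of a (port_id, host)-keyed binding model in
    # SQLAlchemy; this is NOT the real Neutron schema.
    import sqlalchemy as sa
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class ExamplePortBinding(Base):
        __tablename__ = 'example_port_bindings'
        # Composite primary key: one row per (port, host) pair, so a
        # single port can carry bindings for several hosts at once.
        port_id = sa.Column(sa.String(36), primary_key=True)
        host = sa.Column(sa.String(255), primary_key=True)
        vif_type = sa.Column(sa.String(64), nullable=False,
                             default='unbound')
        vif_details = sa.Column(sa.Text(), nullable=True)

The proposed bindings sub-resource could then be driven roughly as follows
(URL, payload keys and host names are assumptions about the proposal, not
an existing Neutron API):

    # Hypothetical example calls against the proposed sub-resource, using
    # python-requests; not an existing Neutron API.
    import requests

    NEUTRON = 'http://controller:9696/v2.0'
    HEADERS = {'X-Auth-Token': 'TOKEN'}
    port_id = 'PORT_UUID'

    # Create an additional binding for the migration target host.
    requests.post('%s/ports/%s/bindings' % (NEUTRON, port_id),
                  headers=HEADERS,
                  json={'binding': {'host_id': 'target-host'}})

    # List all bindings of the port; each entry is identified by host_id.
    bindings = requests.get('%s/ports/%s/bindings' % (NEUTRON, port_id),
                            headers=HEADERS).json()

    # Remove the source binding once the live migration has succeeded.
    requests.delete('%s/ports/%s/bindings/source-host' % (NEUTRON, port_id),
                    headers=HEADERS)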


Nova Changes
============
* In pre_live_migration, Nova would add a new binding for the migration
  target host to the port; this triggers port binding in Neutron.
* Before the migration starts, Nova would read the binding information for
  the target host. It would abort on a "binding_failed" vif type; otherwise
  it would modify the instance definition (e.g. the domain.xml) for the
  migration target with this binding information.
* After the live migration succeeded, Nova would remove the original
  (source) port binding. On rollback, it would just remove the target port
  binding.

Those changes are tracked via the following Nova blueprint [4]; a rough
outline of the flow is included below.
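
A purely illustrative outline of that flow (the neutron client methods
used below are assumptions about a future API, not existing Nova or
python-neutronclient code):

    # Hypothetical outline of the proposed live-migration handling in Nova.
    def pre_live_migration(neutron, port_id, target_host):
        # Add a second binding for the target host; this triggers port
        # binding in Neutron while the source binding stays active.
        neutron.create_port_binding(port_id, target_host)
        binding = neutron.get_port_binding(port_id, target_host)
        if binding['vif_type'] == 'binding_failed':
            # Abort while the instance is still running on the source host.
            raise RuntimeError('port binding failed on %s' % target_host)
        # Otherwise use the binding details to build the target interface
        # definition (e.g. the libvirt domain.xml).
        return binding

    def post_live_migration(neutron, port_id, source_host):
        # Migration succeeded: drop the now unused source binding.
        neutron.delete_port_binding(port_id, source_host)

    def rollback_live_migration(neutron, port_id, target_host):
        # Migration failed: drop only the target binding.
        neutron.delete_port_binding(port_id, target_host)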


Open Questions
==============
* This RFE builds on bug [1]. How should those dependencies be tracked? Or
should the content of that bug become part of this effort?
* The same question applies to the MacVTap live migration bug [3].
* How does this effort correlate to the RFE for externalizing multi-segment
networks [2]?


[1] https://bugs.launchpad.net/neutron/+bug/1367391
[2] https://bugs.launchpad.net/neutron/+bug/1573197
[3] https://bugs.launchpad.net/neutron/+bug/1550400
[4] https://blueprints.launchpad.net/nova/+spec/migration-use-target-vif

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1580880

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1580880/+subscriptions

