← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1834045] Re: Live-migration double binding doesn't work with OVN

 

Fix already released: https://review.opendev.org/#/c/673803/

** Changed in: networking-ovn
       Status: New => Fix Released

** Changed in: neutron
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1834045

Title:
  Live-migration double binding doesn't work with OVN

Status in networking-ovn:
  Fix Released
Status in neutron:
  Fix Released
Status in OpenStack Compute (nova):
  Incomplete
Status in neutron package in Ubuntu:
  Fix Released

Bug description:
  For ml2/OVN live-migration doesn't work. After spending some time
  debugging this issue I found that its potentially more complicated and
  not related to OVN intself.

  Here is the full story behind not working live-migration while using
  OVN in latest u/s master.

  To speedup live-migration double-binding was introduced in neutron [1] and nova [2]. It implements this blueprint [3]. In short words it creates double binding (ACTIVE and INACTIVE) to verify if network bind is possible to be done on destination host and then starts live-migration (to not waste time in case of rollback).
  This mechanism started to be default in Stein [4]. So before actual qemu live-migration neutron should send 'network-vif-plugged' to nova and then migration is being run.

  While using OVN this mechanism doesn't work. Notification 'network-
  vif-plugged' is not being send so live-migration is stuck at the
  beginning.

  Lets check how those notifications are send. On every change of
  'status' field (sqlalchemy event) in neutron.ports row [5] function
  [6] is executed and it is responsible for sending 'network-vif-
  unplugged' and 'network-vif-plugged' notifications.

  During pre_live_migration tasks two bindings and bindings levels are created. At the end of this process I found that commit_port_binding() is executed [7]. At this time neutron port status in the db is DOWN. 
  I found that at the end of commit_port_binding() [8] after neutron_lib.callbacks.registry notification is send the port status moves to UP. For ml2/OVN it stays DOWN. This is the first difference that I found between ml2/ovs and ml2/ovn.

  After a bit digging I figured out how 'network-vif-plugged' is triggered in ml2/ovs.
  Lets see how this is done.

  1. On list of registered callbacks in ml2/ovs [8] we have configured
  callback from class ovo_rpc._ObjectChangeHandler [9] and at the end of
  commit_port_binding() this callback is used.

  -------------------------------------------------------------
  neutron.plugins.ml2.ovo_rpc._ObjectChangeHandler.handle_event
  -------------------------------------------------------------

  2. It is responsible for pushing new port object revisions to agents,
  like:

  ----------------------------------------------------------------------------
  Jun 24 10:01:01 test-migrate-1 neutron-server[3685]: DEBUG neutron.api.rpc.handlers.resources_rpc [None req-1430f349-d644-4d33-8833-90fad0124dcd service neutron] Pushing event updated for resources: {'Port': ['ID=3704a567-ef4c-4f6d-9557-a1191de07c4a,revision_number=10']} {{(pid=3697) push /opt/stack/neutron/neutron/api/rpc/handlers/resources_rpc.py:243}}
  ----------------------------------------------------------------------------

  3. OVS agent consumes it and sends back RPC to the neutron server that port is actually UP (on source node!):
  ------------------------------------------------------------------------------------------------------------
  Jun 24 10:01:01 test-migrate-1 neutron-openvswitch-agent[18660]: DEBUG neutron.agent.resource_cache [None req-1430f349-d644-4d33-8833-90fad0124dcd service neutron] Resource Port 3704a567-ef4c-4f6d-9557-a1191de07c4a updated (revision_number 8->10). Old fields: {'status': u'ACTIVE', 'bindings': [PortBinding(host='test-migrate-1',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={},status='INACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal'), PortBinding(host='test-migrate-2',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={"migrating_to": "test-migrate-1"},status='ACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal')], 'binding_levels': [PortBindingLevel(driver='openvswitch',host='test-migrate-1',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59), PortBindingLevel(driver='openvswitch',host='test-migrate-2',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59)]} New fields: {'status': u'DOWN', 'bindings': [PortBinding(host='test-migrate-1',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={},status='ACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal'), PortBinding(host='test-migrate-2',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={"migrating_to": "test-migrate-1"},status='INACTIVE',vif_details=None,vif_type='unbound',vnic_type='normal')], 'binding_levels': [PortBindingLevel(driver='openvswitch',host='test-migrate-1',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59)]} {{(pi
  Jun 24 10:01:01 test-migrate-1 neutron-openvswitch-agent[18660]: d=18660) record_resource_update /opt/stack/neutron/neutron/agent/resource_cache.py:186}}
  ...

  Jun 24 10:01:02 test-migrate-1 neutron-openvswitch-agent[18660]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9daaf112-57f4-49bb-8390-4b65a5c5e674 None None] Setting status for 3704a567-ef4c-4f6d-9557-a1191de07c4a to UP {{(pid=18660) _bind_devices /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1088}}
  ------------------------------------------------------------------------------------------------------------

  4. Neutron server consumes it:
  ------------------------------------------------------------------------------------------------------------
  Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.plugins.ml2.rpc [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Device 3704a567-ef4c-4f6d-9557-a1191de07c4a up at agent ovs-agent-test-migrate-1 {{(pid=3698) update_device_up /opt/stack/neutron/neutron/plugins/ml2/rpc.py:269}}
  ...
  Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.db.provisioning_blocks [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Provisioning for port 3704a567-ef4c-4f6d-9557-a1191de07c4a completed by entity L2. {{(pid=3698) provisioning_complete /opt/stack/neutron/neutron/db/provisioning_blocks.py:133}}
  ...
  Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.db.provisioning_blocks [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Provisioning complete for port 3704a567-ef4c-4f6d-9557-a1191de07c4a triggered by entity L2. {{(pid=3698) provisioning_complete /opt/stack/neutron/neutron/db/provisioning_blocks.py:140}}
  ------------------------------------------------------------------------------------------------------------

  and then generates internal event "PROVISIONING_COMPLETE" [10]. This
  event is consumed by [11] and port_provisioned() updates port status
  in the DB to UP [12]. At the end it emits notification 'network-vif-
  plugged' and nova continues migration.

  
  In ml2/ovn we don't have agents, so we don't use ovo_rpc. That's why migration for ml2/ovn doesn't work.

  It looks like general bug somewhere between nova and neutron. Neutron shouldn't send notification 'network-vif-plug' during configuration of double binding from source host like it is now (paragraph 3.)
  Maybe we could consider using some more sophisticated names, like 'neutron-vif-inactive-binding-set'?
  Maybe nova could watch for inactive binding being created [13] and then start live-migration
  instead waiting for neutron notification?

  
  Thanks,
  Maciej


  [1] https://review.opendev.org/#/q/topic:bp/live-migration-portbinding+(status:open+OR+status:merged)
  [2] https://review.opendev.org/#/c/558001/
  [3] https://blueprints.launchpad.net/nova/+spec/neutron-new-port-binding-api 
  [4] https://review.opendev.org/#/c/635360/
  [5] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/db/db_base_plugin_v2.py#L173
  [6] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/notifiers/nova.py#L182
  [7] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L505
  [8] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L713
  [9] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/ovo_rpc.py#L51
  [10] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/db/provisioning_blocks.py#L140
  [11] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L285
  [12] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L316
  [13] https://specs.openstack.org/openstack/neutron-specs/specs/backlog/pike/portbinding_information_for_nova.html#list-bindings

To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-ovn/+bug/1834045/+subscriptions


References