yahoo-eng-team team mailing list archive
Message #83548
[Bug 1834045] Re: Live-migration double binding doesn't work with OVN
Fix already released: https://review.opendev.org/#/c/673803/
** Changed in: networking-ovn
Status: New => Fix Released
** Changed in: neutron
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1834045
Title:
Live-migration double binding doesn't work with OVN
Status in networking-ovn:
Fix Released
Status in neutron:
Fix Released
Status in OpenStack Compute (nova):
Incomplete
Status in neutron package in Ubuntu:
Fix Released
Bug description:
For ml2/OVN, live-migration doesn't work. After spending some time
debugging this issue, I found that it's potentially more complicated and
not related to OVN itself.
Here is the full story behind live-migration not working while using
OVN on the latest u/s master.
To speed up live-migration, double binding was introduced in neutron [1] and nova [2]. It implements this blueprint [3]. In short, it creates two bindings (ACTIVE and INACTIVE) to verify that the port can be bound on the destination host before live-migration starts (so that no time is wasted if a rollback is needed).
This mechanism became the default in Stein [4]. So before the actual qemu live-migration, neutron should send 'network-vif-plugged' to nova, and only then is the migration run.
With OVN this mechanism doesn't work. The 'network-vif-plugged'
notification is never sent, so live-migration gets stuck at the
beginning.
Let's check how those notifications are sent. On every change of the
'status' field (sqlalchemy event) in a neutron.ports row [5], function
[6] is executed, and it is responsible for sending the
'network-vif-unplugged' and 'network-vif-plugged' notifications.
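The idea can be sketched in a few lines. This is a simplified stand-in and not neutron's actual code: the Port class, set_status() and event_for_transition() are hypothetical names, and the exact transition rules are an assumption based on the description above.

```python
# Simplified stand-in for the status-change hook; NOT neutron's real code.
# Port, set_status and event_for_transition are hypothetical names.
from dataclasses import dataclass, field


def event_for_transition(old_status, new_status):
    """Map a port status change to a nova event name, or None."""
    if old_status == new_status:
        return None  # no change of the 'status' field, nothing to send
    if new_status == "ACTIVE":
        return "network-vif-plugged"
    if old_status == "ACTIVE" and new_status == "DOWN":
        return "network-vif-unplugged"
    return None


@dataclass
class Port:
    port_id: str
    status: str = "DOWN"
    sent: list = field(default_factory=list)  # events "sent" to nova

    def set_status(self, new_status):
        # Plays the role of the sqlalchemy 'set' event on the status column.
        old_status, self.status = self.status, new_status
        event = event_for_transition(old_status, new_status)
        if event is not None:
            self.sent.append(event)


port = Port("port-1")
port.set_status("DOWN")    # status unchanged: no notification
port.set_status("ACTIVE")  # DOWN -> ACTIVE: 'network-vif-plugged'
```

The point for this bug is the first call: if the status never changes, as with ml2/OVN during pre_live_migration, nothing is ever sent to nova.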
During the pre_live_migration tasks, two bindings and binding levels are created. At the end of this process I found that commit_port_binding() is executed [7]. At this point the neutron port status in the DB is DOWN.
I found that at the end of commit_port_binding() [8], after the neutron_lib.callbacks.registry notification is sent, the port status moves to UP. For ml2/OVN it stays DOWN. This is the first difference that I found between ml2/ovs and ml2/ovn.
After a bit of digging I figured out how 'network-vif-plugged' is triggered in ml2/ovs.
Let's see how this is done.
1. In the list of registered callbacks in ml2/ovs [8] there is a
callback from class ovo_rpc._ObjectChangeHandler [9], and this callback
is invoked at the end of commit_port_binding().
-------------------------------------------------------------
neutron.plugins.ml2.ovo_rpc._ObjectChangeHandler.handle_event
-------------------------------------------------------------
2. It is responsible for pushing new port object revisions to agents,
like:
----------------------------------------------------------------------------
Jun 24 10:01:01 test-migrate-1 neutron-server[3685]: DEBUG neutron.api.rpc.handlers.resources_rpc [None req-1430f349-d644-4d33-8833-90fad0124dcd service neutron] Pushing event updated for resources: {'Port': ['ID=3704a567-ef4c-4f6d-9557-a1191de07c4a,revision_number=10']} {{(pid=3697) push /opt/stack/neutron/neutron/api/rpc/handlers/resources_rpc.py:243}}
----------------------------------------------------------------------------
3. The OVS agent consumes it and sends an RPC back to the neutron server saying that the port is actually UP (on the source node!):
------------------------------------------------------------------------------------------------------------
Jun 24 10:01:01 test-migrate-1 neutron-openvswitch-agent[18660]: DEBUG neutron.agent.resource_cache [None req-1430f349-d644-4d33-8833-90fad0124dcd service neutron] Resource Port 3704a567-ef4c-4f6d-9557-a1191de07c4a updated (revision_number 8->10). Old fields: {'status': u'ACTIVE', 'bindings': [PortBinding(host='test-migrate-1',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={},status='INACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal'), PortBinding(host='test-migrate-2',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={"migrating_to": "test-migrate-1"},status='ACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal')], 'binding_levels': [PortBindingLevel(driver='openvswitch',host='test-migrate-1',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59), PortBindingLevel(driver='openvswitch',host='test-migrate-2',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59)]} New fields: {'status': u'DOWN', 'bindings': [PortBinding(host='test-migrate-1',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={},status='ACTIVE',vif_details={"port_filter": true, "bridge_name": "br-int", "datapath_type": "system", "ovs_hybrid_plug": false},vif_type='ovs',vnic_type='normal'), PortBinding(host='test-migrate-2',port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,profile={"migrating_to": "test-migrate-1"},status='INACTIVE',vif_details=None,vif_type='unbound',vnic_type='normal')], 'binding_levels': 
[PortBindingLevel(driver='openvswitch',host='test-migrate-1',level=0,port_id=3704a567-ef4c-4f6d-9557-a1191de07c4a,segment=NetworkSegment(c6866834-4577-497f-a6c8-ff9724a82e59),segment_id=c6866834-4577-497f-a6c8-ff9724a82e59)]} {{(pid=18660) record_resource_update /opt/stack/neutron/neutron/agent/resource_cache.py:186}}
...
Jun 24 10:01:02 test-migrate-1 neutron-openvswitch-agent[18660]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9daaf112-57f4-49bb-8390-4b65a5c5e674 None None] Setting status for 3704a567-ef4c-4f6d-9557-a1191de07c4a to UP {{(pid=18660) _bind_devices /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1088}}
------------------------------------------------------------------------------------------------------------
4. Neutron server consumes it:
------------------------------------------------------------------------------------------------------------
Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.plugins.ml2.rpc [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Device 3704a567-ef4c-4f6d-9557-a1191de07c4a up at agent ovs-agent-test-migrate-1 {{(pid=3698) update_device_up /opt/stack/neutron/neutron/plugins/ml2/rpc.py:269}}
...
Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.db.provisioning_blocks [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Provisioning for port 3704a567-ef4c-4f6d-9557-a1191de07c4a completed by entity L2. {{(pid=3698) provisioning_complete /opt/stack/neutron/neutron/db/provisioning_blocks.py:133}}
...
Jun 24 10:01:02 test-migrate-1 neutron-server[3685]: DEBUG neutron.db.provisioning_blocks [None req-62e69669-fa7e-4f70-9e38-38cb3e2c30a7 None None] Provisioning complete for port 3704a567-ef4c-4f6d-9557-a1191de07c4a triggered by entity L2. {{(pid=3698) provisioning_complete /opt/stack/neutron/neutron/db/provisioning_blocks.py:140}}
------------------------------------------------------------------------------------------------------------
and then generates the internal event "PROVISIONING_COMPLETE" [10]. This
event is consumed by [11], and port_provisioned() updates the port status
in the DB to UP [12]. At the end it emits the 'network-vif-plugged'
notification and nova continues the migration.
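Steps 3 and 4 can be modelled with a toy version of the provisioning-block mechanism. This is only a sketch of the flow described above: the ProvisioningBlocks class, the "port-1" ID and the module-level dicts are hypothetical, and neutron's real implementation in provisioning_blocks.py is DB-backed and uses the callback registry.

```python
# Toy model of the ml2/ovs flow from steps 3 and 4; illustrative names only.

class ProvisioningBlocks:
    """Tracks which entities still have to finish work on a port."""

    def __init__(self, on_complete):
        self._pending = {}              # port_id -> set of entity names
        self._on_complete = on_complete  # stands in for PROVISIONING_COMPLETE

    def add_block(self, port_id, entity):
        self._pending.setdefault(port_id, set()).add(entity)

    def provisioning_complete(self, port_id, entity):
        remaining = self._pending.get(port_id, set())
        remaining.discard(entity)
        if not remaining:
            # Last block removed: fire the "provisioning complete" event.
            self._pending.pop(port_id, None)
            self._on_complete(port_id)


port_status = {"port-1": "DOWN"}
notifications = []  # events that would be sent to nova


def port_provisioned(port_id):
    # Mirrors the described behaviour: set the port ACTIVE, then notify nova.
    if port_status[port_id] != "ACTIVE":
        port_status[port_id] = "ACTIVE"
        notifications.append(("network-vif-plugged", port_id))


blocks = ProvisioningBlocks(on_complete=port_provisioned)
blocks.add_block("port-1", "L2")

# ml2/ovs: the agent reports the device up, which completes the L2 block.
blocks.provisioning_complete("port-1", "L2")
```

In this model nothing happens until some entity calls provisioning_complete(); with ml2/ovs that caller is the agent's "device up" RPC.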
In ml2/ovn we don't have agents, so we don't use ovo_rpc. That's why migration for ml2/ovn doesn't work.
It looks like a general bug somewhere between nova and neutron. Neutron shouldn't send the 'network-vif-plugged' notification while the double binding is being configured from the source host, as it does now (step 3).
Maybe we could consider using a more specific name, like 'neutron-vif-inactive-binding-set'?
Maybe nova could watch for the inactive binding being created [13] and then start live-migration
instead of waiting for the neutron notification?
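A minimal sketch of what such a check could look like, assuming the binding list is available as plain dicts with the 'host', 'status' and 'vif_type' fields seen in the agent log above. The function name and the dict layout are hypothetical; this is not existing nova code.

```python
# Hypothetical helper; not part of nova. It only inspects a list of binding
# dicts shaped like the PortBinding objects in the agent log above.

# vif_type values that mean the port is not really bound on that host.
UNUSABLE_VIF_TYPES = (None, "unbound", "binding_failed")


def has_inactive_binding(bindings, dest_host):
    """Return True if an INACTIVE, successfully bound binding exists on dest_host."""
    return any(
        b.get("host") == dest_host
        and b.get("status") == "INACTIVE"
        and b.get("vif_type") not in UNUSABLE_VIF_TYPES
        for b in bindings
    )


# Bindings as in the "Old fields" part of the log: the port is ACTIVE on
# test-migrate-2 and has an INACTIVE binding prepared on test-migrate-1.
bindings = [
    {"host": "test-migrate-1", "status": "INACTIVE", "vif_type": "ovs"},
    {"host": "test-migrate-2", "status": "ACTIVE", "vif_type": "ovs"},
]
ready = has_inactive_binding(bindings, "test-migrate-1")
```

With a check like this, nova would not depend on a port status change that ml2/OVN never produces.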
Thanks,
Maciej
[1] https://review.opendev.org/#/q/topic:bp/live-migration-portbinding+(status:open+OR+status:merged)
[2] https://review.opendev.org/#/c/558001/
[3] https://blueprints.launchpad.net/nova/+spec/neutron-new-port-binding-api
[4] https://review.opendev.org/#/c/635360/
[5] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/db/db_base_plugin_v2.py#L173
[6] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/notifiers/nova.py#L182
[7] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L505
[8] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L713
[9] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/ovo_rpc.py#L51
[10] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/db/provisioning_blocks.py#L140
[11] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L285
[12] https://github.com/openstack/neutron/blob/0e2508c8b1a3706a2ade0517f5c5359af2f8bc78/neutron/plugins/ml2/plugin.py#L316
[13] https://specs.openstack.org/openstack/neutron-specs/specs/backlog/pike/portbinding_information_for_nova.html#list-bindings
To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-ovn/+bug/1834045/+subscriptions