yahoo-eng-team team mailing list archive
Message #76338
[Bug 1807239] Re: Race condition with DPDK + trunk ports when instance port is deleted then quickly recreated
Reviewed: https://review.openstack.org/623275
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bd2a1bc6c3d68ddd27fc0b77ec767b11ec0fcb54
Submitter: Zuul
Branch: master
commit bd2a1bc6c3d68ddd27fc0b77ec767b11ec0fcb54
Author: Nate Johnston <nate.johnston@xxxxxxxxxx>
Date: Thu Dec 6 12:39:49 2018 -0500
Do not delete trunk bridges if service port attached
When a deployment has instance ports that are neutron trunk ports with
DPDK vhu in vhostuserclient mode, nova deletes the ovs port when the
instance reboots and recreates it when the instance comes back online.
This quick delete/recreate transition can trigger a race condition in
which the tbr trunk bridge is deleted after the port has been
recreated. See the bug for more details.
This change mitigates the race condition by adding a check for active
service ports within the trunk port deletion function.
Change-Id: I70b9c26990e6902f8888449bfd7483c25e5bff46
Closes-Bug: #1807239
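The mitigation described in the commit message can be sketched roughly as follows. This is an illustrative simplification, not neutron's actual code: the function and parameter names (`bridge_has_service_port`, `maybe_delete_trunk_bridge`, the `vhu` prefix convention for vhost-user service ports) are assumptions for the sketch.

```python
# Illustrative sketch of the fix: before acting on a queued trunk-bridge
# deletion, check whether an active service (instance) port is still
# attached to the bridge. If one is, the port was recreated in the
# meantime and the bridge must be kept.

SERVICE_PORT_PREFIX = 'vhu'  # assumed naming convention for vhost-user ports


def bridge_has_service_port(port_names):
    """Return True if any attached port looks like an instance service port."""
    return any(name.startswith(SERVICE_PORT_PREFIX) for name in port_names)


def maybe_delete_trunk_bridge(bridge_name, port_names, delete_bridge):
    """Delete the trunk bridge only when no service port is attached.

    Returns True if the bridge was deleted, False if it was kept.
    """
    if bridge_has_service_port(port_names):
        # A service port was recreated between the delete event and now:
        # skip the deletion to avoid tearing down a live trunk bridge.
        return False
    delete_bridge(bridge_name)
    return True
```

Under this sketch, the stale delete queued for the tbr bridge becomes a no-op as soon as nova re-adds the vhu port.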
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1807239
Title:
Race condition with DPDK + trunk ports when instance port is deleted
then quickly recreated
Status in neutron:
Fix Released
Bug description:
Deployment is Queens with ML2/OVS and DPDK. Instance ports are neutron
trunk ports with DPDK vhu in vhostuserclient mode. When an instance
is rebooted, nova/os_vif deletes the ovs port connected to the trunk
bridge and then recreates it when the host comes back online. This
causes a race condition in the trunk code that handles the tbr trunk
bridge. Neutron, seeing the deletion, queues a delete of the tbr
bridge. Then the subsequent re-add of the trunk is ignored because
the delete is still enqueued.
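The queue behavior described above can be simulated with a toy event queue. This is a deliberately simplified model of the race, not the agent's real event handling; the class and method names are invented for the illustration.

```python
# Toy model of the race: a bridge delete is queued, and a re-add arriving
# while that delete is still pending is dropped, so the bridge ends up
# deleted even though the port came back.

import collections


class TrunkEventQueue:
    """Simplified stand-in for the agent's trunk event handling (buggy)."""

    def __init__(self):
        self.pending = collections.OrderedDict()  # bridge name -> action

    def on_port_deleted(self, bridge):
        self.pending[bridge] = 'delete'

    def on_port_created(self, bridge):
        # The bug: the re-add is ignored because a delete is already queued.
        if self.pending.get(bridge) == 'delete':
            return
        self.pending[bridge] = 'add'

    def drain(self, bridges):
        """Apply pending actions to the set of existing bridges."""
        for bridge, action in self.pending.items():
            if action == 'delete':
                bridges.discard(bridge)
            else:
                bridges.add(bridge)
        self.pending.clear()
```

Replaying the sequence from the logs below (delete, then a quick re-add) against this model leaves the bridge set empty, mirroring the lost creation event.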
Here are annotated logs for an instance of the issue from the logfiles
of nova-compute and neutron-openvswitch-agent:
# (1) nova-compute deletes the instance's ovs port
2018-11-30 21:03:17.576 5758 DEBUG oslo.privsep.daemon [req-0387f343-92e4-4b0c-b43b-a58d26fafef7 cf1b5aad77aa40a6933003b69f4cba43 a830497f545440efbc007a9607f4fd89 - default default] privsep: request[140038488344528]: (3, 'vif_plug_ovs.linux_net.delete_ovs_vif_port', (u'tbr-f4554113-b', u'vhu44d242e1-30'), {'timeout': 120}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
# (2) ovsdb-monitor sees the deletion of the ovs port in (1)
2018-11-30 21:03:17.587 61270 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["4d9cbba1-fe43-4703-9735-1eb61e763b29","delete","vhu44d242e1-30",1,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["bridge_name","tbr-f4554113-b"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["subport_ids","[\"605ebd34-0ea4-44a6-aa5b-ce80878b1847\", \"7df8776f-4e15-4c74-bc58-0eff1bffd868\", \"e7476312-fcca-4ad8-886e-2093de1669f5\"]"],["trunk_id","f4554113-b695-4e7d-b550-6927765c6679"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239
# (3) nova-compute creates a fresh ovs port for the instance
2018-11-30 21:03:18.535 5758 DEBUG oslo.privsep.daemon [req-0387f343-92e4-4b0c-b43b-a58d26fafef7 cf1b5aad77aa40a6933003b69f4cba43 a830497f545440efbc007a9607f4fd89 - default default] privsep: request[140038488344528]: (3, 'vif_plug_ovs.linux_net.create_ovs_vif_port', (u'tbr-f4554113-b', u'vhu44d242e1-30', u'44d242e1-30aa-4da6-b00c-bad9d64560af', u'fa:16:3e:3e:4a:b4', u'1f23fafb-e370-4e53-9bf1-8d5b1d8f7387', 9000), {'interface_type': 'dpdkvhostuserclient', 'vhost_server_path': u'/var/lib/vhost_sockets/vhu44d242e1-30', 'timeout': 120}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443
# (4) ovsdb-monitor sees the creation of the ovs port in (3)
2018-11-30 21:03:19.054 61270 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["fddedd5b-73e8-4e54-ba2a-97364f27950d","old",null,null,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]],["","new","vhu44d242e1-30",6,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["bridge_name","tbr-f4554113-b"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["subport_ids","[]"],["trunk_id","f4554113-b695-4e7d-b550-6927765c6679"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239
# (5) neutron-openvswitch-agent reacts to the deletion in (2) by deleting the trunk bridge
2018-11-30 21:03:19.093 61270 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): DelBridgeCommand(if_exists=True, name=tbr-f4554113-b) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
# (6) ovsdb-monitor reports the deletion of the trunk bridge from (5)
2018-11-30 21:03:19.097 61270 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["8370c94d-6920-47f7-811a-51702eee4027","delete","tbr-f4554113-b",65534,["map",[]]],["1f9b6a11-32e5-44bf-a6d3-e6750e087c33","delete","tpt-44d242e1-30",2,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"]]]],["9951c13a-6bfb-45ba-8499-68fa399d1e59","delete","tpi-44d242e1-30",39,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"]]]],["fddedd5b-73e8-4e54-ba2a-97364f27950d","delete","vhu44d242e1-30",6,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["bridge_name","tbr-f4554113-b"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["subport_ids","[]"],["trunk_id","f4554113-b695-4e7d-b550-6927765c6679"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239
The creation event detected in step (4) is lost. In a healthy run, the
neutron-openvswitch-agent logs include a "Processing trunk bridge" line
after the port is re-added; when the race is triggered, that line never
appears.
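That log signature can be turned into a small triage helper. The log format and marker strings are taken from the excerpts above, but the helper itself is an assumption for illustration, not a neutron tool.

```python
# Triage sketch: given agent log lines, report whether a port re-add was
# never followed by a "Processing trunk bridge" line, which suggests the
# race fired.

def race_suspected(log_lines):
    """Return True if the last port re-add has no subsequent trunk processing."""
    readd_idx = None
    for i, line in enumerate(log_lines):
        if 'create_ovs_vif_port' in line:
            readd_idx = i  # remember the most recent re-add
    if readd_idx is None:
        return False  # no re-add observed at all
    return not any('Processing trunk bridge' in line
                   for line in log_lines[readd_idx:])
```

Feeding it the neutron-openvswitch-agent log lines from the incident window flags runs where the re-add in step (3) was silently dropped.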
Another indication of the issue is tracebacks from attempts to store
metadata on the already-deleted bridges:
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command [-] Error executing command: RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 35, in execute
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command txn.add(self)
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command self.gen.next()
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 94, in transaction
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command self._nested_txn = None
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 54, in __exit__
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command self.result = self.commit()
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command raise result.ex
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command
2018-11-30 21:00:22.444 61270 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Failed to store metadata for trunk f4554113-b695-4e7d-b550-6927765c6679: Cannot find Bridge with name=tbr-f4554113-b: RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
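The RowNotFound symptom arises because the metadata write targets a bridge row that the queued delete has already removed. A toy model of that failure mode, not the ovsdbapp API, might look like this (class and method names are invented):

```python
# Toy model of the RowNotFound symptom: writing external_ids against a
# bridge row that no longer exists raises, matching the traceback above.

class RowNotFound(Exception):
    pass


class BridgeTable:
    """Minimal stand-in for an OVSDB Bridge table."""

    def __init__(self, rows):
        self.rows = dict(rows)  # bridge name -> external_ids dict

    def set_external_ids(self, name, metadata):
        if name not in self.rows:
            raise RowNotFound('Cannot find Bridge with name=%s' % name)
        self.rows[name].update(metadata)
```

In this model, deleting the bridge row and then storing the trunk metadata reproduces the "Cannot find Bridge with name=tbr-..." error seen in the agent log.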
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1807239/+subscriptions