[Bug 1807239] [NEW] Race condition with DPDK + trunk ports when instance port is deleted then quickly recreated

 

Public bug reported:

Deployment is Queens with ML2/OVS and DPDK. Instance ports are neutron
trunk ports with DPDK vhostuser (vhu) interfaces in vhostuserclient
mode.  When an instance is rebooted, nova/os_vif deletes the OVS port
connected to the trunk bridge and then recreates it when the instance
comes back up.  This triggers a race condition in the trunk code that
manages the tbr trunk bridge: neutron, seeing the deletion, queues a
delete of the tbr bridge, and the subsequent re-add of the trunk port
is then ignored because the delete is still enqueued.
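
The failure mode can be pictured with a minimal, self-contained sketch.
The class, method, and queue names below are illustrative only and do
not correspond to the actual neutron/os-vif code; the point is the
ordering: a delete of the trunk bridge is queued, the re-add arrives
while that delete is still pending and is dropped, and the pending
delete then removes the bridge the new port was just attached to.

from collections import deque

class TrunkBridgeHandler(object):
    """Illustrative stand-in for the agent-side trunk handling."""

    def __init__(self):
        self.pending = deque()              # queued bridge deletions
        self.bridges = {"tbr-f4554113-b"}   # trunk bridges currently wired up

    def on_port_deleted(self, bridge):
        # Step (2): the ovsdb monitor sees the vhu port disappear and the
        # handler queues a deletion of the parent trunk bridge.
        self.pending.append(("delete_bridge", bridge))

    def on_port_added(self, bridge):
        # Step (4): the re-added port is skipped because a delete of the
        # same bridge is still sitting in the queue -- the event is lost.
        if ("delete_bridge", bridge) in self.pending:
            return
        print("Processing trunk bridge %s" % bridge)

    def process_pending(self):
        # Step (5): the queued delete finally runs and removes the bridge
        # that the freshly recreated port was just plugged into.
        while self.pending:
            _, bridge = self.pending.popleft()
            self.bridges.discard(bridge)

handler = TrunkBridgeHandler()
handler.on_port_deleted("tbr-f4554113-b")   # (2) delete observed
handler.on_port_added("tbr-f4554113-b")     # (4) re-add silently dropped
handler.process_pending()                   # (5) trunk bridge deleted
print(handler.bridges)                      # set(): the bridge is gone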

Here are annotated logs for one occurrence of the issue, taken from the
nova-compute and neutron-openvswitch-agent log files:

# (1) nova-compute deletes the instance's ovs port
2018-11-30 21:03:17.576 5758 DEBUG oslo.privsep.daemon [req-0387f343-92e4-4b0c-b43b-a58d26fafef7 cf1b5aad77aa40a6933003b69f4cba43 a830497f545440efbc007a9607f4fd89 - default default] privsep: request[140038488344528]: (3, 'vif_plug_ovs.linux_net.delete_ovs_vif_port', (u'tbr-f4554113-b', u'vhu44d242e1-30'), {'timeout': 120}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443

# (2) ovsdb-monitor sees the deletion of the ovs port in (1)
2018-11-30 21:03:17.587 61270 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["4d9cbba1-fe43-4703-9735-1eb61e763b29","delete","vhu44d242e1-30",1,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["bridge_name","tbr-f4554113-b"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["subport_ids","[\"605ebd34-0ea4-44a6-aa5b-ce80878b1847\", \"7df8776f-4e15-4c74-bc58-0eff1bffd868\", \"e7476312-fcca-4ad8-886e-2093de1669f5\"]"],["trunk_id","f4554113-b695-4e7d-b550-6927765c6679"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239

# (3) nova-compute creates a fresh ovs port for the instance
2018-11-30 21:03:18.535 5758 DEBUG oslo.privsep.daemon [req-0387f343-92e4-4b0c-b43b-a58d26fafef7 cf1b5aad77aa40a6933003b69f4cba43 a830497f545440efbc007a9607f4fd89 - default default] privsep: request[140038488344528]: (3, 'vif_plug_ovs.linux_net.create_ovs_vif_port', (u'tbr-f4554113-b', u'vhu44d242e1-30', u'44d242e1-30aa-4da6-b00c-bad9d64560af', u'fa:16:3e:3e:4a:b4', u'1f23fafb-e370-4e53-9bf1-8d5b1d8f7387', 9000), {'interface_type': 'dpdkvhostuserclient', 'vhost_server_path': u'/var/lib/vhost_sockets/vhu44d242e1-30', 'timeout': 120}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443

# (4) ovsdb-monitor sees the creation of the ovs port in (3)
2018-11-30 21:03:19.054 61270 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["fddedd5b-73e8-4e54-ba2a-97364f27950d","old",null,null,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]],["","new","vhu44d242e1-30",6,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["bridge_name","tbr-f4554113-b"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["subport_ids","[]"],["trunk_id","f4554113-b695-4e7d-b550-6927765c6679"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239

# (5) neutron-openvswitch-agent reacts to the deletion in (2) by deleting the trunk bridge
2018-11-30 21:03:19.093 61270 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): DelBridgeCommand(if_exists=True, name=tbr-f4554113-b) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84

# (6) ovsdb-monitor reports the deletion of the trunk bridge from (5)
2018-11-30 21:03:19.097 61270 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["8370c94d-6920-47f7-811a-51702eee4027","delete","tbr-f4554113-b",65534,["map",[]]],["1f9b6a11-32e5-44bf-a6d3-e6750e087c33","delete","tpt-44d242e1-30",2,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"]]]],["9951c13a-6bfb-45ba-8499-68fa399d1e59","delete","tpi-44d242e1-30",39,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"]]]],["fddedd5b-73e8-4e54-ba2a-97364f27950d","delete","vhu44d242e1-30",6,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["bridge_name","tbr-f4554113-b"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["subport_ids","[]"],["trunk_id","f4554113-b695-4e7d-b550-6927765c6679"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239

The creation event seen in step (4) is lost.  When the race condition
does not occur, the neutron-openvswitch-agent log includes a
"Processing trunk bridge" line for the re-added port; when the race is
triggered, that line never appears.
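
A quick way to confirm the outcome on the compute node is to check
whether the trunk bridge still exists.  A minimal sketch, assuming
ovs-vsctl is on PATH on the host ("ovs-vsctl br-exists" is a standard
Open vSwitch command; the bridge name is the one from the logs above):

# "ovs-vsctl br-exists NAME" exits 0 if the bridge exists and 2 if not.
import subprocess

def trunk_bridge_exists(name):
    return subprocess.call(["ovs-vsctl", "br-exists", name]) == 0

print(trunk_bridge_exists("tbr-f4554113-b"))  # False once the race has hit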

Another symptom is tracebacks from the agent trying to store trunk
metadata on the already-deleted trunk bridge:

2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command [-] Error executing command: RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 35, in execute
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     txn.add(self)
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     self.gen.next()
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 94, in transaction
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     self._nested_txn = None
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 54, in __exit__
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     self.result = self.commit()
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     raise result.ex
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command
2018-11-30 21:00:22.444 61270 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Failed to store metadata for trunk f4554113-b695-4e7d-b550-6927765c6679: Cannot find Bridge with name=tbr-f4554113-b: RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
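
For illustration only, the kind of defensive pattern that would avoid
this traceback is sketched below.  This is not the actual neutron fix;
the RowNotFound class and store_trunk_metadata() helper are
hypothetical placeholders for whatever the agent really calls, and the
external_ids write is just an example of a metadata update.

import subprocess

class RowNotFound(Exception):
    """Placeholder for the ovsdbapp exception seen in the traceback above."""

def store_trunk_metadata(bridge, trunk_id):
    # Placeholder for the ovsdb_handler call that failed above: writing
    # trunk metadata into the bridge's external_ids.
    if subprocess.call(["ovs-vsctl", "br-exists", bridge]) != 0:
        raise RowNotFound("Cannot find Bridge with name=%s" % bridge)
    subprocess.check_call(
        ["ovs-vsctl", "set", "Bridge", bridge,
         "external_ids:trunk_id=%s" % trunk_id])

def store_metadata_safely(bridge, trunk_id):
    try:
        store_trunk_metadata(bridge, trunk_id)
    except RowNotFound:
        # The bridge was deleted underneath us (the race described above);
        # defer to a later resync instead of erroring out.
        print("trunk bridge %s vanished, deferring to resync" % bridge)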

** Affects: neutron
     Importance: Critical
     Assignee: Nate Johnston (nate-johnston)
         Status: In Progress


** Tags: dpdk queens-backport-potential rocky-backport-potential trunk
