yahoo-eng-team team mailing list archive

[Bug 1807239] Re: Race condition with DPDK + trunk ports when instance port is deleted then quickly recreated

Reviewed:  https://review.openstack.org/623275
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bd2a1bc6c3d68ddd27fc0b77ec767b11ec0fcb54
Submitter: Zuul
Branch:    master

commit bd2a1bc6c3d68ddd27fc0b77ec767b11ec0fcb54
Author: Nate Johnston <nate.johnston@xxxxxxxxxx>
Date:   Thu Dec 6 12:39:49 2018 -0500

    Do not delete trunk bridges if service port attached
    
    In a deployment where instance ports are neutron trunk ports with
    DPDK vhu in vhostuserclient mode, nova deletes the ovs port when an
    instance reboots and recreates it when the instance comes back up.
    This quick delete/recreate transition can trigger a race condition
    that causes the tbr trunk bridge to be deleted after the port has
    been recreated.  See the bug for more details.
    
    This change mitigates the race condition by adding a check for active
    service ports within the trunk port deletion function.
    
    Change-Id: I70b9c26990e6902f8888449bfd7483c25e5bff46
    Closes-Bug: #1807239
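
The check can be sketched roughly as follows: a minimal Python sketch
against ovsdbapp-style calls, not the actual patch; the function names
and the iface-id heuristic are illustrative assumptions:

    def bridge_has_service_port(ovsdb, bridge_name):
        # A port created by nova/os_vif for the instance carries an
        # iface-id in its external-ids; if one is present, the port was
        # recreated while the bridge delete was still queued.
        for port in ovsdb.list_ports(bridge_name).execute(check_error=True):
            external_ids = ovsdb.db_get(
                'Interface', port, 'external_ids').execute(check_error=True)
            if 'iface-id' in external_ids:
                return True
        return False

    def maybe_delete_trunk_bridge(ovsdb, bridge_name):
        # Skip the queued deletion if a service port reappeared between
        # the delete event and this handler running.
        if bridge_has_service_port(ovsdb, bridge_name):
            return
        ovsdb.del_br(bridge_name).execute(check_error=True)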


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1807239

Title:
  Race condition with DPDK + trunk ports when instance port is deleted
  then quickly recreated

Status in neutron:
  Fix Released

Bug description:
  The deployment is Queens with ML2/OVS and DPDK.  Instance ports are
  neutron trunk ports with DPDK vhu in vhostuserclient mode.  When an
  instance is rebooted, nova/os_vif deletes the ovs port connected to
  the trunk bridge and then recreates it when the instance comes back
  up.  This triggers a race condition in the trunk code that handles
  the tbr trunk bridge: seeing the deletion, neutron queues a delete of
  the tbr bridge, and the subsequent re-add of the port is then ignored
  because the delete is still enqueued.
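
  The ordering can be modeled in a few lines of Python (a minimal,
  deterministic sketch of the event sequence, not neutron code; the
  names are taken from the logs below purely for illustration):

    import collections

    events = collections.deque()
    bridges = {'tbr-f4554113-b': {'vhu44d242e1-30'}}

    # (a) nova deletes the ovs port; the agent enqueues a bridge delete.
    bridges['tbr-f4554113-b'].discard('vhu44d242e1-30')
    events.append(('delete_bridge', 'tbr-f4554113-b'))

    # (b) nova recreates the port before the queued delete runs.
    bridges['tbr-f4554113-b'].add('vhu44d242e1-30')

    # (c) the queued delete fires anyway, taking the freshly recreated
    # port down with the bridge.
    while events:
        action, bridge = events.popleft()
        if action == 'delete_bridge':
            bridges.pop(bridge, None)

    print(bridges)  # {} -> trunk bridge gone despite the recreated port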

  Here are annotated logs for one occurrence of the issue, taken from
  the nova-compute and neutron-openvswitch-agent log files:

  # (1) nova-compute deletes the instance's ovs port
  2018-11-30 21:03:17.576 5758 DEBUG oslo.privsep.daemon [req-0387f343-92e4-4b0c-b43b-a58d26fafef7 cf1b5aad77aa40a6933003b69f4cba43 a830497f545440efbc007a9607f4fd89 - default default] privsep: request[140038488344528]: (3, 'vif_plug_ovs.linux_net.delete_ovs_vif_port', (u'tbr-f4554113-b', u'vhu44d242e1-30'), {'timeout': 120}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443

  # (2) ovsdb-monitor sees the deletion of the ovs port in (1)
  2018-11-30 21:03:17.587 61270 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["4d9cbba1-fe43-4703-9735-1eb61e763b29","delete","vhu44d242e1-30",1,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["bridge_name","tbr-f4554113-b"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["subport_ids","[\"605ebd34-0ea4-44a6-aa5b-ce80878b1847\", \"7df8776f-4e15-4c74-bc58-0eff1bffd868\", \"e7476312-fcca-4ad8-886e-2093de1669f5\"]"],["trunk_id","f4554113-b695-4e7d-b550-6927765c6679"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239

  # (3) nova-compute creates a fresh ovs port for the instance
  2018-11-30 21:03:18.535 5758 DEBUG oslo.privsep.daemon [req-0387f343-92e4-4b0c-b43b-a58d26fafef7 cf1b5aad77aa40a6933003b69f4cba43 a830497f545440efbc007a9607f4fd89 - default default] privsep: request[140038488344528]: (3, 'vif_plug_ovs.linux_net.create_ovs_vif_port', (u'tbr-f4554113-b', u'vhu44d242e1-30', u'44d242e1-30aa-4da6-b00c-bad9d64560af', u'fa:16:3e:3e:4a:b4', u'1f23fafb-e370-4e53-9bf1-8d5b1d8f7387', 9000), {'interface_type': 'dpdkvhostuserclient', 'vhost_server_path': u'/var/lib/vhost_sockets/vhu44d242e1-30', 'timeout': 120}) loop /usr/lib/python2.7/site-packages/oslo_privsep/daemon.py:443

  # (4) ovsdb-monitor sees the creation of the ovs port in (3)
  2018-11-30 21:03:19.054 61270 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["fddedd5b-73e8-4e54-ba2a-97364f27950d","old",null,null,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]],["","new","vhu44d242e1-30",6,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["bridge_name","tbr-f4554113-b"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["subport_ids","[]"],["trunk_id","f4554113-b695-4e7d-b550-6927765c6679"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239

  # (5) neutron-openvswitch-agent reacts to the deletion in (2) by deleting the trunk bridge
  2018-11-30 21:03:19.093 61270 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): DelBridgeCommand(if_exists=True, name=tbr-f4554113-b) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84

  # (6) ovsdb-monitor reports the deletion of the trunk bridge from (5)
  2018-11-30 21:03:19.097 61270 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["8370c94d-6920-47f7-811a-51702eee4027","delete","tbr-f4554113-b",65534,["map",[]]],["1f9b6a11-32e5-44bf-a6d3-e6750e087c33","delete","tpt-44d242e1-30",2,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"]]]],["9951c13a-6bfb-45ba-8499-68fa399d1e59","delete","tpi-44d242e1-30",39,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"]]]],["fddedd5b-73e8-4e54-ba2a-97364f27950d","delete","vhu44d242e1-30",6,["map",[["attached-mac","fa:16:3e:3e:4a:b4"],["bridge_name","tbr-f4554113-b"],["iface-id","44d242e1-30aa-4da6-b00c-bad9d64560af"],["iface-status","active"],["subport_ids","[]"],["trunk_id","f4554113-b695-4e7d-b550-6927765c6679"],["vm-uuid","1f23fafb-e370-4e53-9bf1-8d5b1d8f7387"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239
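
  For anyone tracing these events by hand, each monitor line above is
  plain JSON; here is a short sketch for decoding one (illustrative
  only; it unpacks OVSDB's ["map", [[key, value], ...]] encoding of
  external_ids into a dict):

    import json

    def decode_monitor_line(line):
        doc = json.loads(line)
        for row in doc['data']:
            record = dict(zip(doc['headings'], row))
            kind, pairs = record['external_ids']
            assert kind == 'map'
            record['external_ids'] = dict(pairs)
            yield record

    # Abbreviated sample from step (6) above.
    sample = ('{"data":[["8370c94d-6920-47f7-811a-51702eee4027","delete",'
              '"tbr-f4554113-b",65534,["map",[]]]],'
              '"headings":["row","action","name","ofport","external_ids"]}')

    for event in decode_monitor_line(sample):
        print(event['action'], event['name'])  # delete tbr-f4554113-b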

  The creation event detected in step (4) is lost.  When the race
  condition does not occur, the neutron-openvswitch-agent logs include
  a "Processing trunk bridge" log line; that line is absent when the
  race is triggered.

  Another indication of the issue is tracebacks raised when the agent
  tries to store trunk metadata on the already-deleted bridge:

  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command [-] Error executing command: RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 35, in execute
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     txn.add(self)
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     self.gen.next()
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 94, in transaction
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     self._nested_txn = None
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 54, in __exit__
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     self.result = self.commit()
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command     raise result.ex
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
  2018-11-30 21:00:22.444 61270 ERROR ovsdbapp.backend.ovs_idl.command
  2018-11-30 21:00:22.444 61270 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Failed to store metadata for trunk f4554113-b695-4e7d-b550-6927765c6679: Cannot find Bridge with name=tbr-f4554113-b: RowNotFound: Cannot find Bridge with name=tbr-f4554113-b
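
  A defensive pattern for this failure mode, sketched below, is to treat
  RowNotFound as "the delete already won the race" instead of letting it
  escape as an ERROR (illustrative only, not the actual fix; the helper
  name is hypothetical):

    from ovsdbapp.backend.ovs_idl import idlutils

    def store_trunk_metadata(ovsdb, bridge_name, metadata):
        try:
            ovsdb.db_set('Bridge', bridge_name,
                         ('external_ids', metadata)).execute(check_error=True)
        except idlutils.RowNotFound:
            # The tbr-* bridge vanished between the event and this
            # write; report failure instead of raising.
            return False
        return True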

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1807239/+subscriptions

