
yahoo-eng-team team mailing list archive

[Bug 1887148] Re: Network loop between physical networks with DVR

 

Reviewed:  https://review.opendev.org/740724
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c1a77ef8b74bb9b5abbc5cb03fb3201383122eb8
Submitter: Zuul
Branch:    master

commit c1a77ef8b74bb9b5abbc5cb03fb3201383122eb8
Author: Darragh O'Reilly <doreilly@xxxxxxxx>
Date:   Mon Jul 13 14:48:10 2020 +0000

    Ensure drop flows on br-int at agent startup for DVR too
    
    Commit 90212b12 changed the OVS agent so adding vital drop flows on
    br-int (table 0 priority 2) for packets from physical bridges was
    deferred until DVR initialization later on. But if br-int has no flows
    from a previous run (e.g. after a host reboot), then these packets will
    hit the NORMAL flow in table 60. And if there is more than one physical
    bridge, then the physical interfaces from the different bridges are now
    essentially connected at layer 2 and a network loop is possible in the
    time before the flows are added by DVR. Also the DVR code won't add them
    until after RPC calls to the server, so a loop is more likely if the
    server is not available.
    
    This patch restores adding these flows to when the physical bridges are
    first configured. Also updated a comment that was no longer correct and
    updated the unit test.
    
    Change-Id: I42c33fefaae6a7bee134779c840f35632823472e
    Closes-Bug: #1887148
    Related-Bug: #1869808
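
For illustration only, the flows described in the commit message (br-int, table 0, priority 2, drop for packets arriving from a physical bridge) can be sketched as ovs-ofctl commands. This is not the agent's code. The helper name print_drop_flow_cmds is hypothetical, and the patch port names (int-<bridge>) are an assumption about how the agent names the br-int side of each patch port. Using a port name in in_port is also assumed to be supported by the installed OVS version; older versions may need the numeric port instead. The sketch only prints the commands and does not run them.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one ovs-ofctl command per patch
# port, each adding a table 0, priority 2 drop flow on br-int.
print_drop_flow_cmds() {
    for port in "$@"; do
        printf 'ovs-ofctl add-flow br-int "table=0,priority=2,in_port=%s,actions=drop"\n' "$port"
    done
}

# Assumed patch port names for the bridge mappings used in this report.
print_drop_flow_cmds int-br-ex int-br-physnet1 int-br-physnet2
```

Printing rather than executing keeps the sketch safe to run on a host without OVS and lets the port names be checked against `ovs-vsctl list-ports br-int` first.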


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1887148

Title:
  Network loop between physical networks with DVR

Status in neutron:
  Fix Released

Bug description:
  Our CI experienced a network loop caused by the change in
  https://review.opendev.org/#/c/733568/ . DVR was enabled, there was
  more than one physical bridge mapping, and the neutron server was not
  available when the OVS agents were started.

  Steps
  =====
  # add more physical bridges
  ovs-vsctl add-br br-physnet1
  ip link set dev br-physnet1 up

  ovs-vsctl add-br br-physnet2
  ip link set dev br-physnet2 up

  # set a broadcast going from one bridge
  ip address add 1.1.1.1/31 dev br-physnet1
  arping -b -I br-physnet1 1.1.1.1

  # listen on the other
  tcpdump -eni br-physnet2

  # Update /etc/neutron/plugins/ml2/ml2_conf.ini
  [ml2_type_vlan]
  network_vlan_ranges = public,physnet1,physnet2

  [ovs]
  datapath_type = system
  bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
  tunnel_bridge = br-tun
  local_ip = 127.0.0.1

  [agent]
  tunnel_types = vxlan
  root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
  root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
  enable_distributed_routing = True
  l2_population = True

  # stop server and agent
  systemctl stop devstack@q-svc
  systemctl stop devstack@q-agt

  # clear all flows
  for BR in $(sudo ovs-vsctl list-br); do echo "$BR"; sudo ovs-ofctl del-flows "$BR"; done

  # start agent
  systemctl start devstack@q-agt

  $ sudo tcpdump -eni br-physnet2
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
  09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
  09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
  ...

  If there is more than one node running the ovs agent in this state,
  then there will be a network loop and packets can multiply quickly and
  overwhelm the network. We saw ~1 million packets/sec.

  I think that because the neutron server is not available, the get_dvr_mac_address RPC call blocks and the required drop flows are not installed:
  https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
  https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234
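
One way to confirm whether an agent is in this state is to look for the priority 2 drop flows in table 0 of br-int. The following is a minimal sketch, not part of the original report: check_drop_flows is a hypothetical helper, the patch port names passed to it (int-<bridge>) are assumed, and the match relies on dump-flows printing port names in in_port, which older OVS versions may show as numbers instead.

```shell
#!/bin/sh
# Hypothetical helper: reads `ovs-ofctl dump-flows br-int table=0` output on
# stdin and succeeds only if every patch port given as an argument has a
# priority=2 flow whose action is drop.
check_drop_flows() {
    flows=$(cat)
    missing=0
    for port in "$@"; do
        if ! printf '%s\n' "$flows" \
            | grep 'priority=2,' \
            | grep -E "in_port=\"?${port}\"?[ ,]" \
            | grep -q 'actions=drop'; then
            echo "missing drop flow for $port"
            missing=1
        fi
    done
    return "$missing"
}

# Example use (port names are assumptions):
#   sudo ovs-ofctl dump-flows br-int table=0 \
#       | check_drop_flows int-br-physnet1 int-br-physnet2
```

Reading the dump from stdin keeps the check separate from the OVS call, so it can also be run against saved output from a node that has since recovered.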

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1887148/+subscriptions

