
yahoo-eng-team team mailing list archive

[Bug 1783470] Re: get_subnet_for_dvr returns SNAT mac instead of gateway in subnet_info

 

Reviewed:  https://review.openstack.org/587234
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c6de172e58ed4cbd157c2e560ffbbb4dc3a34730
Submitter: Zuul
Branch:    master

commit c6de172e58ed4cbd157c2e560ffbbb4dc3a34730
Author: Arjun Baindur <xagent@xxxxxxxxx>
Date:   Mon Jul 30 14:22:30 2018 -0700

    get_subnet_for_dvr returns SNAT mac instead of distributed gateway in subnet_info
    
    On hosts with dvr_snat agent mode, after restarting OVS agent,
    sometimes the SNAT port is processed first instead of the distributed port.
    The subnet_info is cached locally via get_subnet_for_dvr when either of these ports
    are processed. However, it returns the MAC address of the port used to query
    as the gateway for the subnet. Using the SNAT port, this puts the wrong
    MAC as the gateway, causing some flows such as the DVR flows on br-int
    for local src VMs to have the wrong MAC.
    
    This patch fixes the get_subnet_for_dvr with fixed_ips as None for the csnat port,
    as that causes the server side handler to fill in the subnet's actual gateway
    rather than using the port's MAC.
    
    Change-Id: If045851819fd53c3b9a1506cc52bc1757e6d6851
    Closes-Bug: #1783470


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1783470

Title:
  get_subnet_for_dvr returns SNAT mac instead of gateway in subnet_info

Status in neutron:
  Fix Released

Bug description:
  On our dvr_snat host, "install_dvr_to_src_mac" is installing the
  rule in br-int with the SNAT MAC instead of the DVR MAC address
  (the subnet's gateway, aka network:router_interface_distributed).
  For example, the subnet's gateway is 172.16.0.1, with MAC
  fa:16:3e:42:a2:ec.

  On most hosts, we see the following rules in br-int:

  [root@stan ~]# ovs-ofctl dump-flows br-int | grep fa:16:3e:42:a2:ec
   cookie=0x77f69fee58f51737, duration=11872.801s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:22:eb:8b actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
   cookie=0x77f69fee58f51737, duration=11872.790s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:cd:71:e1 actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
   cookie=0x77f69fee58f51737, duration=11865.953s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:20:77:00 actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
   cookie=0x77f69fee58f51737, duration=11865.933s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:ab:2d:1a actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
   cookie=0x77f69fee58f51737, duration=11860.735s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:76:e9:ae actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
   cookie=0x77f69fee58f51737, duration=11859.335s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:cb:48:27 actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)

  
  However, on our dvr_snat host, no rules set this MAC as dl_src. They are added with the MAC of the network:router_centralized_snat port instead:

  root@krusty:~# ovs-ofctl dump-flows br-int | grep fa:16:3e:84:0b:42
   cookie=0xbb5ebbfa2dfadb74, duration=5351.368s, table=2, n_packets=2976001, n_bytes=362273213, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:84:0b:42 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5195.362s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:86:91:e2 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5195.349s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:a2:04:d3 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5195.336s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:82:ef:3b actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5195.325s, table=2, n_packets=24, n_bytes=2044, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:e4:d9:f3 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5195.272s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:b9:a0:fe actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5194.118s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:1a:42:fa actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5194.098s, table=2, n_packets=56, n_bytes=4792, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:84:33:df actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5193.995s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:34:e1:92 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5193.509s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:6d:3e:f3 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5191.408s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:30:97:8f actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5188.895s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:57:e5:ad actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
   cookie=0xbb5ebbfa2dfadb74, duration=5351.361s, table=60, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:84:0b:42 actions=strip_vlan,output:951
  root@krusty:~# 
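
The comparison above can be automated. The following is a minimal sketch, not part of neutron, that scans `ovs-ofctl dump-flows` output for `mod_dl_src` actions and reports any rewritten source MAC that differs from the expected gateway MAC. The function name `unexpected_src_macs` is hypothetical, and the default of table 2 is simply taken from the dumps in this report; the two sample lines are copied from those dumps.

```python
import re

# Patterns for the fields of interest in an `ovs-ofctl dump-flows` line.
TABLE_RE = re.compile(r"\btable=(\d+)")
MOD_DL_SRC_RE = re.compile(
    r"mod_dl_src:((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.IGNORECASE
)


def unexpected_src_macs(dump_text, expected_mac, table=2):
    """Return the set of mod_dl_src MACs in `table` that differ from expected_mac.

    Hypothetical helper: the table number is an assumption based on the
    flow dumps quoted in this bug report.
    """
    expected = expected_mac.lower()
    found = set()
    for line in dump_text.splitlines():
        table_match = TABLE_RE.search(line)
        mac_match = MOD_DL_SRC_RE.search(line)
        if table_match is None or mac_match is None:
            continue  # not a flow line, or no source MAC rewrite
        if int(table_match.group(1)) != table:
            continue
        mac = mac_match.group(1).lower()
        if mac != expected:
            found.add(mac)
    return found


# One flow from the good host and one from the dvr_snat host, as quoted above.
sample = """
 cookie=0x77f69fee58f51737, duration=11872.801s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:22:eb:8b actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
 cookie=0xbb5ebbfa2dfadb74, duration=5195.362s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:86:91:e2 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
"""

bad = unexpected_src_macs(sample, "fa:16:3e:42:a2:ec")
print(sorted(bad))
```

On a real host the input would come from saving the output of `ovs-ofctl dump-flows br-int` to a file or pipe; any MAC reported here is a candidate for the wrong-gateway symptom described in this bug.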


  I have traced this to the get_subnet_for_dvr call. In the subnet_info,
  the gateway_mac returned is incorrect. Initially, upon restarting the
  OVS agent, the dvr_local_map is empty, so the OVS agent makes the
  get_subnet_for_dvr call to populate the local subnet info map. On good
  hosts, it queries with fixed_ip = subnet gateway (172.16.0.1). On
  the snat host, it queries first with fixed_ip = 172.16.0.3.
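
To make the failure mode concrete, here is a toy model of the lookup. It is not neutron's code: it only assumes the behaviour stated in the commit message, namely that the handler returns the MAC of the port matching the queried fixed IP and falls back to the subnet's gateway IP when fixed_ips is None. The function signature, the `PORTS` list, and the `SUBNET` dict are illustrative; the IPs, MACs, and subnet ID come from this report.

```python
SUBNET_ID = "3707b250-b6f5-4701-9b17-01a8f288c17a"

# Simplified subnet record; only the fields needed for the lookup.
SUBNET = {"id": SUBNET_ID, "gateway_ip": "172.16.0.1"}

# The two router ports on this subnet, as described in the bug report.
PORTS = [
    {
        "device_owner": "network:router_interface_distributed",
        "mac_address": "fa:16:3e:42:a2:ec",
        "fixed_ips": [{"subnet_id": SUBNET_ID, "ip_address": "172.16.0.1"}],
    },
    {
        "device_owner": "network:router_centralized_snat",
        "mac_address": "fa:16:3e:84:0b:42",
        "fixed_ips": [{"subnet_id": SUBNET_ID, "ip_address": "172.16.0.3"}],
    },
]


def get_subnet_for_dvr(subnet, ports, fixed_ips=None):
    """Toy stand-in for the server-side handler (assumed behaviour).

    With fixed_ips, gateway_mac is the MAC of the port owning that IP.
    Without fixed_ips, the subnet's own gateway_ip is used for the lookup.
    """
    ip = fixed_ips[0]["ip_address"] if fixed_ips else subnet["gateway_ip"]
    for port in ports:
        for fixed_ip in port["fixed_ips"]:
            if (fixed_ip["subnet_id"] == subnet["id"]
                    and fixed_ip["ip_address"] == ip):
                return dict(subnet, gateway_mac=port["mac_address"])
    return {}


# Querying with the SNAT port's fixed IP yields the SNAT MAC (the bug).
snat_query = [{"subnet_id": SUBNET_ID, "ip_address": "172.16.0.3"}]
print(get_subnet_for_dvr(SUBNET, PORTS, snat_query)["gateway_mac"])

# Querying with fixed_ips=None yields the distributed gateway MAC.
print(get_subnet_for_dvr(SUBNET, PORTS)["gateway_mac"])
```

Under these assumptions, whichever port is processed first after an agent restart decides what gets cached, which is consistent with the problem appearing only on the dvr_snat host.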

  Either this query is incorrect, or, even when querying with the SNAT
  port, the gateway_mac in the subnet info should be the DVR MAC, not
  the SNAT MAC:

  Good host:

  root@barney:~# cat ovs.log | grep get_subnet_for_dvr | grep "172.16"
  2018-07-24 19:42:24.454 15840 DEBUG neutron.api.rpc.handlers.dvr_rpc [req-5ece05d6-f2cd-46a4-b81e-7e579e61990b - - - - -] neutron.api.rpc.handlers.dvr_rpc.DVRServerRpcApi method get_subnet_for_dvr called with arguments (<neutron_lib.context.ContextBase object at 0x7f52f1983150>, '3707b250-b6f5-4701-9b17-01a8f288c17a') {'fixed_ips': [{'subnet_id': '3707b250-b6f5-4701-9b17-01a8f288c17a', 'ip_address': '172.16.0.1'}]} wrapper /opt/pf9/pf9-neutron/lib/python2.7/site-packages/oslo_log/helpers.py:66
  2018-07-24 19:42:24.820 15840 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-5ece05d6-f2cd-46a4-b81e-7e579e61990b - - - - -] get_subnet_for_dvr for subnet 3707b250-b6f5-4701-9b17-01a8f288c17a returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'3f6ec232-7649-4639-b828-c3af9960481b', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [u'10.1.10.19', u'8.8.8.8', u'8.8.4.4'], u'gateway_ip': u'172.16.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'172.16.0.2', u'end': u'172.16.255.254'}], u'host_routes': [], u'revision_number': 2, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:42:a2:ec', u'cidr': u'172.16.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', u'subnetpool_id': None, u'name': u'172.16.0.0/16'} _bind_distributed_router_interface_port /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:371
  2018-07-24 19:42:25.686 15840 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-5ece05d6-f2cd-46a4-b81e-7e579e61990b - - - - -] get_subnet_for_dvr for subnet 98d2750d-60ce-4b53-88ef-423b77d5f5f5 returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'655c3eb4-b9f5-4e30-92de-2262d6e87c92', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [], u'gateway_ip': u'10.100.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'10.100.0.2', u'end': u'10.100.255.254'}], u'host_routes': [{u'destination': u'0.0.0.0/0', u'nexthop': u'172.16.0.1'}], u'revision_number': 0, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:13:61:98', u'cidr': u'10.100.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'98d2750d-60ce-4b53-88ef-423b77d5f5f5', u'subnetpool_id': None, u'name': u'dogfood-vxlan-8000-sub'} _bind_distributed_router_interface_port /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:371

  
  Bad Host:
  root@krusty:~# cat ovs.log | grep get_subnet_for_dvr | grep "172.16"
  2018-07-24 19:44:44.135 31138 DEBUG neutron.api.rpc.handlers.dvr_rpc [req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] neutron.api.rpc.handlers.dvr_rpc.DVRServerRpcApi method get_subnet_for_dvr called with arguments (<neutron_lib.context.ContextBase object at 0x7f1c09d3b410>, '3707b250-b6f5-4701-9b17-01a8f288c17a') {'fixed_ips': [{'subnet_id': '3707b250-b6f5-4701-9b17-01a8f288c17a', 'ip_address': '172.16.0.3'}]} wrapper /opt/pf9/pf9-neutron/lib/python2.7/site-packages/oslo_log/helpers.py:66
  2018-07-24 19:44:44.369 31138 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] get_subnet_for_dvr for subnet 3707b250-b6f5-4701-9b17-01a8f288c17a returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'3f6ec232-7649-4639-b828-c3af9960481b', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [u'10.1.10.19', u'8.8.8.8', u'8.8.4.4'], u'gateway_ip': u'172.16.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'172.16.0.2', u'end': u'172.16.255.254'}], u'host_routes': [], u'revision_number': 2, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:84:0b:42', u'cidr': u'172.16.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', u'subnetpool_id': None, u'name': u'172.16.0.0/16'} _bind_centralized_snat_port_on_dvr_subnet /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:553
  2018-07-24 19:44:51.786 31138 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] get_subnet_for_dvr for subnet 98d2750d-60ce-4b53-88ef-423b77d5f5f5 returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'655c3eb4-b9f5-4e30-92de-2262d6e87c92', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [], u'gateway_ip': u'10.100.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'10.100.0.2', u'end': u'10.100.255.254'}], u'host_routes': [{u'destination': u'0.0.0.0/0', u'nexthop': u'172.16.0.1'}], u'revision_number': 0, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:b1:bd:33', u'cidr': u'10.100.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'98d2750d-60ce-4b53-88ef-423b77d5f5f5', u'subnetpool_id': None, u'name': u'dogfood-vxlan-8000-sub'} _bind_centralized_snat_port_on_dvr_subnet /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:553
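
Picking gateway_mac out of these long DEBUG lines by eye is error-prone. The sketch below, which is an illustration rather than an existing tool, extracts the subnet ID, the returned gateway_mac, and the calling method from a "returned with" log line. The helper name `gateway_mac_from_log` is hypothetical, and the regular expression assumes the line layout seen in this report. The sample line is the first bad-host result with the dict abbreviated to a few keys.

```python
import ast
import re

# Assumed layout: "... get_subnet_for_dvr for subnet <id> returned with
# {<dict repr>} <calling method> <source path>:<line>"
RETURNED_RE = re.compile(
    r"get_subnet_for_dvr for subnet (\S+) returned with (\{.*\}) (\w+) "
)


def gateway_mac_from_log(line):
    """Return (subnet_id, gateway_mac, caller), or None if the line does not match."""
    match = RETURNED_RE.search(line)
    if match is None:
        return None
    subnet_id, payload, caller = match.groups()
    # The payload is a Python 2 dict repr; u'' literals still parse on Python 3.
    info = ast.literal_eval(payload)
    return subnet_id, info.get("gateway_mac"), caller


# Abbreviated copy of the first bad-host result line from this report.
sample = (
    "2018-07-24 19:44:44.369 31138 DEBUG "
    "neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent "
    "[req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] "
    "get_subnet_for_dvr for subnet 3707b250-b6f5-4701-9b17-01a8f288c17a "
    "returned with {u'gateway_ip': u'172.16.0.1', "
    "u'allocation_pools': [{u'start': u'172.16.0.2', "
    "u'end': u'172.16.255.254'}], "
    "u'gateway_mac': u'fa:16:3e:84:0b:42', "
    "u'id': u'3707b250-b6f5-4701-9b17-01a8f288c17a'} "
    "_bind_centralized_snat_port_on_dvr_subnet "
    "/opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/"
    "drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:553"
)

print(gateway_mac_from_log(sample))
```

The caller name is worth keeping in the result: on the good host the cache is populated from _bind_distributed_router_interface_port, while on the bad host it is populated from _bind_centralized_snat_port_on_dvr_subnet.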

  This causes a whole slew of problems: packets are sent into the network
  infrastructure with the src MAC of the local DVR MAC, causing this MAC
  to flap on remote hosts' br-int between the patch cable and the qr
  interface. If we shut the snat host's interfaces or bring the host
  down, the DVR MAC stops flapping on br-int on other hosts, and network
  connectivity is restored.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1783470/+subscriptions

