yahoo-eng-team team mailing list archive
Message #74279
[Bug 1783470] Re: get_subnet_for_dvr returns SNAT mac instead of gateway in subnet_info
Reviewed: https://review.openstack.org/587234
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c6de172e58ed4cbd157c2e560ffbbb4dc3a34730
Submitter: Zuul
Branch: master
commit c6de172e58ed4cbd157c2e560ffbbb4dc3a34730
Author: Arjun Baindur <xagent@xxxxxxxxx>
Date: Mon Jul 30 14:22:30 2018 -0700
get_subnet_for_dvr returns SNAT mac instead of distributed gateway in subnet_info
On hosts with dvr_snat agent mode, after restarting the OVS agent,
the SNAT port is sometimes processed before the distributed port.
The subnet_info is cached locally via get_subnet_for_dvr when either of these ports
is processed. However, get_subnet_for_dvr returns the MAC address of the port used
in the query as the gateway MAC for the subnet. When the SNAT port is used, the wrong
MAC is recorded as the gateway, so some flows, such as the DVR flows on br-int
for local source VMs, get the wrong MAC.
This patch fixes the problem by calling get_subnet_for_dvr with fixed_ips set to None
for the csnat port, which makes the server-side handler fill in the subnet's actual
gateway MAC rather than the port's MAC.
Change-Id: If045851819fd53c3b9a1506cc52bc1757e6d6851
Closes-Bug: #1783470
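The ordering problem and the fix described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not neutron's code: the in-memory `PORTS` table, the context-free `get_subnet_for_dvr` signature, and the `agent_restart` helper are hypothetical simplifications, and the IPs and MACs are taken from the bug report below. It assumes the server returns the MAC of whichever port owns the lookup IP, and that the agent caches the first answer per subnet.

```python
from typing import Dict, List, Optional

SUBNET_ID = "3707b250-b6f5-4701-9b17-01a8f288c17a"
GATEWAY_IP = "172.16.0.1"
# Hypothetical in-memory port table; IPs and MACs are taken from the bug report
PORTS = [
    {"device_owner": "network:router_interface_distributed",
     "ip_address": "172.16.0.1", "mac_address": "fa:16:3e:42:a2:ec"},
    {"device_owner": "network:router_centralized_snat",
     "ip_address": "172.16.0.3", "mac_address": "fa:16:3e:84:0b:42"},
]


def get_subnet_for_dvr(subnet_id: str,
                       fixed_ips: Optional[List[Dict[str, str]]] = None) -> Dict[str, str]:
    """Simplified server-side handler (hypothetical): gateway_mac is the MAC of
    the port owning the lookup IP, which is the caller's IP if fixed_ips is
    given, otherwise the subnet's gateway IP."""
    ip = fixed_ips[0]["ip_address"] if fixed_ips else GATEWAY_IP
    mac = next(p["mac_address"] for p in PORTS if p["ip_address"] == ip)
    return {"id": subnet_id, "gateway_ip": GATEWAY_IP, "gateway_mac": mac}


def agent_restart(order: List[str], fixed: bool) -> str:
    """Process ports in the given order starting from an empty per-subnet cache
    and return the cached gateway_mac. Only the first port triggers the RPC."""
    cache: Dict[str, Dict[str, str]] = {}
    for owner in order:
        if SUBNET_ID in cache:
            continue  # later ports reuse the cached subnet_info
        port = next(p for p in PORTS if p["device_owner"] == owner)
        fixed_ips: Optional[List[Dict[str, str]]] = [
            {"subnet_id": SUBNET_ID, "ip_address": port["ip_address"]}]
        if fixed and owner == "network:router_centralized_snat":
            fixed_ips = None  # the fix: the csnat port queries without fixed_ips
        cache[SUBNET_ID] = get_subnet_for_dvr(SUBNET_ID, fixed_ips)
    return cache[SUBNET_ID]["gateway_mac"]


SNAT_FIRST = ["network:router_centralized_snat", "network:router_interface_distributed"]
print(agent_restart(SNAT_FIRST, fixed=False))  # fa:16:3e:84:0b:42 (SNAT MAC, the bug)
print(agent_restart(SNAT_FIRST, fixed=True))   # fa:16:3e:42:a2:ec (distributed gateway MAC)
```

With the SNAT port processed first, the unfixed query caches the SNAT MAC; passing fixed_ips=None for the csnat port makes the lookup fall back to the subnet's gateway IP, so the distributed gateway MAC is cached regardless of processing order.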
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1783470
Title:
get_subnet_for_dvr returns SNAT mac instead of gateway in subnet_info
Status in neutron:
Fix Released
Bug description:
On our dvr_snat host, "install_dvr_to_src_mac" installs the rule in
br-int with the SNAT MAC instead of the DVR MAC address (the subnet's
gateway, aka network:router_interface_distributed).
For example, the subnet's gateway is 172.16.0.1, with MAC
fa:16:3e:42:a2:ec.
On most hosts, we see the following rules in br-int:
[root@stan ~]# ovs-ofctl dump-flows br-int | grep fa:16:3e:42:a2:ec
cookie=0x77f69fee58f51737, duration=11872.801s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:22:eb:8b actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
cookie=0x77f69fee58f51737, duration=11872.790s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:cd:71:e1 actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
cookie=0x77f69fee58f51737, duration=11865.953s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:20:77:00 actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
cookie=0x77f69fee58f51737, duration=11865.933s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:ab:2d:1a actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
cookie=0x77f69fee58f51737, duration=11860.735s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:76:e9:ae actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
cookie=0x77f69fee58f51737, duration=11859.335s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:cb:48:27 actions=mod_dl_src:fa:16:3e:42:a2:ec,resubmit(,60)
However, on our dvr_snat host, all of these rules are missing for the gateway MAC as dl_src. Instead, they are added with the MAC of the network:router_centralized_snat port:
root@krusty:~# ovs-ofctl dump-flows br-int | grep fa:16:3e:84:0b:42
cookie=0xbb5ebbfa2dfadb74, duration=5351.368s, table=2, n_packets=2976001, n_bytes=362273213, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:84:0b:42 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5195.362s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:86:91:e2 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5195.349s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:a2:04:d3 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5195.336s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:82:ef:3b actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5195.325s, table=2, n_packets=24, n_bytes=2044, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:e4:d9:f3 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5195.272s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:b9:a0:fe actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5194.118s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:1a:42:fa actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5194.098s, table=2, n_packets=56, n_bytes=4792, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:84:33:df actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5193.995s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:34:e1:92 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5193.509s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:6d:3e:f3 actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5191.408s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:30:97:8f actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5188.895s, table=2, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:57:e5:ad actions=mod_dl_src:fa:16:3e:84:0b:42,resubmit(,60)
cookie=0xbb5ebbfa2dfadb74, duration=5351.361s, table=60, n_packets=0, n_bytes=0, idle_age=65534, priority=4,dl_vlan=795,dl_dst=fa:16:3e:84:0b:42 actions=strip_vlan,output:951
root@krusty:~#
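To check which MAC a host's table=2 rules rewrite the source to, a flow dump like the ones above can be summarized with a short script. This is a sketch that assumes the `ovs-ofctl dump-flows` text format shown here; `count_rewrite_macs` is a hypothetical helper, not part of neutron or Open vSwitch.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "actions=mod_dl_src:<mac>" part of an ovs-ofctl flow line
MOD_DL_SRC = re.compile(r"actions=mod_dl_src:([0-9a-f]{2}(?::[0-9a-f]{2}){5})")
TABLE = re.compile(r"\btable=(\d+)")


def count_rewrite_macs(flow_lines: Iterable[str], table: int = 2) -> Counter:
    """Count how many rules in the given table rewrite dl_src to each MAC."""
    counts: Counter = Counter()
    for line in flow_lines:
        table_match = TABLE.search(line)
        mac_match = MOD_DL_SRC.search(line)
        if table_match and mac_match and int(table_match.group(1)) == table:
            counts[mac_match.group(1)] += 1
    return counts
```

The input could come from, for example, `subprocess.run(["ovs-ofctl", "dump-flows", "br-int"], capture_output=True, text=True).stdout.splitlines()`. On a healthy host the only rewrite MAC should be the gateway MAC fa:16:3e:42:a2:ec; on the snat host above it is fa:16:3e:84:0b:42.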
I have traced this to the get_subnet_for_dvr call: the gateway_mac
returned in subnet_info is incorrect. Right after the OVS agent
restarts, the dvr_local_map is empty, so the OVS agent calls
get_subnet_for_dvr to populate the local subnet info map. On good
hosts, it queries with fixed_ip = subnet gateway (172.16.0.1). On
the snat host, it queries first with fixed_ip = 172.16.0.3 (the SNAT
port). Either this query is incorrect, or, even when querying with the
SNAT port, the gateway_mac in the subnet info should be the DVR MAC,
not the SNAT MAC:
Good host:
root@barney:~# cat ovs.log | grep get_subnet_for_dvr | grep "172.16"
2018-07-24 19:42:24.454 15840 DEBUG neutron.api.rpc.handlers.dvr_rpc [req-5ece05d6-f2cd-46a4-b81e-7e579e61990b - - - - -] neutron.api.rpc.handlers.dvr_rpc.DVRServerRpcApi method get_subnet_for_dvr called with arguments (<neutron_lib.context.ContextBase object at 0x7f52f1983150>, '3707b250-b6f5-4701-9b17-01a8f288c17a') {'fixed_ips': [{'subnet_id': '3707b250-b6f5-4701-9b17-01a8f288c17a', 'ip_address': '172.16.0.1'}]} wrapper /opt/pf9/pf9-neutron/lib/python2.7/site-packages/oslo_log/helpers.py:66
2018-07-24 19:42:24.820 15840 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-5ece05d6-f2cd-46a4-b81e-7e579e61990b - - - - -] get_subnet_for_dvr for subnet 3707b250-b6f5-4701-9b17-01a8f288c17a returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'3f6ec232-7649-4639-b828-c3af9960481b', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [u'10.1.10.19', u'8.8.8.8', u'8.8.4.4'], u'gateway_ip': u'172.16.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'172.16.0.2', u'end': u'172.16.255.254'}], u'host_routes': [], u'revision_number': 2, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:42:a2:ec', u'cidr': u'172.16.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', u'subnetpool_id': None, u'name': u'172.16.0.0/16'} _bind_distributed_router_interface_port /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:371
2018-07-24 19:42:25.686 15840 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-5ece05d6-f2cd-46a4-b81e-7e579e61990b - - - - -] get_subnet_for_dvr for subnet 98d2750d-60ce-4b53-88ef-423b77d5f5f5 returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'655c3eb4-b9f5-4e30-92de-2262d6e87c92', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [], u'gateway_ip': u'10.100.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'10.100.0.2', u'end': u'10.100.255.254'}], u'host_routes': [{u'destination': u'0.0.0.0/0', u'nexthop': u'172.16.0.1'}], u'revision_number': 0, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:13:61:98', u'cidr': u'10.100.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'98d2750d-60ce-4b53-88ef-423b77d5f5f5', u'subnetpool_id': None, u'name': u'dogfood-vxlan-8000-sub'} _bind_distributed_router_interface_port /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:371
Bad Host:
root@krusty:~# cat ovs.log | grep get_subnet_for_dvr | grep "172.16"
2018-07-24 19:44:44.135 31138 DEBUG neutron.api.rpc.handlers.dvr_rpc [req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] neutron.api.rpc.handlers.dvr_rpc.DVRServerRpcApi method get_subnet_for_dvr called with arguments (<neutron_lib.context.ContextBase object at 0x7f1c09d3b410>, '3707b250-b6f5-4701-9b17-01a8f288c17a') {'fixed_ips': [{'subnet_id': '3707b250-b6f5-4701-9b17-01a8f288c17a', 'ip_address': '172.16.0.3'}]} wrapper /opt/pf9/pf9-neutron/lib/python2.7/site-packages/oslo_log/helpers.py:66
2018-07-24 19:44:44.369 31138 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] get_subnet_for_dvr for subnet 3707b250-b6f5-4701-9b17-01a8f288c17a returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'3f6ec232-7649-4639-b828-c3af9960481b', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [u'10.1.10.19', u'8.8.8.8', u'8.8.4.4'], u'gateway_ip': u'172.16.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'172.16.0.2', u'end': u'172.16.255.254'}], u'host_routes': [], u'revision_number': 2, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:84:0b:42', u'cidr': u'172.16.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'3707b250-b6f5-4701-9b17-01a8f288c17a', u'subnetpool_id': None, u'name': u'172.16.0.0/16'} _bind_centralized_snat_port_on_dvr_subnet /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:553
2018-07-24 19:44:51.786 31138 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent [req-6d269f17-c49c-4f64-93f8-139639020c5d - - - - -] get_subnet_for_dvr for subnet 98d2750d-60ce-4b53-88ef-423b77d5f5f5 returned with {u'shared': True, u'service_types': [], u'description': None, u'enable_dhcp': True, u'tags': [], u'network_id': u'655c3eb4-b9f5-4e30-92de-2262d6e87c92', u'tenant_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'dns_nameservers': [], u'gateway_ip': u'10.100.0.1', u'ipv6_ra_mode': None, u'allocation_pools': [{u'start': u'10.100.0.2', u'end': u'10.100.255.254'}], u'host_routes': [{u'destination': u'0.0.0.0/0', u'nexthop': u'172.16.0.1'}], u'revision_number': 0, u'ipv6_address_mode': None, u'ip_version': 4, u'gateway_mac': u'fa:16:3e:b1:bd:33', u'cidr': u'10.100.0.0/16', u'project_id': u'f175f441ebbb4c2b8fedf6469d6415fc', u'id': u'98d2750d-60ce-4b53-88ef-423b77d5f5f5', u'subnetpool_id': None, u'name': u'dogfood-vxlan-8000-sub'} _bind_centralized_snat_port_on_dvr_subnet /opt/pf9/pf9-neutron/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py:553
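Comparing the good and bad hosts comes down to which gateway_mac each "returned with" line carries and which agent method made the call. A minimal sketch for pulling those fields out of an OVS agent log, assuming the DEBUG line format shown above; `gateway_macs` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple

# Matches "get_subnet_for_dvr for subnet <id> returned with {...} <caller> <path>"
# DEBUG lines and captures the subnet id, gateway_mac, and calling agent method.
RETURNED = re.compile(
    r"get_subnet_for_dvr for subnet (?P<subnet>[0-9a-f-]+) returned with "
    r".*?u'gateway_mac': u'(?P<mac>[0-9a-f:]+)'"
    r".*\} (?P<caller>\w+) "
)


def gateway_macs(log_lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (subnet id, gateway_mac, caller) for each matching log line."""
    results = []
    for line in log_lines:
        match = RETURNED.search(line)
        if match:
            results.append((match.group("subnet"), match.group("mac"),
                            match.group("caller")))
    return results
```

For example, `with open("ovs.log") as f: print(gateway_macs(f))`. Against the logs above, subnet 3707b250-... comes back with fa:16:3e:42:a2:ec from _bind_distributed_router_interface_port on the good host, but with fa:16:3e:84:0b:42 from _bind_centralized_snat_port_on_dvr_subnet on the bad host.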
This causes a number of problems: packets are sent into the network
infrastructure with the local DVR MAC as their source MAC, causing this
MAC to flap on remote hosts' br-int between the patch cable and the qr
interface. If we shut down the snat host's interfaces or bring the host
down, the DVR MAC stops flapping on br-int on other hosts, and network
connectivity is restored.
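The flapping follows from ordinary source-MAC learning: a learning switch records the ingress port for each source MAC, so if frames with the same source MAC keep arriving on two different ports, the entry keeps moving. A generic sketch of that bookkeeping, assuming a simple learning table (this is not Open vSwitch's implementation; the MAC and the port names `qr-gw` and `patch-tun` are hypothetical):

```python
from typing import Dict, Iterable, Tuple


def count_mac_moves(frames: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count how often each source MAC is re-learned on a different port.

    frames: (source MAC, ingress port name) pairs in arrival order.
    """
    learned: Dict[str, str] = {}  # MAC -> port it was last seen on
    moves: Dict[str, int] = {}
    for src_mac, in_port in frames:
        last_port = learned.get(src_mac)
        if last_port is not None and last_port != in_port:
            moves[src_mac] = moves.get(src_mac, 0) + 1
        learned[src_mac] = in_port
    return moves


# Hypothetical MAC seen alternately on the local qr- port and the tunnel patch port
MAC = "fa:16:3e:00:00:01"
frames = [(MAC, "qr-gw"), (MAC, "patch-tun")] * 3
print(count_mac_moves(frames))  # {'fa:16:3e:00:00:01': 5}
```

Once the offending source stops sending (as when the snat host is taken down), the MAC is only seen on one port and the move count stops growing.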