yahoo-eng-team team mailing list archive
Message #67307
[Bug 1715734] [NEW] Gratuitous ARP for floating IPs not so gratuitous
Public bug reported:
OpenStack Release: Newton
OS: Ubuntu 16.04 LTS
When working in an environment with multiple application deployments
that build up and tear down routers and floating IPs, we have observed
that connectivity to new instances using recycled floating IPs may be
impacted.
In this environment, the external provider network is connected to a
Cisco Nexus 7010 with a default ARP cache timeout of 1500 seconds. We
have observed that the L3 agent runs the following arping commands
when floating IPs are assigned:
2017-09-07 16:57:17.396 13048 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/openstack/venvs/neutron-r14.1.0/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-3429e5bc-b41b-46fb-9bef-6ec6ccf0d3b6', 'arping', '-A', '-I', 'qg-6582bfec-7d', '-c', '1', '-w', '1.5', '172.29.77.36'] create_process /openstack/venvs/neutron-r14.1.0/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2017-09-07 16:57:19.644 13048 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/openstack/venvs/neutron-r14.1.0/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-3429e5bc-b41b-46fb-9bef-6ec6ccf0d3b6', 'arping', '-U', '-I', 'qg-6582bfec-7d', '-c', '1', '-w', '1.5', '172.29.77.29'] create_process /openstack/venvs/neutron-r14.1.0/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2017-09-07 16:57:19.913 13048 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/openstack/venvs/neutron-r14.1.0/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-3429e5bc-b41b-46fb-9bef-6ec6ccf0d3b6', 'arping', '-U', '-I', 'qg-6582bfec-7d', '-c', '1', '-w', '1.5', '172.29.77.44'] create_process /openstack/venvs/neutron-r14.1.0/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
Here is the corresponding packet capture:
18:09:06.366085 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.36 tell 172.29.77.39, length 28
18:09:06.366085 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.29 tell 172.29.77.39, length 28
18:09:06.366085 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.44 tell 172.29.77.39, length 28
The source address in all of those ARP requests is 172.29.77.39, the
primary IP address on the qg interface. Because the sender IP is not the
floating IP, the ARP entry for the recycled floating IPs on the Nexus is
not refreshed and remains stale. For the gratuitous ARP to be effective,
the source IP needs to be the respective floating IP, so that the source
and destination IPs are the same. The following code change was made in
ip_lib.py:
FROM:
    arping_cmd = ['arping', arg, '-I', iface_name, '-c', 1,
                  # Pass -w to set timeout to ensure exit if interface
                  # removed while running
                  '-w', 1.5, address]
TO:
    arping_cmd = ['arping', arg, '-I', iface_name, '-c', 1,
                  # Pass -w to set timeout to ensure exit if interface
                  # removed while running
                  '-w', 1.5, '-S', address, address]
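The change above can be sketched as a small standalone helper. This is
only an illustration, not the code in ip_lib.py: build_arping_cmd and its
set_source parameter are hypothetical names, and the '-S' flag is copied
from the report as-is. The report does not say which arping
implementation is installed, and flag spellings differ between
implementations, so the flag should be checked against the local arping
man page.

```python
def build_arping_cmd(arg, iface_name, address, set_source=True):
    """Return the arping argument list for a single gratuitous ARP.

    Hypothetical helper for illustration only; it mirrors the FROM/TO
    lists quoted in the report and is not a function in ip_lib.py.
    """
    cmd = ['arping', arg, '-I', iface_name, '-c', 1,
           # Pass -w to set timeout to ensure exit if interface
           # removed while running
           '-w', 1.5]
    if set_source:
        # Flag taken verbatim from the report: force the sender IP to
        # be the floating IP itself instead of the interface's primary
        # address.
        cmd += ['-S', address]
    cmd.append(address)
    return cmd
```

Calling the helper with set_source=False returns the original argument
list, which makes the two variants easy to compare side by side.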
With that change in place, the following packet capture reflects the
new behavior:
18:10:30.389966 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.36 tell 172.29.77.36, length 28
18:10:30.390068 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.29 tell 172.29.77.29, length 28
18:10:30.390143 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.44 tell 172.29.77.44, length 28
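For reference, the "length 28" in the captures is the size of an
IPv4-over-Ethernet ARP payload. The sketch below packs such a payload
for a gratuitous request, where the sender and target protocol addresses
are both the floating IP. The function name is hypothetical, and the
zero-filled target hardware address is an assumption, since the captures
do not show that field.

```python
import socket
import struct


def build_gratuitous_arp_request(mac, ip):
    """Pack the 28-byte ARP payload of a gratuitous request for `ip`.

    Illustrative only: this builds the bytes but does not send them.
    """
    sender_mac = bytes.fromhex(mac.replace(':', ''))
    ip_bytes = socket.inet_aton(ip)
    return struct.pack(
        '!HHBBH6s4s6s4s',
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        sender_mac,   # sender hardware address
        ip_bytes,     # sender protocol address ("tell" in tcpdump)
        b'\x00' * 6,  # target hardware address (assumed zero-filled)
        ip_bytes,     # target protocol address ("who-has" in tcpdump)
    )
```

The sender and target protocol address fields carry the same value,
which matches the "who-has X tell X" form seen in the capture above.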
Since making the change, we have not had a failed deployment and all
recycled floating IPs appear to be reachable immediately.
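The difference between the two captures can also be checked
mechanically. The following is a minimal sketch, assuming tcpdump output
in the same format as the lines quoted above; is_gratuitous and
ARP_REQUEST_RE are hypothetical names introduced here for illustration.

```python
import re

# Matches the "Request who-has <target> tell <sender>" part of a
# tcpdump ARP line such as the ones quoted in this report.
ARP_REQUEST_RE = re.compile(
    r'Request who-has (?P<target>\d+\.\d+\.\d+\.\d+) '
    r'tell (?P<sender>\d+\.\d+\.\d+\.\d+)')


def is_gratuitous(tcpdump_line):
    """Return True if the ARP request's sender IP equals its target IP.

    Returns None when the line is not an ARP request in the expected
    tcpdump format.
    """
    match = ARP_REQUEST_RE.search(tcpdump_line)
    if match is None:
        return None
    return match.group('target') == match.group('sender')
```

Applied to the captures in this report, the lines taken before the
change have differing sender and target addresses, while the lines taken
after the change have identical ones.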
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1715734