
yahoo-eng-team team mailing list archive

[Bug 1715734] Re: Gratuitous ARP for floating IPs not so gratuitous

 

** Also affects: openstack-ansible
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1715734

Title:
  Gratuitous ARP for floating IPs not so gratuitous

Status in neutron:
  In Progress
Status in openstack-ansible:
  New

Bug description:
  OpenStack Release: Newton
  OS: Ubuntu 16.04 LTS

  In an environment where multiple application deployments repeatedly
  create and tear down routers and floating IPs, we have observed that
  new instances assigned recycled floating IPs may not be reachable from
  the external network.

  In this environment, the external provider network is connected to a
  Cisco Nexus 7010 with a default ARP cache timeout of 1500 seconds (25
  minutes). We have observed that the L3 agent runs the following arping
  commands when floating IPs are assigned:

  2017-09-07 16:57:17.396 13048 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/openstack/venvs/neutron-r14.1.0/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-3429e5bc-b41b-46fb-9bef-6ec6ccf0d3b6', 'arping', '-A', '-I', 'qg-6582bfec-7d', '-c', '1', '-w', '1.5', '172.29.77.36'] create_process /openstack/venvs/neutron-r14.1.0/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
  2017-09-07 16:57:19.644 13048 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/openstack/venvs/neutron-r14.1.0/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-3429e5bc-b41b-46fb-9bef-6ec6ccf0d3b6', 'arping', '-U', '-I', 'qg-6582bfec-7d', '-c', '1', '-w', '1.5', '172.29.77.29'] create_process /openstack/venvs/neutron-r14.1.0/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
  2017-09-07 16:57:19.913 13048 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/openstack/venvs/neutron-r14.1.0/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-3429e5bc-b41b-46fb-9bef-6ec6ccf0d3b6', 'arping', '-U', '-I', 'qg-6582bfec-7d', '-c', '1', '-w', '1.5', '172.29.77.44'] create_process /openstack/venvs/neutron-r14.1.0/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
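  To line these DEBUG entries up against a packet capture, it can help to
  pull the arping arguments out of them. The sketch below assumes the log
  format shown above (a Python list literal between "Running command:" and
  "create_process"); parse_arping is a hypothetical helper, not part of
  neutron.

```python
import ast
import re

# Assumes the neutron.agent.linux.utils DEBUG format shown above:
# the argv is logged as a Python list literal followed by " create_process".
_CMD_RE = re.compile(r"Running command: (\[.*?\]) create_process")


def parse_arping(log_line):
    """Hypothetical helper: return (namespace, mode, iface, target, source)
    for a logged arping call, or None if the line is not one."""
    m = _CMD_RE.search(log_line)
    if m is None:
        return None
    argv = ast.literal_eval(m.group(1))  # safe parse of the list literal
    if "arping" not in argv:
        return None
    # "ip netns exec <namespace> arping ..." -> namespace follows "exec"
    namespace = argv[argv.index("exec") + 1] if "exec" in argv else None
    args = argv[argv.index("arping") + 1:]
    mode = next((a for a in args if a in ("-A", "-U")), None)
    iface = args[args.index("-I") + 1] if "-I" in args else None
    # None means no -S was passed, so arping chose the sender IP itself
    source = args[args.index("-S") + 1] if "-S" in args else None
    target = args[-1]  # the destination address is the last argument
    return namespace, mode, iface, target, source


line = (
    "2017-09-07 16:57:19.644 13048 DEBUG neutron.agent.linux.utils [-] "
    "Running command: ['sudo', '/openstack/venvs/neutron-r14.1.0/bin/neutron-rootwrap', "
    "'/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', "
    "'qrouter-3429e5bc-b41b-46fb-9bef-6ec6ccf0d3b6', 'arping', '-U', '-I', "
    "'qg-6582bfec-7d', '-c', '1', '-w', '1.5', '172.29.77.29'] create_process "
    "/openstack/venvs/neutron-r14.1.0/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89"
)
print(parse_arping(line))
```

  For the logged commands above, source comes back as None: the agent
  never tells arping which sender address to use.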

  Here is the resulting packet capture:

  18:09:06.366085 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.36 tell 172.29.77.39, length 28
  18:09:06.366085 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.29 tell 172.29.77.39, length 28
  18:09:06.366085 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.44 tell 172.29.77.39, length 28
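  A gratuitous ARP request (an "ARP Announcement" in RFC 5227 terms)
  carries the announced address as both the sender and the target IP. A
  quick way to check a capture is to compare the "who-has" and "tell"
  fields of each line. This is a minimal sketch that assumes tcpdump's
  one-line ARP request format as shown above; classify_arp is a
  hypothetical name.

```python
import re

# Target IP follows "who-has", sender IP follows "tell" (up to the comma).
_ARP_RE = re.compile(r"Request who-has (\S+) tell (\S+?),")


def classify_arp(line):
    """Hypothetical helper: return (target, sender, is_gratuitous) for a
    tcpdump ARP request line, or None if the line is not one."""
    m = _ARP_RE.search(line)
    if m is None:
        return None
    target, sender = m.groups()
    return target, sender, target == sender


capture = """\
18:09:06.366085 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.36 tell 172.29.77.39, length 28
18:09:06.366085 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.29 tell 172.29.77.39, length 28
18:09:06.366085 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.44 tell 172.29.77.39, length 28
"""

for entry in capture.splitlines():
    target, sender, gratuitous = classify_arp(entry)
    print(f"{target}: sender={sender} gratuitous={gratuitous}")
```

  Every request in this capture is flagged as not gratuitous, because the
  sender is always the qg interface's primary address.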

  The sender address in all of these ARP requests is 172.29.77.39, the
  primary IP address of the qg interface. As a result, the Nexus does not
  refresh its ARP entries for the recycled floating IPs, and they remain
  stale. For the request to act as a gratuitous ARP, the sender IP must
  be set to the floating IP itself, so that the sender and target IPs are
  identical. The following change was made in ip_lib.py:

  FROM:
  arping_cmd = ['arping', arg, '-I', iface_name, '-c', 1,
                # Pass -w to set timeout to ensure exit if interface
                # removed while running
                '-w', 1.5, address]

  TO:
  arping_cmd = ['arping', arg, '-I', iface_name, '-c', 1,
                # Pass -w to set timeout to ensure exit if interface
                # removed while running
                '-w', 1.5, '-S', address, address]
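  The patched list can be sketched as a standalone function. This is an
  illustration under assumptions, not neutron's code: build_arping_cmd and
  its parameters are hypothetical, and unlike the original list it
  converts the count and deadline to strings itself so the result could be
  passed directly to subprocess. The flags (-A/-U, -I, -c, -w, -S) are the
  iputils arping options used above.

```python
def build_arping_cmd(iface_name, address, mode="-U", count=1,
                     deadline=1.5, set_source=True):
    """Hypothetical helper mirroring the patched arping_cmd from ip_lib.py."""
    cmd = ["arping", mode, "-I", iface_name, "-c", str(count),
           # -w sets a deadline so arping exits even if the interface
           # is removed while it is running
           "-w", str(deadline)]
    if set_source:
        # -S sets the sender IP in the ARP packet to the floating IP,
        # making sender and target identical (a gratuitous ARP)
        cmd += ["-S", address]
    cmd.append(address)  # the target address goes last
    return cmd


print(build_arping_cmd("qg-6582bfec-7d", "172.29.77.36"))
```

  In the L3 agent, a list like this would be prefixed with
  ["ip", "netns", "exec", <router namespace>] and run through rootwrap, as
  in the logged commands earlier.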

  With that change in place, the following packet capture reflects the
  new behavior:

  18:10:30.389966 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.36 tell 172.29.77.36, length 28
  18:10:30.390068 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.29 tell 172.29.77.29, length 28
  18:10:30.390143 fa:16:3e:99:af:5c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 103, p 0, ethertype ARP, Request who-has 172.29.77.44 tell 172.29.77.44, length 28

  Since making the change, we have not had a failed deployment and all
  recycled floating IPs appear to be reachable immediately.
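  The behavior is consistent with the ARP reception rule in RFC 826: a
  received ARP packet updates an existing cache entry only for its sender
  IP, never for its target IP. The toy model below applies that rule. It
  only models the RFC baseline; real switch implementations, including the
  Nexus here, may apply additional policies. The switch address and the
  stale MAC are made-up values, and ArpCache is a hypothetical class.

```python
from dataclasses import dataclass, field


@dataclass
class ArpCache:
    """Toy neighbor cache applying the RFC 826 merge rule."""
    my_ip: str
    entries: dict = field(default_factory=dict)  # IP -> MAC

    def receive(self, sender_ip, sender_mac, target_ip):
        merged = False
        if sender_ip in self.entries:
            # Merge: refresh the existing entry for the *sender* IP only
            self.entries[sender_ip] = sender_mac
            merged = True
        if target_ip == self.my_ip and not merged:
            # Only when this host is the target is a new entry created
            self.entries[sender_ip] = sender_mac


cache = ArpCache(my_ip="172.29.77.1")  # hypothetical switch address
cache.entries["172.29.77.36"] = "fa:16:3e:00:00:01"  # made-up stale MAC
new_mac = "fa:16:3e:99:af:5c"  # qg MAC from the captures above

# Before the fix: sender is the qg primary address, target is the floating IP.
cache.receive("172.29.77.39", new_mac, "172.29.77.36")
print(cache.entries["172.29.77.36"])  # fa:16:3e:00:00:01 (still stale)

# After the fix: sender and target are both the floating IP.
cache.receive("172.29.77.36", new_mac, "172.29.77.36")
print(cache.entries["172.29.77.36"])  # fa:16:3e:99:af:5c (refreshed)
```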


