yahoo-eng-team team mailing list archive

[Bug 1774459] [NEW] RFE: Update permanent ARP entries for allowed_address_pair IPs in DVR Routers

 

Public bug reported:

We have a long-standing issue with allowed_address_pairs IPs that are associated with unbound ports and DVR routers.
The ARP entry for the allowed_address_pair IP does not change in response to the GARP issued by any keepalived instance.

Since DVR updates the ARP table through the control plane, and does
not let any ARP traffic leave the node (to prevent the router IP/MAC
from polluting the network), this has always been a problem.

A recent patch in master, https://review.openstack.org/#/c/550676/,
tried to address this issue but was not successful.

This patch helped in updating the ARP entry dynamically from the GARP
message, but the entry has to be temporary (NUD state 'reachable').
Only when the entry is set to 'reachable' were we able to update it on
the fly from the GARP message, without using any external tools.

The problem shows up when VMs reside in two different subnets
(Subnet A and Subnet B). If a VM in Subnet B, on a different, isolated
node, tries to ping the VRRP IP in Subnet A, the packet from the VM
arrives in the router namespace, where the ARP entry for the VRRP IP is
present as 'reachable'. While the entry is reachable the VM gets a
couple of pings through, but within 15 seconds the pings time out.

The reason is that the router tries to verify that the IP/MAC combination for the VRRP IP is still valid, since the entry in the ARP table is "REACHABLE" and not "PERMANENT".
When it tries to re-ARP for the IP, the ARP packets are blocked by the DVR flow rules in br-tun, so the ARP times out and the ARP entry in the router namespace becomes incomplete.
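
To make the REACHABLE vs PERMANENT distinction concrete, here is a minimal sketch of how the two kinds of entry could be written with iproute2's "ip neigh replace". The helper name, namespace, device and addresses are hypothetical placeholders, not taken from the neutron code; the function only builds the command line and does not execute it.

```python
def neigh_replace_cmd(namespace, ip, mac, device, permanent=True):
    """Build an 'ip neigh replace' command for a router namespace.

    All arguments are illustrative placeholders. A 'permanent' entry is
    never re-validated by the kernel; a 'reachable' one ages out and
    triggers a fresh ARP request.
    """
    nud = "permanent" if permanent else "reachable"
    return [
        "ip", "netns", "exec", namespace,
        "ip", "neigh", "replace", ip,
        "lladdr", mac, "dev", device, "nud", nud,
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(neigh_replace_cmd(
        "qrouter-demo", "10.0.0.100", "fa:16:3e:00:00:01", "qr-demo")))
```

Running such a command needs root privileges on the node hosting the namespace.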

Option A:
One way to address this is to run a GARP sniffer tool/utility in the router namespace that watches for GARP packets, filtered on a specific IP. When that IP is seen in a GARP message, the tool/utility resets the ARP entry for the VRRP IP as permanent. This is likely to be very performance intensive, so we are not sure it would be helpful. We should probably make it configurable, so that people can enable it if required.
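
As a rough illustration of the detection step in Option A, the sketch below parses a raw Ethernet frame and reports the IP/MAC pair if it is a gratuitous ARP (sender IP equal to target IP) for one of a set of watched IPs. The function names and the watched-IP set are assumptions made for this example; capturing frames (for instance from a raw socket in the router namespace) and updating the neighbour table are left out, and VLAN-tagged frames are not handled.

```python
import socket
import struct

ETH_P_ARP = 0x0806  # EtherType for ARP


def parse_garp(frame):
    """Return (ip, mac) if frame is a gratuitous IPv4 ARP, else None."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, _oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    sender_mac = frame[22:28]
    sender_ip = frame[28:32]
    target_ip = frame[38:42]
    if sender_ip != target_ip:  # not gratuitous
        return None
    mac = ":".join("%02x" % b for b in sender_mac)
    return socket.inet_ntoa(sender_ip), mac


def garp_update(frame, watched_ips):
    """Return (ip, mac) only when the GARP announces a watched IP."""
    result = parse_garp(frame)
    if result is None or result[0] not in watched_ips:
        return None
    return result
```

The returned pair is what the tool would then write into the router namespace as a permanent entry.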

Option B:
The other option is, instead of running the sniffer on all nodes and in all router namespaces, to run it only in the network node's router namespace, or on the network node host. It would then notify neutron that the IP/MAC mapping has changed, and neutron would in turn tell all hosts to do an ARP update for the given IP/MAC. (Just an idea; not sure how simple it is compared to the former.)
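
The flow in Option B could be modelled roughly as below. This is only an in-memory sketch of the idea: the class, its method and the host names are hypothetical and do not correspond to any existing neutron RPC API. The network node reports an observed IP/MAC pair, and an update is fanned out to every host only when the mapping has actually changed.

```python
class ArpFanout:
    """Toy model of a central component relaying ARP changes to hosts."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.known = {}  # ip -> MAC last announced to the hosts

    def report(self, ip, mac):
        """Record a GARP seen on the network node.

        Returns one (host, ip, mac) update per host, or an empty list
        when the mapping is unchanged and nothing needs to be sent.
        """
        if self.known.get(ip) == mac:
            return []
        self.known[ip] = mac
        return [(host, ip, mac) for host in self.hosts]


if __name__ == "__main__":
    fanout = ArpFanout(["compute-1", "compute-2"])
    for update in fanout.report("10.0.0.100", "fa:16:3e:00:00:01"):
        print(update)
```

Suppressing unchanged mappings matters here because keepalived may repeat its GARPs, and each repeat would otherwise become a message to every host.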


Any ideas or thoughts would be helpful.

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1774459

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1774459/+subscriptions