yahoo-eng-team team mailing list archive

[Bug 1998235] [NEW] Allowed address pairs and dvr routers

 

Public bug reported:

Hi,

I would like to report an issue with neutron port allowed address pairs
and DVR routers.

We are currently running Yoga and Ussuri environments.

In Yoga we noticed that if you add an allowed address pair to a neutron
port, the DVR router will receive a permanent ARP entry for the IP
configured in the allowed address pair. This seems to make sense. In
Ussuri, the DVR router did not receive an ARP entry for an allowed
address pair, so this looks like an improvement.

Where it gets more complicated is if you have two neutron ports with an allowed address pair with the same IP. An example use case would be when you have a VIP. 
What I have noticed is that the permanent ARP entry learned by the DVR router will be that of the most recently updated allowed address pair.
For example, if you add allowed address pair with IP X.X.X.X to neutron port 1, the DVR router will have a permanent ARP entry for IP X.X.X.X with the MAC address of neutron port 1.
Then, if you add the same IP X.X.X.X as an allowed address pair to neutron port 2, the DVR router will now have a permanent ARP entry for IP X.X.X.X with MAC address of neutron port 2.
In a way it makes sense, since you cannot have two ARP entries for the same IP address, but the problem is that the actual VIP could still be on neutron port 1.
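
To make the overwrite behaviour concrete, here is a minimal Python model of what I observe. This is not neutron code: the class and method names (`RouterArpTable`, `add_allowed_address_pair`) and the example IP and MAC addresses are made up for illustration.

```python
# Toy model of the observed behaviour; NOT neutron code.
# RouterArpTable and add_allowed_address_pair are hypothetical names.

class RouterArpTable:
    """One permanent ARP entry per IP, as seen on the DVR router."""

    def __init__(self):
        self.entries = {}  # ip -> mac

    def add_allowed_address_pair(self, ip, port_mac):
        # The most recent update overwrites whatever was there before.
        self.entries[ip] = port_mac


VIP = "192.0.2.10"                # stands in for X.X.X.X
PORT1_MAC = "fa:16:3e:00:00:01"   # neutron port 1 (example value)
PORT2_MAC = "fa:16:3e:00:00:02"   # neutron port 2 (example value)

table = RouterArpTable()
table.add_allowed_address_pair(VIP, PORT1_MAC)
table.add_allowed_address_pair(VIP, PORT2_MAC)

# The router now resolves the VIP to port 2, even if the VIP is
# actually held by the instance behind port 1.
print(table.entries[VIP])
```

The model knows nothing about which instance really owns the VIP, which is exactly the information the router is missing here.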

This problem becomes apparent with Octavia load balancers in
active_standby topology. On LB creation, both the active and standby
instances are created at nearly the same time, so there is roughly a 50%
chance that the LB does not work because the DVR router will have the
permanent ARP entry pointing to the backup instance instead of the
active one, for the reason explained above.

But I think I discovered an even worse problem. Let's say we have the situation I described above. We have neutron port 1 and neutron port 2 and both have an allowed address pair with IP X.X.X.X. Currently, the permanent ARP entry on the DVR router for X.X.X.X is pointing to the MAC of neutron port 2.
If I delete neutron port 2 or remove the allowed address pair from neutron port 2, the permanent ARP entry is erased from the DVR router, and no permanent ARP entry for X.X.X.X pointing to neutron port 1 is installed in its place. This means traffic won't reach the VIP X.X.X.X located on neutron port 1. This also affects another important use case.

Because of the issue with the Octavia active_standby topology, I tried to work around it using the standalone topology. This means there is only one amphora instance. Octavia still uses an allowed address pair, but now it only exists on one neutron port, so the DVR router has the correct permanent ARP entry.
If I fail over the LB for any reason, a new instance is created (along with a new neutron port) and is assigned the allowed address pair. The DVR router correctly learns the new permanent ARP entry pointing to the new port. BUT then the broken instance is deleted, which means the original neutron port is deleted, AND that DELETES the permanent ARP entry on the DVR router even though it was no longer pointing to this port. At this point the LB no longer works because the DVR router does not know how to reach the LB's IP.
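
The deletion behaviour in the failover case can be modelled the same way. Again this is only a sketch of what I observe from the outside, not neutron code; `RouterArpTable`, `add_pair`, `remove_pair` and the addresses are hypothetical.

```python
# Toy model of the observed behaviour; NOT neutron code.
# RouterArpTable, add_pair and remove_pair are hypothetical names.

class RouterArpTable:
    def __init__(self):
        self.entries = {}  # ip -> mac of the permanent ARP entry

    def add_pair(self, ip, port_mac):
        # Latest update wins.
        self.entries[ip] = port_mac

    def remove_pair(self, ip, port_mac):
        # Observed: the entry for the IP goes away whichever port it
        # pointed to, and nothing is installed for the remaining ports.
        self.entries.pop(ip, None)


VIP = "192.0.2.10"               # stands in for the LB's IP
OLD_PORT = "fa:16:3e:00:00:01"   # port of the broken amphora (example)
NEW_PORT = "fa:16:3e:00:00:02"   # port of the replacement amphora (example)

# Standalone failover: the new port gets the pair, then the old port
# is deleted.
table = RouterArpTable()
table.add_pair(VIP, OLD_PORT)
table.add_pair(VIP, NEW_PORT)
table.remove_pair(VIP, OLD_PORT)

# The router is left without any entry for the VIP.
print(VIP in table.entries)
```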

I find this very problematic. It means neither Octavia topology works reliably with DVR routers.
I think the permanent ARP entry logic needs to be revised. Maybe when deleting a permanent ARP entry for an allowed address pair IP address, neutron should check whether that same IP address exists as an allowed address pair on another neutron port and, if so, update the DVR router with an ARP entry for that port.
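
A rough sketch of the check I have in mind, again as a standalone Python model rather than a patch. The names (`RouterArpTable`, `owners`, `add_pair`, `remove_pair`) are hypothetical, and picking the most recently added remaining port is just one possible choice, not something neutron does today.

```python
from collections import defaultdict

# Sketch of the suggested delete logic; NOT neutron code.
# All names here are hypothetical.

class RouterArpTable:
    def __init__(self):
        self.entries = {}                # ip -> mac currently installed
        self.owners = defaultdict(list)  # ip -> macs of ports with the pair

    def add_pair(self, ip, port_mac):
        if port_mac not in self.owners[ip]:
            self.owners[ip].append(port_mac)
        self.entries[ip] = port_mac      # latest update still wins

    def remove_pair(self, ip, port_mac):
        if port_mac in self.owners[ip]:
            self.owners[ip].remove(port_mac)
        remaining = self.owners[ip]
        if not remaining:
            # No other port carries this IP: drop the entry.
            self.entries.pop(ip, None)
            del self.owners[ip]
        elif self.entries.get(ip) == port_mac:
            # The entry pointed at the removed port: re-point it to a
            # port that still has the pair (here, the newest one).
            self.entries[ip] = remaining[-1]
        # Otherwise the entry already points at another port; keep it.


VIP = "192.0.2.10"
OLD_PORT = "fa:16:3e:00:00:01"
NEW_PORT = "fa:16:3e:00:00:02"

# Failover case: deleting the old port no longer removes the entry.
failover = RouterArpTable()
failover.add_pair(VIP, OLD_PORT)
failover.add_pair(VIP, NEW_PORT)
failover.remove_pair(VIP, OLD_PORT)
print(failover.entries[VIP])

# Two-port case: removing port 2 falls back to port 1.
shared = RouterArpTable()
shared.add_pair(VIP, OLD_PORT)
shared.add_pair(VIP, NEW_PORT)
shared.remove_pair(VIP, NEW_PORT)
print(shared.entries[VIP])
```

This only addresses the deletion problem. It still cannot tell which port actually holds the VIP when several ports carry the same allowed address pair.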

I am unsure how to resolve the first issue I described. There are other
bugs related to it, and discussions on how to handle VRRP etc. going
back a long time. But if we could resolve the second issue about the ARP
entry deletion, we could at least use Octavia in standalone mode.

I used Octavia as the main use case, but there are others, and these
issues make them hard to handle. They would always require manual
intervention to fix. These issues are easy to recreate even without
Octavia. Let me know if you have any questions.

Thanks

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1998235

Title:
  Allowed address pairs and dvr routers

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1998235/+subscriptions