
yahoo-eng-team team mailing list archive

[Bug 1583694] [NEW] [RFE] DVR support for Allowed_address_pair ports that are bound to multiple ACTIVE VM ports used by Octavia

 

Public bug reported:

DVR support for Allowed_address_pair ports with FloatingIP that are
unbound and assigned to multiple VMs that are ACTIVE.

Problem Statement:

When a FloatingIP is assigned to an Allowed_address_pair port that is shared by multiple ACTIVE VMs connected to DVR (Distributed Virtual Router) routers, the FloatingIP is not functional.
The use case here is to provide redundancy for the VMs that are serviced by the DVR routers.
This feature works well with Legacy Routers (Centralized Routers).
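
For concreteness, the topology in question can be set up roughly as in the
sketch below, using python-neutronclient. All IDs, credentials and addresses
are placeholders, and the exact sequence is an assumption about how such a
VIP/allowed_address_pair setup is typically created (e.g. by Octavia), not
something taken from this bug report.

    # Rough sketch of the topology in question, using python-neutronclient.
    # All IDs, credentials and addresses below are placeholders.
    from neutronclient.v2_0 import client

    TENANT_NET = '<tenant-net-uuid>'   # network the VMs are on
    EXT_NET = '<external-net-uuid>'    # external network for the FloatingIP
    VM1_PORT = '<vm1-port-uuid>'       # ports of the two ACTIVE VMs
    VM2_PORT = '<vm2-port-uuid>'
    VIP = '10.0.0.100'                 # address shared via allowed_address_pairs

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    # 1. An unbound "VIP" port that owns the shared address.
    vip_port = neutron.create_port(
        {'port': {'network_id': TENANT_NET,
                  'name': 'vip-port',
                  'fixed_ips': [{'ip_address': VIP}]}})['port']

    # 2. Allow the VIP address on both ACTIVE VM ports; keepalived inside
    #    the VMs decides which one actually answers for it.
    for vm_port in (VM1_PORT, VM2_PORT):
        neutron.update_port(
            vm_port,
            {'port': {'allowed_address_pairs': [{'ip_address': VIP}]}})

    # 3. Associate a FloatingIP with the unbound VIP port.  With legacy
    #    routers this works; with DVR routers the FloatingIP is not
    #    functional.
    neutron.create_floatingip(
        {'floatingip': {'floating_network_id': EXT_NET,
                        'port_id': vip_port['id']}})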

Theory:
Distributed Virtual Routers were designed for scalability and performance and to reduce the load on the single network node.

Distributed Virtual Routers are created on each compute node dynamically
on demand and removed when no longer required. Distributed Virtual
Routers depend heavily on port binding to identify whether a DVR service
is required on a particular node.

Today we create/update/delete a floatingip based only on the router and
the host on which the floatingip service is required. So the 'host' part
is critical for the operation of DVR.

In the above-mentioned use case, we are dealing with an
Allowed_address_pair port, which is not bound to any specific host and is
also assigned to multiple VMs that are ACTIVE at the same time.

We have a workaround today that inherits the parent VM's port binding
properties for the allowed_address_pair port if the parent VM's port is
ACTIVE. This has a limitation: for it to work, we assume that there is
only one "ACTIVE" VM port associated with the allowed_address_pair port.
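
Conceptually, the existing workaround's decision looks something like the
sketch below; the helper passed in is hypothetical, and this is not the
actual Neutron code path.

    # Sketch of the existing workaround's logic; get_parent_ports() is a
    # hypothetical helper, not an actual Neutron function.
    def inherit_host_binding(aap_port, get_parent_ports):
        """Inherit binding:host_id from the single ACTIVE VM port that
        carries this allowed_address_pair address."""
        active_parents = [p for p in get_parent_ports(aap_port)
                          if p['status'] == 'ACTIVE']
        if len(active_parents) != 1:
            # Zero or several ACTIVE parents: no single host to inherit.
            # The multi-ACTIVE case is exactly what this RFE is about.
            return None
        aap_port['binding:host_id'] = active_parents[0]['binding:host_id']
        return aap_port['binding:host_id']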

The reason for this is that if we have multiple "ACTIVE" VM ports
associated with the same allowed_address_pair port, and the
allowed_address_pair port has a FloatingIP associated with it, we can't
provide the FloatingIP service on all the nodes where those VM ports are
ACTIVE. This would create an issue because the same FloatingIP would be
advertised (GARP) from all nodes, so hosts on the external network would
be confused about where the actual "ACTIVE" port is.

Why it works with Legacy Routers:

In the case of legacy routers, the routers are always located at the
network node and DNAT is also done in the router namespace on the network
node. They don't depend on the host binding, since all the traffic has to
flow through the centralized router on the network node. Also, with
centralized routers there is no issue of FloatingIP GARP, since it always
comes in through a single node.

In the background, the MAC that answers for the allowed_address_pair
address is dynamically switched from one VM to another by the keepalived
instance that runs in the VMs. Neutron does not need to know about any of
this, and it works as expected.


Why it does not work with DVR Routers:
1. The Allowed_address_pair port does not have a host binding.
2. If we were to inherit from the VMs' host bindings, there are multiple VMs that are ACTIVE, so we can't derive a single host binding for these allowed_address_pair ports.
3. Even if we ignore the port binding on the allowed_address_pair port and try to provide the plumbing for the FloatingIP on multiple nodes based on the VMs it is associated with, the same FloatingIP would be GARPed from different compute nodes, which would confuse the external network.

How we can make it work with DVR:

Option 1:
Neutron should have some visibility into the state of the VM port when the switch between ACTIVE and STANDBY happens. Today this switch is handled by keepalived on the VM, so it is not recorded anywhere.
If keepalived could record the event on the neutron port, neutron could use it to determine when to allow and when to block FloatingIP traffic on a particular node, and then send the GARP from the respective node. This introduces some failover delay as well.

(Desired) Low-hanging fruit.
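
If such a state-change event were exposed to Neutron (for example via a
keepalived notify script relayed through Octavia or an agent), the
server-side reaction might look roughly like the following sketch. Every
callable below is hypothetical and only illustrates the intended behaviour,
not an existing Neutron API.

    # Hypothetical handler for a "VRRP MASTER changed" event on an
    # allowed_address_pair port (Option 1).  All helpers are assumptions
    # used for illustration only.
    def on_vrrp_master_changed(aap_port, new_master_parent_port,
                               all_parent_hosts,
                               disable_fip_on_host,
                               enable_fip_and_garp_on_host):
        """Plumb the FloatingIP only on the compute node whose VM just
        became MASTER, and send the GARP only from that node."""
        master_host = new_master_parent_port['binding:host_id']
        for host in all_parent_hosts(aap_port):
            if host != master_host:
                # Block FloatingIP traffic / stop advertising elsewhere.
                disable_fip_on_host(aap_port, host)
        enable_fip_and_garp_on_host(aap_port, master_host)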

Option 2:

Option 2 basically negates the distributed nature of DVR and makes it centralized for north-south traffic.
The other option is to centralize the FloatingIP functionality for such features. But this would be more complex, since we would need to introduce config options for the agents and the floatingip. Also, in this case we can't support both local floatingip and centralized floatingip on the same node: a compute node can have either localized floatingip or centralized floatingip, but not both.

Complex (negates the purpose of DVR)
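
As a very rough illustration of what Option 2 implies for floating IP host
selection (helper names are hypothetical, and the required agent/config
changes mentioned above are not shown):

    # Hypothetical host-selection logic under Option 2: FloatingIPs on
    # unbound allowed_address_pair ports with several ACTIVE parents are
    # handled centrally instead of on the compute nodes.
    def select_fip_host(fip_port, get_active_parents, get_snat_node):
        if fip_port.get('binding:host_id'):
            # Normal DVR case: plumb the FloatingIP on the VM's own node.
            return fip_port['binding:host_id']
        parents = get_active_parents(fip_port)
        if len(parents) > 1:
            # Multi-ACTIVE allowed_address_pair VIP: centralize north-south
            # on the SNAT/network node, giving up distributed FloatingIP.
            return get_snat_node(fip_port)
        return parents[0]['binding:host_id'] if parents else None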

References:
Some references to the patches we already have that support the single-ACTIVE-VM use case for Allowed_address_pair with FloatingIP in DVR.

https://review.openstack.org/#/c/254439/
https://review.openstack.org/#/c/301410/
https://review.openstack.org/#/c/304905/

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: l3-dvr-backlog lbaas neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1583694

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1583694/+subscriptions

