yahoo-eng-team team mailing list archive

[Bug 1577488] [NEW] [RFE]"Fast exit" for compute node egress flows when using DVR

 

Public bug reported:

In its current state, distributed north-south flows with DVR can only be
achieved when a floating IP is bound to a fixed IP. Without a floating
IP associated, the north-south flows are steered through the centralized
SNAT node, even if you are directly routing the tenant network without
any SNAT. When DVR is combined with either BGP or IPv6 proxy neighbor
discovery, it becomes possible to route traffic directly to a fixed IP
by advertising the FIP gateway port on a compute node as the next-hop.
For packets egressing the compute node, we need the ability to bypass
redirection of packets to the central SNAT node in cases where no
floating IP is associated with a fixed IP. Enabling this data flow on
egress from a compute node leaves the operator with the option of not
running any SNAT nodes. Distributed SNAT is not a consideration, as the
targeted use cases involve scenarios where the operator does not want to
use any SNAT.
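
The egress decision described above can be sketched as a small
Python function. This is only an illustrative model, not Neutron code:
the names EgressPath and select_egress_path and the fast_exit
parameter are made up for this example.

```python
from enum import Enum


class EgressPath(Enum):
    """Where north-south traffic leaves for a given fixed IP (toy model)."""
    FIP_NAMESPACE = "fip_namespace"  # exits directly from the compute node
    SNAT_NODE = "snat_node"          # redirected to the centralized SNAT node


def select_egress_path(has_floating_ip: bool, fast_exit: bool = False) -> EgressPath:
    """Pick the egress path for a fixed IP on a distributed router.

    With fast_exit=False this mirrors the current behaviour described in
    the report: only fixed IPs with a floating IP exit locally. With
    fast_exit=True (the hypothetical behaviour requested by this RFE),
    the redirection to the SNAT node is bypassed for every fixed IP.
    """
    if has_floating_ip or fast_exit:
        return EgressPath.FIP_NAMESPACE
    return EgressPath.SNAT_NODE
```

With fast_exit left at False, a fixed IP without a floating IP maps to
the SNAT node; with fast_exit=True it maps to the local FIP namespace.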

It is important to note that this RFE only targets use cases where the
operator has no need for SNAT. In these scenarios, the operator intends
to run a routing protocol or IPv6 proxy neighbor discovery to directly
route the fixed IPs of their tenants. It is also important to note that
this RFE does not specify what technology the operator would use for
routing their north-south DVR flows. The intent is simply to enable
operators who already have the infrastructure in place to handle
north-south flows in a distributed fashion for their tenants.

To enable this functionality, we have the following options:

1. The semantics surrounding the "enable_snat" flag when set to "False"
on a distributed router could use some refinement. We could use this
flag to enable SNAT node bypass (fast-exit). This approach has the
benefit of cleaning up some semantics that seem loosely defined, and
allows us to piggyback on an existing attribute without extending the
model. The drawback is that this field is exposed to tenants, who most
likely are not aware of how their network traffic is routed by the
provider network. Tenants probably don't need to be made aware through
the API that they are receiving "fast exit" treatment, and it may not
make sense to place the burden on them to set this flag appropriately.

2. Add a new L3 agent mode called "dvr_fast_exit". When the L3 agent is
run in this mode, all router instances hosted on that L3 agent will send
egress traffic directly out through the FIP namespace to the gateway,
completely disabling SNAT support on all routers hosted on the agent.
This approach involves a simple change to skip programming the "steal"
rule that sends traffic to the SNAT node when run in this mode. This is
likely the least invasive change, but it also has some drawbacks:
switching to this mode requires an agent restart, and all agents should
be run in this mode. This approach would be well suited to green-field
deployments, but doesn't work well with brown-field deployments.

3. There could be a third option I haven't considered yet. It could be
hashed out in a spec.
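
To make the difference between options 1 and 2 concrete, here is a
rough Python sketch of where the "program the steal rule?" decision
would live in each case. Everything here is hypothetical: the Router
class and both helper functions are invented for illustration, and
"dvr_fast_exit" is only the agent mode value proposed above, not an
existing one.

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Minimal stand-in for a router's relevant attributes (hypothetical)."""
    distributed: bool
    enable_snat: bool = True


def steal_rule_needed_option1(router: Router) -> bool:
    """Option 1: the decision is per router, driven by enable_snat.

    A distributed router with enable_snat=False would get fast-exit,
    so the rule redirecting traffic to the SNAT node is skipped.
    """
    return not (router.distributed and not router.enable_snat)


def steal_rule_needed_option2(agent_mode: str) -> bool:
    """Option 2: the decision is per agent, driven by its configured mode.

    Every router hosted on an agent in the proposed "dvr_fast_exit"
    mode skips the steal rule, regardless of its own attributes.
    """
    return agent_mode != "dvr_fast_exit"
```

The sketch highlights the trade-off: option 1 puts the switch on an
attribute tenants can set, while option 2 keeps it with the operator
but applies it to every router on the agent.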

In addition to the work discussed above, we need to be able to
instantiate the FIP namespace and gateway port immediately when a router
gateway is created instead of waiting for the first floating IP
association on a node.
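
The lifecycle change requested here can be modelled roughly as follows.
This is a toy state tracker, not the L3 agent's actual code: the class
name, the eager flag and the two event methods are assumptions made up
for this example.

```python
class FipNamespaceTracker:
    """Tracks whether the FIP namespace and gateway port exist on a node.

    eager=False models the current behaviour described above (created on
    the first floating IP association); eager=True models the requested
    behaviour (created as soon as the router gateway is created).
    """

    def __init__(self, eager: bool) -> None:
        self.eager = eager
        self.created = False

    def on_router_gateway_created(self) -> None:
        # Requested behaviour: do not wait for a floating IP.
        if self.eager:
            self.created = True

    def on_floating_ip_associated(self) -> None:
        # In both models the namespace exists once a floating IP is bound.
        self.created = True
```

Without the eager behaviour there would be no local FIP namespace to
send fast-exit traffic through on a node that has no floating IPs yet.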

Related WIP patches
- https://review.openstack.org/#/c/297468/
- https://review.openstack.org/#/c/283757/

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: l3-dvr-backlog l3-ipam-dhcp rfe

** Tags added: l3-dvr-backlog l3-ipam-dhcp rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1577488
