yahoo-eng-team team mailing list archive
Message #41784
[Bug 1509184] Re: Enable openflow based dvr routing for east/west traffic
Unless I am missing something, this can be handled by working in
dragonflow.
** Changed in: neutron
Status: New => Won't Fix
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1509184
Title:
Enable openflow based dvr routing for east/west traffic
Status in neutron:
Won't Fix
Bug description:
In the juno cycle dvr support was added to neutron to decentralise routing to the compute nodes.
This RFE bug proposes the introduction of a new dvr mode (dvr_local_openflow) to optimise the datapath
of east/west traffic.
-----------------------------------------------High level description-------------------------------
The current implementation of DVR with ovs utilizes linux network namespaces to instantiate l3
routers, the details of which are described here: http://docs.openstack.org/networking-guide/scenario_dvr_ovs.html
Fundamentally, a neutron router comprises 3 elements:
- a router instance (network namespace)
- a router interface (tap device)
- a set of routing rules (kernel ip routes)
In the special case of routing east/west traffic both the source and destination interfaces are known to neutron.
Because of that, neutron holds all the information required to logically route traffic from its origin to its destination,
enabling the path to be established pre-emptively. This proposal suggests moving the instantiation of the dvr local router from the kernel ip stack to Open vSwitch (ovs) for east/west traffic.
Open vSwitch provides a logical programmable interface (OpenFlow) to configure traffic forwarding and modification actions on arbitrary packet streams. When managed by the neutron openvswitch l2 agent, ovs operates as a simple mac learning switch with limited utilisation of its programmable dataplane. To utilise ovs to create an l3 router, the following mappings from the 3 fundamental elements can be made:
- a router instance (network namespace + an ovs bridge)
- a router interface (tap device + patch port pair)
- a set of routing rules (kernel ip routes + openflow rules)
----------------------------------------background context---------------------------------------------
TL;DR
basic explanation of openflow/ovs bridges and patch ports
skip to implementation section if familiar.
ovs implementation background:
In Open vSwitch, at the control layer an ovs bridge is a unique logical domain of interfaces and flow rules.
Similarly, at the control layer a patch port pair is a logical entity that interconnects two bridges (or logical domains).
From a dataplane perspective each ovs bridge is first created as a separate instance of a dataplane.
If these separate bridges/dataplanes are interconnected by patch ports, ovs will collapse the independent dataplanes into a single
ovs dataplane instance. As a direct result of this implementation, a logical topology of 1 bridge with two interfaces is realised at the dataplane level identically to 2 bridges, each with 1 interface, interconnected by patch ports. This translates to zero dataplane overhead for the creation of multiple bridges, allowing arbitrary numbers of router instances to be created.
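As a rough illustration of the bridge and patch port pair described above, the following sketch only builds the ovs-vsctl argument lists and does not execute them. The bridge and port names (br-rtr-1, patch-int-1, patch-rtr-1) and the helper name are hypothetical and are not taken from the proposal.

```python
def patch_pair_commands(int_bridge, router_bridge, int_port, router_port):
    """Return ovs-vsctl argument lists that create a routing bridge and
    connect it to the integration bridge with a patch port pair."""
    return [
        # Create the per-router bridge if it does not already exist.
        ["ovs-vsctl", "--may-exist", "add-br", router_bridge],
        # Each end of a patch port pair names the other end as its peer.
        ["ovs-vsctl", "add-port", int_bridge, int_port,
         "--", "set", "Interface", int_port,
         "type=patch", f"options:peer={router_port}"],
        ["ovs-vsctl", "add-port", router_bridge, router_port,
         "--", "set", "Interface", router_port,
         "type=patch", f"options:peer={int_port}"],
    ]


# Hypothetical names, used for illustration only.
for cmd in patch_pair_commands("br-int", "br-rtr-1", "patch-int-1", "patch-rtr-1"):
    print(" ".join(cmd))
```

The commands are returned as lists so that they could be handed to subprocess.run without shell quoting if someone wanted to try them on a test host.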
Openflow capability background:
The openflow protocol provides many capabilities which can be generally summarised as packet match criteria and actions to apply
when the criteria are satisfied. In the case of l3 routing the match criteria of relevance are the Ethernet type and the destination ip address. Similarly, the openflow actions required are mod_dest (mod_dl_dst in ovs-ofctl syntax), set_field, move, dec_ttl, output and drop.
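To make the match/action split concrete, here is a minimal sketch that renders one routing rule in ovs-ofctl flow syntax. The Flow class, the priority value, the addresses and the output port number are all assumptions made for illustration; only the match fields (ip, nw_dst) and the actions (mod_dl_dst, dec_ttl, output) are standard ovs-ofctl names.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """Hypothetical helper: a priority, match criteria and an action list."""
    priority: int
    match: dict = field(default_factory=dict)
    actions: list = field(default_factory=list)

    def render(self):
        parts = [f"priority={self.priority}"]
        for key, value in self.match.items():
            # Protocol shorthands such as "ip" or "arp" carry no value.
            parts.append(key if value is True else f"{key}={value}")
        return ",".join(parts) + " actions=" + ",".join(self.actions)


# Match on Ethernet type (ip) and destination ip, then rewrite the
# destination mac, decrement the ttl and output the packet.
route = Flow(
    priority=100,
    match={"ip": True, "nw_dst": "10.0.2.5"},
    actions=["mod_dl_dst:fa:16:3e:00:00:05", "dec_ttl", "output:2"],
)
print(route.render())
```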
logical packet flow for a ping between two vms on same host:
in the l2 case if a vm tries to ping another vm in the same subnet there are 4 stages.
- first it will send a broadcast arp packet to learn the mac address for the destination ip of the remote vm.
- second the destination vm receives the arp request and learns the source vm's mac, then replies as follows:
a.) swap the source and destination ip of the arp packet
b.) copy the source mac address to the destination mac address and set the source mac address to the local interface mac.
c.) set the arp type code from request to reply.
d.) transmit the reply via the interface it was received on
- third on receiving the arp reply the source vm will transmit the icmp packet to the destination vm using the learned mac address
- fourth on receiving the icmp the destination vm replies.
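The arp reply steps a.) to c.) above can be sketched as a small pure function. This is a simplified model, not real packet handling: the ArpPacket class and its field names are hypothetical, and the sender/target hardware addresses carried inside the arp payload are folded into the Ethernet source and destination for brevity.

```python
from dataclasses import dataclass, replace

ARP_REQUEST = 1  # arp opcode for a request
ARP_REPLY = 2    # arp opcode for a reply


@dataclass(frozen=True)
class ArpPacket:
    """Hypothetical, simplified arp packet."""
    op: int
    eth_src: str
    eth_dst: str
    sender_ip: str
    target_ip: str


def make_arp_reply(request, local_mac):
    """Turn an arp request into the matching reply."""
    return replace(
        request,
        # a.) swap the source and destination ip
        sender_ip=request.target_ip,
        target_ip=request.sender_ip,
        # b.) old source mac becomes the destination, local mac the source
        eth_dst=request.eth_src,
        eth_src=local_mac,
        # c.) change the type code from request to reply
        op=ARP_REPLY,
    )


request = ArpPacket(ARP_REQUEST, "fa:16:3e:00:00:01", "ff:ff:ff:ff:ff:ff",
                    "10.0.1.5", "10.0.1.6")
reply = make_arp_reply(request, "fa:16:3e:00:00:02")
print(reply)
```

Step d.), transmitting on the receiving interface, is left to the caller.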
in the l3 case the packet flow is similar, with the router as an extra hop.
- first the source vm sends an arp to the subnet gateway.
- second the gateway router responds with its mac address
- third the source vm sends the icmp packet to the router
- fourth the router receives the icmp packet and sends an arp to the destination vm.
- fifth the destination vm sends an arp reply to the gateway
- sixth the router forwards the icmp to the destination vm
- seventh the destination vm replies to the router
- eighth the reply is received by the source vm.
----------------------------------current implementation---------------------------------------------------
l3 ping packet flow in dvr_local mode(simplified to ignore broadcast):
logical:
- the arp packet is received from the source vm and logically vlan tagged(tenant isolation)
- the arp packet is output to the router tap device (tap1), the vlan is stripped and the packet is copied from the ovs dataplane to the
kernel networking stack in the router's linux namespace.
- the kernel network stack replies to the arp and the reply packet is copied to the ovs dataplane and it is logically vlan tagged
- the vlan is logically stripped and the arp reply switched to the source vm interface.
- the icmp packet is received from the source vm and logically vlan tagged (tenant isolation)
- the icmp packet is output to the router tap device, the vlan is stripped and the packet is copied from the ovs dataplane to the
kernel networking stack in the router's linux namespace.
- the kernel generates an arp request to the destination vm which follows the same path as the arp described above
- the kernel modifies the dest mac address, decrements the ttl and routes the packet to the appropriate tap device (tap2) where the packet is copied to the ovs dataplane and it is logically vlan tagged
- the vlan is logically stripped and the icmp packet switched to the destination vm interface.
- the reply path is similar and is shortened as follows:
destvm -> vlan tagged -> vlan stripped -> copied to kernel namespace via tap2 -> copied to ovs dataplane via tap1 -> vlan tagged -> vlan stripped -> received by source vm.
actual:
- arp from source vm -> tap1 (vlan tagging skipped) + broadcast to other ports
- tap1 -> kernel network stack
- kernel sends arp reply via tap1
- tap1 -> source vm (vlan tagging skipped)
- icmp from source vm -> tap1 (vlan tagging skipped)
- kernel receives icmp on tap1 and sends arp request to dest vm via tap2 (broadcast)
- arp via tap2 -> dest vm (vlan tagging skipped)
- dest vm replies -> tap2
- kernel updates the dest mac, decrements the ttl, then forwards the icmp packet to tap2
- tap2 -> dest vm -> dest vm replies -> tap2 (vlan tagging skipped)
- kernel updates the dest mac, decrements the ttl, then forwards the icmp reply packet to tap1
- tap1 -> source vm
-------------------------------------proposed change----------------------------------------------------------
Proposed change:
- a new class will be added to implement the new mode that subclasses the existing
dvr_local router class.
- if mode is dvr_local_openflow a routing bridge will be created for each dvr router.
- when an internal network is added to the router the following actions will be performed:
a.) the tap interface will be created in the router network namespace as normal but added
to the routing bridge instead of the br-int. (tap devices are only used for north/south traffic)
b.) a patch port pair will be created between the br-int and the routing bridge
c.) the attached-mac, iface-id and iface-status will be populated in the external-ids field of the br-int side of the patch port.
this will enable the unmodified neutron l2 agent to correctly manage the patch port.
d.) a low priority rule that sends all traffic from the patch port to the tap device will be added to the routing bridge.
e.) a medium priority rule that will reply to all arp requests to the router will be added to the routing bridge.
this rule will use openflow's move and set field actions to rewrite the arp request into a reply and output=in_port.
f.) a high priority dest mac update and ttl decrement rule will be added to the routing bridge for each port
on the internal network.
- when an external network is added to the router the workflow will be unchanged and is inherited from the dvr_local
implementation.
- the _update_arp_entry function will be extended to additionally populate and delete the high priority dest mac update rules
as neutron ports are added to or removed from connected networks.
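Rules d.) to f.) above might look roughly like the following ovs-ofctl flow strings. This is a sketch under stated assumptions rather than the proposed implementation: the function name, the priority values 10/50/100, the port numbers and the addresses are invented for illustration, and the arp rewrite actions follow the pattern already used by the neutron ovs agent's arp responder (nicira move/load extensions).

```python
import ipaddress


def routing_bridge_flows(patch_ofport, tap_ofport, router_ip, router_mac, neighbours):
    """Build flow strings for one internal network on a routing bridge.

    neighbours is a list of (ip, mac, out_ofport) tuples, one per neutron
    port reachable through this router (hypothetical structure).
    """
    mac_hex = format(int(router_mac.replace(":", ""), 16), "#x")
    ip_hex = format(int(ipaddress.IPv4Address(router_ip)), "#x")
    flows = [
        # d.) low priority: anything unmatched goes to the tap device.
        f"priority=10,in_port={patch_ofport} actions=output:{tap_ofport}",
        # e.) medium priority: rewrite arp requests for the router ip
        # into replies and send them back out of the ingress port.
        f"priority=50,arp,arp_op=1,arp_tpa={router_ip} "
        "actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],"
        f"mod_dl_src:{router_mac},"
        "load:0x2->NXM_OF_ARP_OP[],"
        "move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],"
        "move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],"
        f"load:{mac_hex}->NXM_NX_ARP_SHA[],"
        f"load:{ip_hex}->NXM_OF_ARP_SPA[],"
        "in_port",
    ]
    # f.) high priority: one dest mac update + ttl decrement rule per port.
    for ip, mac, out_ofport in neighbours:
        flows.append(
            f"priority=100,ip,nw_dst={ip} "
            f"actions=mod_dl_dst:{mac},dec_ttl,output:{out_ofport}"
        )
    return flows


flows = routing_bridge_flows(
    1, 2, "10.0.1.1", "fa:16:3e:00:00:fe",
    [("10.0.2.5", "fa:16:3e:00:00:05", 3)],
)
for flow in flows:
    print(flow)
```

Adding or removing a neutron port would then correspond to adding or deleting a single rule of type f.).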
l3 packet flow in dvr_local_openflow mode:
logical:
- the arp packet is received from the source vm and logically vlan tagged(tenant isolation)
- the arp packet is output to the router bridge patch port, the vlan is stripped
- the arp request is rewritten into a reply and sent back to the br-int and logically vlan tagged
- the vlan is logically stripped and the arp reply switched to the source vm interface.
- the icmp packet is received from the source vm and logically vlan tagged (tenant isolation)
- the icmp packet is output to the router bridge patch port, the vlan is stripped.
- the icmp packet matches the high priority rule and its destination mac is updated, then it is output to the second patch port and it is logically vlan tagged
- the vlan is logically stripped and the icmp packet switched to the destination vm interface.
- the reply path is similar and is shortened as follows:
destvm -> vlan tagged -> vlan stripped -> router bridge via patch 2 -> dest mac and ttl updated then output patch 1 -> vlan tagged -> vlan stripped -> received by source vm.
actual:
- arp from source vm -> arp rewritten to reply -> sent to source vm (single openflow action).
- icmp from source vm -> destination mac update, ttl decremented -> dest vm ( single openflow action)
- icmp from dest vm -> destination mac update, ttl decremented -> source vm ( single openflow action)
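The single-lookup behaviour listed above can be modelled with a toy flow table. This is only a simulation sketch: packets are plain dicts, the table layout, field names, port labels and priorities are hypothetical, and the arp rewrite itself is elided.

```python
# Toy flow table for one routing bridge (all values are illustrative).
TABLE = [
    {"priority": 10, "match": {}, "action": ("to_tap",)},
    {"priority": 50, "match": {"eth_type": "arp", "arp_tpa": "10.0.1.1"},
     "action": ("arp_reply",)},
    {"priority": 100, "match": {"eth_type": "ip", "ip_dst": "10.0.2.5"},
     "action": ("route", "fa:16:3e:00:00:05", "patch2")},
]


def lookup(table, packet):
    """Return the highest priority rule whose match fields all equal the packet's."""
    hits = [rule for rule in table
            if all(packet.get(key) == value for key, value in rule["match"].items())]
    return max(hits, key=lambda rule: rule["priority"])


def apply(table, packet):
    """Apply the matching rule and return (packet, output port)."""
    action = lookup(table, packet)["action"]
    if action[0] == "route":
        _, dst_mac, out_port = action
        # dest mac update and ttl decrement happen in the same lookup.
        return dict(packet, eth_dst=dst_mac, ttl=packet["ttl"] - 1), out_port
    if action[0] == "arp_reply":
        return packet, "in_port"  # arp rewrite elided in this sketch
    return packet, "tap"          # fall back to the kernel via the tap device


icmp = {"eth_type": "ip", "ip_dst": "10.0.2.5",
        "eth_dst": "fa:16:3e:00:00:fe", "ttl": 64}
routed, port = apply(TABLE, icmp)
print(routed, port)
```

A packet addressed to the router interface itself matches only the low priority rule and is handed to the tap device, which mirrors the fallback to the kernel path.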
other considerations:
- north/south
as ovs cannot look up the destination mac dynamically via arp, it is not possible to optimise the
north/south path as described above.
- Open vSwitch support
this mechanism is compatible with both kernel and dpdk ovs.
this mechanism requires nicira extensions for arp rewrite.
arp rewrite can be skipped for greater support if required, as it will fall back to the tap device and kernel.
icmp traffic for the router interface will be handled by the tap device as ovs currently does not
support setting the icmp type code via set_field or load openflow actions.
- performance
performance of l3 routing is expected to approach l2 performance for east/west traffic.
performance is not expected to change for north/south.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1509184/+subscriptions