yahoo-eng-team team mailing list archive

[Bug 1663232] [NEW] Openvswitch VLAN stripping issue with tunneling

 

Public bug reported:

Openvswitch VLAN tag stripping and MTU issue

When L2 population is not used in an environment with Openvswitch as
the ML2 mechanism driver, or when the learned flows are the ones that
match, the VLAN tag that Neutron uses internally is not stripped before
tunneling. For VXLAN, the tunneling overhead then exceeds the MTU
reduction applied to the virtual networks, which causes MTU issues.

In my setup, I have several OpenStack clouds (Newton) deployed with
Fuel, using VXLAN segmentation and Openvswitch, running on Ubuntu
16.04. Some machines in the tenant virtual networks act as bridges, so
L2 population is not sufficient and the learning feature of br-tun is
required. The deployments are the most basic ones that can be performed
with Fuel 10 (no additional services).

The overhead of VXLAN is 50 bytes if the original Ethernet frame does
not have a VLAN tag, and 54 bytes if it does. When setting the virtual
network MTU, Neutron assumes that there is no VLAN tag. However, Neutron
uses VLAN tags internally to isolate the networks in br-int and br-tun.
With L2 population, the flows installed in br-tun strip the VLAN tag
before tunneling, so everything works properly. But when L2 population
is not used, or its flows are not hit and learning takes over, the
learned flows do not strip the VLAN tag; they only set the VLAN ID to
zero. The overhead is then 54 bytes and communication breaks.
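
The arithmetic above can be checked with a small Python sketch. This is
only an illustration, not Neutron code: the constant and function names
are made up, an outer IPv4 header without options is assumed, and the
1500-byte underlay MTU is an assumed value that is not taken from the
setup described here.

```python
# Illustrative sketch only; names and the 1500-byte underlay MTU are assumptions.
OUTER_IPV4 = 20  # outer IPv4 header without options (assumption)
OUTER_UDP = 8    # outer UDP header
VXLAN_HDR = 8    # VXLAN header
INNER_ETH = 14   # inner Ethernet header without a VLAN tag
VLAN_TAG = 4     # 802.1Q tag left on the inner frame


def vxlan_overhead(inner_tagged: bool) -> int:
    """Bytes added around the tenant IP packet, counted against the underlay MTU."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH
    if inner_tagged:
        overhead += VLAN_TAG
    return overhead


def fits(underlay_mtu: int, tenant_mtu: int, inner_tagged: bool) -> bool:
    """True if a full-size tenant packet fits in one underlay packet."""
    return tenant_mtu + vxlan_overhead(inner_tagged) <= underlay_mtu


if __name__ == "__main__":
    underlay = 1500         # assumed physical network MTU
    tenant = underlay - 50  # MTU reduction that assumes no VLAN tag
    print(vxlan_overhead(False), fits(underlay, tenant, False))  # 50 True
    print(vxlan_overhead(True), fits(underlay, tenant, True))    # 54 False
```

With the tag left in place, a full-size tenant packet is 4 bytes too
large for the underlay, which matches the breakage described above.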

The following learning rule in br-tun installs flows that set the VLAN
ID to zero instead of removing the tag.

In table 10:
#table=10,priority=0,actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xa9e495b5c54cf90c,OXM_OF_VLAN_VID[],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->OXM_OF_VLAN_VID[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1

This results in flows in table 20 like this:
#table=20,priority=1,vlan_tci=0x0003/0x0fff,dl_dst=fa:16:3e:b6:53:e2 actions=load:0->OXM_OF_VLAN_VID[],load:0x18->NXM_NX_TUN_ID[],output:2
This flow does not remove the VLAN tag. With L2 population, flows with
a higher priority are inserted that do strip the VLAN tag correctly.
However, the learned flows are used whenever the L2 population flows do
not match.
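
To spot affected flows in a dump of table 20, a rough Python helper
along the following lines could be used. It is a sketch under
assumptions: the function name is hypothetical, it only does text
matching on the actions, and the second sample flow (using the
Openvswitch strip_vlan action) is a made-up illustration of a flow that
strips the tag, not a line copied from an actual dump.

```python
import re

# Open vSwitch actions that actually remove the 802.1Q header.
STRIP_ACTIONS = ("strip_vlan", "pop_vlan")
# Action that only rewrites the VLAN ID to 0, as in the learned flow above.
ZERO_VID = re.compile(r"load:0(x0)?->OXM_OF_VLAN_VID\[\]")

# Learned flow quoted in this report.
LEARNED_FLOW = (
    "table=20,priority=1,vlan_tci=0x0003/0x0fff,dl_dst=fa:16:3e:b6:53:e2 "
    "actions=load:0->OXM_OF_VLAN_VID[],load:0x18->NXM_NX_TUN_ID[],output:2"
)
# Hypothetical flow that strips the tag, for comparison only.
STRIPPING_FLOW = (
    "table=20,priority=2,vlan_tci=0x0003/0x0fff,dl_dst=fa:16:3e:b6:53:e2 "
    "actions=strip_vlan,load:0x18->NXM_NX_TUN_ID[],output:2"
)


def vlan_handling(flow: str) -> str:
    """Classify one dumped flow as 'stripped', 'zeroed' or 'untouched'."""
    _, _, actions = flow.partition("actions=")
    if any(action in actions for action in STRIP_ACTIONS):
        return "stripped"
    if ZERO_VID.search(actions):
        return "zeroed"
    return "untouched"


if __name__ == "__main__":
    print(vlan_handling(LEARNED_FLOW))    # zeroed
    print(vlan_handling(STRIPPING_FLOW))  # stripped
```

Flows reported as "zeroed" are the ones that would still carry the
4-byte tag into the tunnel.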

Expected output: traffic without a VLAN tag, tunneled in VXLAN with a 50-byte overhead
Actual output: traffic with a VLAN tag (VLAN ID 0), tunneled in VXLAN with a 54-byte overhead

The issue does not happen with GRE, as the 4 additional bytes still
fit within the 50-byte MTU reduction on the tenant network.
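
The same kind of check for GRE, again as a hedged sketch: it assumes an
outer IPv4 header without options and an 8-byte GRE header (4-byte base
header plus a 4-byte key), and the names are illustrative only.

```python
# Illustrative sketch only; header sizes below are stated assumptions.
OUTER_IPV4 = 20     # outer IPv4 header without options (assumption)
GRE_HDR = 8         # 4-byte base GRE header plus 4-byte key (assumption)
INNER_ETH = 14      # inner Ethernet header without a VLAN tag
VLAN_TAG = 4        # 802.1Q tag left on the inner frame
MTU_REDUCTION = 50  # reduction applied to the tenant network in this report


def gre_overhead(inner_tagged: bool) -> int:
    """Bytes added around the tenant IP packet when tunneling over GRE."""
    overhead = OUTER_IPV4 + GRE_HDR + INNER_ETH
    if inner_tagged:
        overhead += VLAN_TAG
    return overhead


if __name__ == "__main__":
    for tagged in (False, True):
        total = gre_overhead(tagged)
        print(tagged, total, total <= MTU_REDUCTION)
    # False 42 True
    # True 46 True
```

Under these assumptions the tagged frame still stays below the 50-byte
reduction, so the leftover tag does not cause drops with GRE.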

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: mtu openvswitch tunnel vlan

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1663232

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1663232/+subscriptions

