yahoo-eng-team team mailing list archive

[Bug 1663232] Re: Openvswitch VLAN stripping issue with tunneling

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Maël Kimmerlin <mael.kimmerlin@xxxxxxxx>
Date: Fri, 10 Feb 2017 08:36:34 -0000
Reply-to: Bug 1663232 <1663232@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

This is indeed a duplicate of
https://bugs.launchpad.net/neutron/+bug/1622017. Using Neutron from
the GitHub stable/newton branch fixed the issue. I was using the Ubuntu
package, which has this bug. Thank you for your explanations.

** Changed in: neutron
Status: Incomplete => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1663232

Title:
Openvswitch VLAN stripping issue with tunneling

Status in neutron:
Invalid

Bug description:
Openvswitch VLAN tag stripping and MTU issue

When L2 population is not used in an environment that uses Openvswitch
as the ML2 mechanism driver, or when the learned rules are the ones
that match, the VLAN tag used internally by Neutron is not stripped.
For VXLAN, the tunneling overhead is then higher than the MTU
reduction applied to the virtual networks, which causes MTU issues.

In my setup, I have several OpenStack clouds (Newton) deployed using
Fuel, with VXLAN segmentation and Openvswitch, running on Ubuntu
16.04. Some machines in the tenants' virtual networks act as bridges,
so L2 population is not sufficient and the learning feature of
br-tun is required. The deployments are the most basic that can be
performed with Fuel 10 (no additional services).

The overhead of VXLAN is 50 bytes if the original Ethernet frame does
not have a VLAN tag. If the Ethernet frame has a VLAN tag, the
overhead is 54 bytes. When setting the virtual network MTU, Neutron
assumes that there is no VLAN tag. However, Neutron internally uses
VLAN tags to isolate the networks in br-int and br-tun. When L2
population is used, the rules set in br-tun strip the VLAN tag before
tunneling, so everything works properly. But when L2 population is
not used, or its rules are not hit and the learning part takes over,
the learned rules do not strip the VLAN tag, they only zero it. The
overhead is then 54 bytes and the communication is broken.
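
The arithmetic behind the 50 and 54 byte figures can be sketched as
follows. This is only an illustration, not Neutron code: the function
names are hypothetical, the outer header is assumed to be IPv4 without
options, and the 1500-byte underlay MTU is an assumed value that is not
stated in the report.

```python
# Standard header sizes in bytes.
OUTER_IPV4 = 20       # outer IPv4 header, no options (assumption: IPv4 underlay)
OUTER_UDP = 8         # outer UDP header
VXLAN_HEADER = 8      # VXLAN header
INNER_ETHERNET = 14   # inner Ethernet header carried inside the tunnel
DOT1Q_TAG = 4         # 802.1Q VLAN tag left on the inner frame


def vxlan_overhead(inner_vlan_tag: bool) -> int:
    """Bytes added on top of the tenant IP packet (hypothetical helper)."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    if inner_vlan_tag:
        overhead += DOT1Q_TAG
    return overhead


def fits_underlay(tenant_mtu: int, underlay_mtu: int, inner_vlan_tag: bool) -> bool:
    """True if a full-size tenant packet still fits in the underlay MTU."""
    return tenant_mtu + vxlan_overhead(inner_vlan_tag) <= underlay_mtu


UNDERLAY_MTU = 1500                                  # assumed value
TENANT_MTU = UNDERLAY_MTU - vxlan_overhead(False)    # MTU computed assuming no VLAN tag

print(vxlan_overhead(False), vxlan_overhead(True))
print(fits_underlay(TENANT_MTU, UNDERLAY_MTU, inner_vlan_tag=False))
print(fits_underlay(TENANT_MTU, UNDERLAY_MTU, inner_vlan_tag=True))
```

Under these assumptions, a full-size tenant packet fits only when the
inner frame carries no VLAN tag; with the tag left in place it exceeds
the underlay MTU by 4 bytes.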

The following learning rule in br-tun installs flows that zero the
VLAN tag instead of removing it.

In table 10:
#table=10,priority=0,actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xa9e495b5c54cf90c,OXM_OF_VLAN_VID[],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->OXM_OF_VLAN_VID[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1

This results in flows in table 20 like this:
#table=20,priority=1,vlan_tci=0x0003/0x0fff,dl_dst=fa:16:3e:b6:53:e2 actions=load:0->OXM_OF_VLAN_VID[],load:0x18->NXM_NX_TUN_ID[],output:2

This flow does not remove the VLAN tag. When L2 population is used, flows with a higher priority are inserted that do strip the VLAN tag correctly. However, the learned flows are used whenever the L2 population flows do not match.
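
To check a flow dump for this condition, a small script along the
following lines could be used. It is a rough sketch with hypothetical
helper names: it only looks for the `strip_vlan` or `pop_vlan` actions
at the top level of the action list, and the second flow in the usage
example is made up for illustration, not taken from a real deployment.

```python
from typing import List


def split_actions(flow: str) -> List[str]:
    """Split the action list of a flow line at top-level commas.

    Commas inside parentheses (for example inside learn(...)) are kept
    together with their enclosing action.
    """
    _, _, actions = flow.partition("actions=")
    parts, depth, current = [], 0, ""
    for ch in actions.strip():
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def removes_vlan_tag(flow: str) -> bool:
    """True if the flow explicitly strips the VLAN tag (hypothetical check)."""
    return any(a in ("strip_vlan", "pop_vlan") for a in split_actions(flow))


# Learned flow quoted in this report: the VLAN ID is zeroed, not removed.
learned = ("table=20,priority=1,vlan_tci=0x0003/0x0fff,dl_dst=fa:16:3e:b6:53:e2 "
           "actions=load:0->OXM_OF_VLAN_VID[],load:0x18->NXM_NX_TUN_ID[],output:2")

# Made-up flow for comparison only, showing an explicit strip_vlan action.
stripping = ("table=20,priority=2,dl_vlan=3,dl_dst=fa:16:3e:b6:53:e2 "
             "actions=strip_vlan,load:0x18->NXM_NX_TUN_ID[],output:2")

print(removes_vlan_tag(learned))
print(removes_vlan_tag(stripping))
```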

Expected output: traffic without a VLAN tag, tunneled in VXLAN with a 50-byte overhead
Actual output: traffic with a VLAN tag (0), tunneled in VXLAN with a 54-byte overhead

The issue does not happen for GRE, as the 4 additional bytes still
fit within the 50-byte MTU reduction on the tenant network.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1663232/+subscriptions

References

[Bug 1663232] [NEW] Openvswitch VLAN stripping issue with tunneling
From: Maël Kimmerlin, 2017-02-09