yahoo-eng-team team mailing list archive

[Bug 1542475] [NEW] MTU concerns for the Open vSwitch agent

 

Public bug reported:

I ran some experiments with the Open vSwitch (OVS) agent [1] to
determine the source of MTU problems and offer a potential solution. The
environment for these experiments contains the following items:

1) A physical (underlying) network supporting an MTU of 1500 or 9000 bytes.
2) One controller node running the neutron server, OVS agent, L3 agent, DHCP agent, metadata agent, and OVS provider network bridge br-ex.
3) One compute node running the Open vSwitch agent.
4) A neutron provider/public network.
5) A neutron self-service/private network.
6) A neutron router between the provider and self-service networks.
7) The self-service network uses the VXLAN protocol with IPv4 endpoints, which adds 50 bytes of overhead.
8) An instance on the self-service network with a floating IP address from an allocation pool on the provider network.

Background:

1. Interfaces (or ports) on OVS bridges, such as those for overlay
network tunnels, appear to use an arbitrarily large MTU. In practice,
however, OVS bridges and tunnel interfaces are limited by the MTU of the
underlying physical network interface. For example, if OVS uses the IP
address of eth0 for a tunnel overlay network endpoint and eth0 has a 1500
MTU, the tunnel interface can only send packets with a payload of up to
1500 bytes, including overlay protocol overhead.
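The 50-byte VXLAN figure can be checked with a little arithmetic. The
sketch below assumes VXLAN over IPv4 with no VLAN tags and no IP options.
Because the MTU limits the outer IP packet, it counts the outer IPv4,
UDP, and VXLAN headers plus the encapsulated inner Ethernet header; the
outer Ethernet header does not count against the MTU. The helper name
tenant_mtu is hypothetical.

```python
# Header sizes in bytes. Assumed: outer IPv4 without options and an
# untagged inner Ethernet frame.
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # inner frame header travels inside the outer IP packet

VXLAN_IPV4_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def tenant_mtu(physical_mtu: int, overhead: int = VXLAN_IPV4_OVERHEAD) -> int:
    """Largest inner IP packet that fits in one tunnel packet (hypothetical)."""
    if physical_mtu <= overhead:
        raise ValueError("physical MTU too small for the overlay overhead")
    return physical_mtu - overhead


if __name__ == "__main__":
    print(VXLAN_IPV4_OVERHEAD)  # 50
    print(tenant_mtu(1500))     # 1450
    print(tenant_mtu(9000))     # 8950
```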

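2. OVS creates interfaces (ports) in the host namespace and moves them
to the appropriate namespace(s) rather than creating veth pairs between
namespaces. Each such port is a single device with a single MTU, so no
layer-2 MTU disparity can arise between the namespace and the OVS
bridge.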

3. For Linux bridge devices, such as those on the compute node that
implement security groups, Linux starts the bridge with a 1500 MTU and
then changes it to the lowest MTU of any port on the bridge. For example,
a bridge without ports has a 1500 MTU. If eth0 has a 9000 MTU and you add
it as a port on the bridge, the bridge changes to a 9000 MTU. If you then
add eth1 with a 1500 MTU as a port, the bridge changes to a 1500 MTU.
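The bridge rule above can be captured in a few lines. This is a toy
model of only the behavior described in this item (a 1500 default, then
the smallest port MTU), not kernel code; the class name BridgeMtuModel is
hypothetical.

```python
class BridgeMtuModel:
    """Toy model of how a Linux bridge picks its MTU (not kernel code)."""

    DEFAULT_MTU = 1500  # MTU of a bridge that has no ports

    def __init__(self) -> None:
        self.ports: dict[str, int] = {}

    def add_port(self, name: str, mtu: int) -> None:
        self.ports[name] = mtu

    @property
    def mtu(self) -> int:
        # Without ports the bridge keeps the default; otherwise it follows
        # the smallest MTU among its ports.
        if not self.ports:
            return self.DEFAULT_MTU
        return min(self.ports.values())


if __name__ == "__main__":
    br = BridgeMtuModel()
    print(br.mtu)  # 1500
    br.add_port("eth0", 9000)
    print(br.mtu)  # 9000
    br.add_port("eth1", 1500)
    print(br.mtu)  # 1500
```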

4. Only devices that operate at layer-3 can participate in path MTU
discovery (PMTUD). Therefore, when the MTU changes within a layer-2
device such as a bridge or veth pair, that device silently discards
packets larger than the smallest MTU instead of notifying the sender.
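The difference between layer-2 and layer-3 devices can be sketched as a
decision function. It assumes the sender sets the IPv4 Don't Fragment
bit, as it does when performing PMTUD; the names Action and
handle_oversize are hypothetical.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    DROP = "drop"              # layer-2 device: no way to signal the sender
    ICMP_FRAG_NEEDED = "icmp"  # layer-3 device: sender can lower its path MTU


def handle_oversize(packet_size: int, egress_mtu: int, layer: int) -> Action:
    """Toy model of what a device does with a packet on an egress link.

    Assumes the Don't Fragment bit is set on the packet.
    """
    if packet_size <= egress_mtu:
        return Action.FORWARD
    if layer >= 3:
        # Routers (such as a router namespace) can report "fragmentation
        # needed" back to the source, which then uses a smaller MTU.
        return Action.ICMP_FRAG_NEEDED
    # Bridges and veth pairs have no IP presence, so they just drop it.
    return Action.DROP
```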

Observations:

1. For any physical network MTU, the port for the self-service network
router interface (qr) in the router namespace (qrouter) has a 1500 MTU.
Background item (2) prevents an MTU disparity at layer-2 between the
router namespace and OVS bridge br-int. If a packet from the provider
network to the instance has a payload larger than 1500 bytes, the router
can send an ICMP message to the source telling it to use a 1500 MTU.
However, the correct MTU for a private network using the VXLAN overlay
protocol must account for 50 bytes of overhead, which yields 1450 bytes
on a 1500-byte physical network. Thus, on such a network, a 1500-byte
packet exceeds the physical MTU after encapsulation, so OVS fragments
the packet over the tunnel and reassembles it on the compute node
containing the instance.
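To see why fragmentation occurs on a 1500-byte physical network, the
sketch below computes the size of the outer IPv4 packet that carries an
inner packet through a VXLAN tunnel, and the number of IPv4 fragments
needed to send it. It assumes IPv4 without options and the 50-byte
overhead from the environment description; the function names are
hypothetical.

```python
import math

VXLAN_IPV4_OVERHEAD = 50  # outer IPv4 + UDP + VXLAN + inner Ethernet headers
IPV4_HEADER = 20          # assumed: no IP options


def outer_packet_size(inner_ip_size: int) -> int:
    """Size of the outer IPv4 packet after VXLAN encapsulation."""
    return inner_ip_size + VXLAN_IPV4_OVERHEAD


def ipv4_fragment_count(packet_size: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to send packet_size bytes over mtu."""
    if packet_size <= mtu:
        return 1
    payload = packet_size - IPV4_HEADER
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)


if __name__ == "__main__":
    size = outer_packet_size(1500)              # router allowed 1500 bytes
    print(size)                                 # 1550
    print(ipv4_fragment_count(size, 1500))      # 2
    print(ipv4_fragment_count(outer_packet_size(1450), 1500))  # 1
```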

2. For a physical network MTU larger than 1500, the port for the
provider network router gateway (qg) in the router namespace (qrouter)
has a 1500 MTU. Background item (2) prevents an MTU disparity between the
router namespace and OVS provider network bridge br-ex. If a packet from
the provider network to the instance has a payload larger than 1500
bytes, the router can send an ICMP message to the source telling it to
use a 1500 MTU regardless of the private network overlay protocol. Thus,
the agent cannot realize a physical network MTU larger than 1500.
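The effect can be expressed as a path MTU, which is the smallest MTU of
any hop. The sketch below compares the observed inbound path on a
9000-byte physical network with the values proposed later in this
report. The hop values are taken from the observations and the
potential solution; br-ex is assumed to follow the physical MTU per
background item (1). The function name path_mtu is hypothetical.

```python
def path_mtu(hops: dict[str, int]) -> int:
    """Effective path MTU: the smallest MTU of any hop (toy model)."""
    return min(hops.values())


# Inbound path to the instance's floating IP, 9000-byte physical network.
observed = {
    "physical network": 9000,
    "br-ex": 9000,                  # assumed to follow the physical MTU
    "qg (router gateway)": 1500,    # observation item (2)
    "qr (router interface)": 1500,  # observation item (1)
}

proposed = {
    "physical network": 9000,
    "br-ex": 9000,
    "qg (router gateway)": 9000,    # potential solution item (2)
    "qr (router interface)": 8950,  # potential solution item (1), VXLAN
}

if __name__ == "__main__":
    print(path_mtu(observed))  # 1500
    print(path_mtu(proposed))  # 8950
```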

3. If a provider or private network uses DHCP, the port in the DHCP
namespace has a 1500 MTU for any physical network MTU.

4. The Linux bridge that implements security groups on the compute node
has no ports on physical network interfaces, so background item (3)
causes the bridge to use a 1500 MTU. Nova, rather than neutron, manages
this bridge and creates a veth pair between it and the Open vSwitch
bridge br-int. Both ends of the veth pair have a 1500 MTU. Background
item (1) indicates that the OVS bridge br-int could have a larger MTU.
Thus, OVS discards packets inbound to instances with a payload larger
than 1500 bytes.
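The inbound path on the compute node can be walked device by device to
find where an oversized packet is lost. The device names below follow
nova's hybrid-plug naming (qvo, qvb, qbr, tap), with "XXX" as a
placeholder for the port ID; br-int is assumed to sit on a 9000-byte
physical network. The function name first_drop is hypothetical.

```python
from typing import Optional

# Observed MTUs on the inbound path, from br-int toward the instance.
INBOUND_PATH = [
    ("br-int", 9000),  # OVS bridge, limited only by the physical MTU
    ("qvoXXX", 1500),  # veth end attached to br-int
    ("qvbXXX", 1500),  # veth end attached to the Linux bridge
    ("qbrXXX", 1500),  # Linux bridge implementing security groups
    ("tapXXX", 1500),  # tap device attached to the instance
]


def first_drop(path: list[tuple[str, int]], packet_size: int) -> Optional[str]:
    """Return the first device whose MTU is smaller than the packet, if any."""
    for name, mtu in path:
        if packet_size > mtu:
            return name
    return None


if __name__ == "__main__":
    print(first_drop(INBOUND_PATH, 8950))  # qvoXXX
    print(first_drop(INBOUND_PATH, 1450))  # None
```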

5. Instances must use an MTU value that accounts for overlay protocol
overhead. Neutron currently offers a way to provide a correct value via
DHCP. However, considering observation item (4), providing an MTU value
larger than 1500 causes a layer-2 disparity between the VM and the tap
interface port on the Linux bridge that implements security groups on
the compute node. Thus, the bridge discards packets outbound from
instances with a payload larger than 1500 bytes.
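For reference, the value that reaches the instance travels in the DHCP
Interface MTU option (option 26 in RFC 2132): a one-byte code, a
one-byte length of 2, and the MTU as a 16-bit unsigned integer in
network byte order, with a minimum legal value of 68. The sketch below
encodes and decodes that option with Python's struct module. It
illustrates the wire format only and is not how neutron or its DHCP
server implement it; the function names are hypothetical.

```python
import struct

DHCP_OPT_INTERFACE_MTU = 26  # Interface MTU option code (RFC 2132)
MIN_MTU = 68                 # smallest MTU the option may carry


def encode_mtu_option(mtu: int) -> bytes:
    """Encode the Interface MTU option as code, length, value."""
    if not MIN_MTU <= mtu <= 0xFFFF:
        raise ValueError(f"MTU {mtu} outside the range the option allows")
    return struct.pack("!BBH", DHCP_OPT_INTERFACE_MTU, 2, mtu)


def decode_mtu_option(data: bytes) -> int:
    """Decode an Interface MTU option and return the MTU."""
    code, length, mtu = struct.unpack("!BBH", data)
    if code != DHCP_OPT_INTERFACE_MTU or length != 2:
        raise ValueError("not an Interface MTU option")
    return mtu


if __name__ == "__main__":
    print(encode_mtu_option(1450).hex())  # 1a0205aa
```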

6. The nova 'network_device_mtu' option controls the MTU of all devices
that nova manages in observation items (4) and (5). For example, a value
of 9000 causes the bridge, veth pair, and tap to have a 9000 MTU.
Combining this option with providing the correct value to instances via
DHCP essentially resolves MTU problems on compute nodes.

Potential solution:

1. The port for the self-service network router interface (qr) in the
router namespace (qrouter) must use the MTU of the physical network
minus any overlay protocol overhead. For example, if the physical
network has a 9000 MTU and the private network uses the VXLAN overlay
protocol, the port must have an 8950 MTU.

2. The port for the provider network router gateway (qg) in the router
namespace (qrouter) must use the MTU of the physical network. For
example, if the physical network has a 9000 MTU, the port must have a
9000 MTU. If the provider network uses an overlay protocol, the MTU of
the port must also account for any overhead.

3. For networks using DHCP, the port in the DHCP namespace (qdhcp)
should use the MTU of the network it serves, accounting for any overlay
protocol overhead.

4. The Linux bridge that implements security groups on the compute node
and all ports on it must use the MTU of the physical network minus any
overlay protocol overhead.
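The four items above reduce to one rule: each port uses the physical MTU
minus the overhead of the network it attaches to. The sketch below
applies that rule to the environment in this report. Only the 50-byte
VXLAN figure comes from the report; treating flat and VLAN networks as
adding no overhead, and the provider network as flat, are assumptions.
All names are hypothetical.

```python
# Assumed per-network-type overhead in bytes. Only the VXLAN/IPv4 value
# comes from the report; flat and VLAN are assumed to add none.
OVERLAY_OVERHEAD = {"flat": 0, "vlan": 0, "vxlan": 50}


def network_mtu(physical_mtu: int, network_type: str) -> int:
    """MTU a network should expose to its ports (hypothetical helper)."""
    return physical_mtu - OVERLAY_OVERHEAD[network_type]


def proposed_port_mtus(physical_mtu: int, provider_type: str,
                       private_type: str) -> dict[str, int]:
    """Apply potential-solution items 1 to 4 to this report's environment."""
    private = network_mtu(physical_mtu, private_type)
    provider = network_mtu(physical_mtu, provider_type)
    return {
        "qr (router interface)": private,       # item 1
        "qg (router gateway)": provider,        # item 2
        "qdhcp on private network": private,    # item 3
        "qdhcp on provider network": provider,  # item 3
        "Linux bridge and ports": private,      # item 4, instance on private net
    }


if __name__ == "__main__":
    for port, mtu in proposed_port_mtus(9000, "flat", "vxlan").items():
        print(f"{port}: {mtu}")
```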

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-January/084241.html

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: ovs

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1542475
