yahoo-eng-team mailing list archive - Message #75340
[Bug 1799124] [NEW] Path MTU discovery fails for VMs with Floating IP behind DVR routers
Public bug reported:
Tenant VMs on an overlay network whose MTU is below 1500, and below the MTU of
the external network, are unable to negotiate the MTU using Path MTU discovery.
In most cases direct instance traffic is not affected, since the instance MTU
is configured by DHCP. However, if the VM acts as a router for other traffic
(e.g. bridged for Docker, LXD, Libvirt, etc.) that has its MTU set to 1500
(the default in most cases), then that traffic relies on Path MTU discovery to
discover the 1450 MTU.
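For illustration (interface and bridge names here are my assumptions, not taken
from this report), the mismatch is visible from inside such a VM:

# inside the tenant VM: the NIC MTU comes from DHCP (1450 here), while a
# nested bridge such as LXD's lxdbr0 typically defaults to 1500
ip link show eth0     # ... mtu 1450 ...
ip link show lxdbr0   # ... mtu 1500 ...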
On normal routers and DVR routers where the VM does not have a floating
IP (and thus is routed via a centralized node), this works as expected.
However, on DVR routers where the VM has a Floating IP (and thus traffic is
routed directly from the compute node) this fails. When a packet larger than
the overlay network's MTU arrives from the external network towards the VM,
the packet is dropped and no ICMP "fragmentation needed" (Destination
Unreachable, code 4) response is received by the external host. This prevents
Path MTU discovery from repairing the connection; the result is that most TCP
connections will stall as soon as they try to send full-size (1500-byte)
packets, e.g. during a simple HTTP download.
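One way to observe this from outside (a suggested check, not part of the
original report) is to capture ICMP "fragmentation needed" messages on the
external host while the stalled transfer runs; in the broken case none arrive:

# on the external host: ICMP type 3 code 4 is "fragmentation needed";
# replace <ext-if> with the host's outbound interface
tcpdump -ni <ext-if> 'icmp[0] == 3 and icmp[1] == 4'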
My diagnosis is that the qrouter namespace on the compute host has no default
route in its main routing table. It does have a default route in the
alternative routing table (16), which is used for traffic matching an "ip
rule" that selects all traffic sent from the VM subnet, but there is no
default route in the global default routing table.
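The policy rule that steers this traffic can be listed inside the namespace (a
suggested check; rule priorities and the table number vary per deployment, and
the router UUID here matches the output further below):

# one of the rules sends traffic sourced from the VM subnet
# (103.245.215.0/28 in this case) to the alternate table 16
ip netns exec qrouter-1752c73a-be9f-4326-97cc-99dbe0988b3c ip rule show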
I have not 100% confirmed this part, but my understanding is that since there
is no global default route, the kernel is unable to select a source IP for the
ICMP error. Additionally, even if it did somehow select a source IP, the
appropriate default route appears to be via the rfp interface on the
169.254.0.0/16 subnet back to the FIP namespace, which would not match the
rule that sends traffic from the VM subnet to the alternate routing table
anyway.
In testing, if I add a default route via the rfp interface, then ICMP errors
are sent and Path MTU discovery works as expected, allowing TCP connections to
proceed.
root@maas-node02:~# ip netns exec qrouter-1752c73a-be9f-4326-97cc-99dbe0988b3c ip r
103.245.215.0/28 dev qr-ec03268e-fb proto kernel scope link src 103.245.215.1
169.254.106.114/31 dev rfp-1752c73a-b proto kernel scope link src 169.254.106.114
root@maas-node02:~# ip -n qrouter-1752c73a-be9f-4326-97cc-99dbe0988b3c route show table 16
default via 169.254.106.115 dev rfp-1752c73a-b
root@maas-node02:~# ip -n qrouter-1752c73a-be9f-4326-97cc-99dbe0988b3c route add default via 169.254.106.115 dev rfp-1752c73a-b
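Re-listing the main table afterwards should show the new default entry via
169.254.106.115 alongside the two connected routes (a verification step, not
part of the original output):

root@maas-node02:~# ip -n qrouter-1752c73a-be9f-4326-97cc-99dbe0988b3c route show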
It's not clear to me whether there is an intentional reason not to install a
default route here, particularly since such a route exists for non-DVR
routers. I would appreciate input from anyone who knows whether this was an
intentional design decision or simply an oversight.
= Steps to reproduce =
(1) Deploy a cloud with DVR and global-physnet-mtu=1500
(2) Create an overlay tenant network (MTU 1450), a VLAN/flat external network (MTU 1500), and a router.
(3) Deploy an Ubuntu 16.04 instance on the tenant network with a Floating IP.
(4) Verify that a large download works: "wget http://archive.ubuntu.com/ubuntu-releases/18.04.1/ubuntu-18.04.1-live-server-amd64.iso.zsync"
(5) Configure LXD to use a private subnet and NAT: "dpkg-reconfigure -pmedium lxd" - you can basically just accept the defaults.
(6) Launch an LXD container, "lxc launch ubuntu:16.04 test", then test a download inside it:
(7) lxc exec test -- wget http://archive.ubuntu.com/ubuntu-releases/18.04.1/ubuntu-18.04.1-live-server-amd64.iso.zsync
A simpler alternative to using LXD/Docker is to force the MTU of the VM back
to 1500 with "ip link set eth0 mtu 1500" -- this same scenario will fail with
DVR and work without DVR.
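A quick end-to-end check (a suggested test, not part of the original report,
and assuming security groups allow ICMP) is a DF-bit ping from an external
host to the instance's Floating IP; with working PMTUD the router answers with
a "Frag needed" error reporting the 1450 MTU, while in the broken DVR case the
probe simply times out:

# from an external host: 1472 bytes of payload + 28 bytes of ICMP/IP headers
# gives a 1500-byte packet with DF set, too big for the 1450 overlay MTU
ping -M do -s 1472 <floating-ip>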
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799124
Title:
Path MTU discovery fails for VMs with Floating IP behind DVR routers
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799124/+subscriptions