
openstack team mailing list archive

Re: [QUANTUM] (Bug ?) L3 routing not correctly fragmenting packets ?


Okay, I think I've found the reason why it's not working with OVS/GRE, unlike with FlatDHCP nova-network. According to http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml , the GRE encapsulation protocol can add up to 34 bytes to the IP datagram (meaning the TCP segment is only 1456 bytes if the MTU is set to 1500). When a packet is about 1500 bytes, it should be fragmented so that the reply, including the GRE encapsulation, stays within 1500 bytes.

Unfortunately, for security reasons, the ICMP "type 3/code 4" (fragmentation needed) packet can't reach the X.X.X.X backend, because that backend's firewall denies all ICMP requests. As a consequence, Path MTU discovery fails and packets keep being retransmitted at the 1500-byte size again and again...
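The failure mode described above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: `send_df_packet` and `transfer` are hypothetical helpers, a single hop with a 1454-byte MTU stands in for the network node, and the `icmp_allowed` flag models whether the sender's firewall lets the ICMP type 3/code 4 message through.

```python
def send_df_packet(size, link_mtu, icmp_allowed):
    """Send one packet with DF set across a single hop.

    Returns (delivered, reported_mtu). reported_mtu is the next-hop MTU
    carried by the ICMP "fragmentation needed" message, or None when the
    packet fit or the ICMP message was filtered before reaching the sender.
    """
    if size <= link_mtu:
        return True, None
    # Too big and DF is set: the hop drops the packet and emits ICMP 3/4.
    reported_mtu = link_mtu if icmp_allowed else None
    return False, reported_mtu


def transfer(size, link_mtu, icmp_allowed, max_tries=4):
    """Retry until delivery or max_tries, shrinking only if ICMP arrives."""
    attempts = []
    for _ in range(max_tries):
        delivered, reported_mtu = send_df_packet(size, link_mtu, icmp_allowed)
        attempts.append((size, delivered))
        if delivered:
            break
        if reported_mtu is not None:
            size = reported_mtu  # sender lowers its path MTU estimate
    return attempts


if __name__ == "__main__":
    print("ICMP allowed:", transfer(1500, 1454, icmp_allowed=True))
    print("ICMP blocked:", transfer(1500, 1454, icmp_allowed=False))
```

With ICMP allowed the sender shrinks to 1454 after one loss. With ICMP blocked every retry goes out at 1500 bytes and is dropped, which matches the repeated retransmissions seen here.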

As I said in my first post, the only workaround I've found is to set the MTU to 1454 on *all* my VMs (I don't know why there is a 2-byte difference from the 1456 bytes I mentioned above), including my Windows VMs, which is no fun (you have to modify a registry key and reboot the VM. Yes, you aren't dreaming: this is how MTUs are changed on Windows-based machines...)
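One possible accounting for the 1454 figure, offered as a sketch rather than a confirmed explanation: the header sizes below are the standard ones (20-byte outer IPv4 header, 4-byte base GRE header, optional 4-byte key and 4-byte checksum fields, 14-byte inner Ethernet header), but which optional GRE fields this Openvswitch setup actually enables is an assumption. `tunnel_inner_mtu` is a hypothetical helper.

```python
OUTER_IPV4 = 20      # outer IPv4 header without options
GRE_BASE = 4         # mandatory GRE header
GRE_KEY = 4          # optional key field
GRE_CHECKSUM = 4     # optional checksum + reserved field
INNER_ETHERNET = 14  # inner Ethernet header when whole L2 frames are tunnelled


def tunnel_inner_mtu(physical_mtu, key=False, checksum=False, inner_ethernet=False):
    """Largest inner IP packet that fits in one outer packet of physical_mtu bytes."""
    overhead = OUTER_IPV4 + GRE_BASE
    if key:
        overhead += GRE_KEY
    if checksum:
        overhead += GRE_CHECKSUM
    if inner_ethernet:
        overhead += INNER_ETHERNET
    return physical_mtu - overhead


if __name__ == "__main__":
    # Plain GRE carrying IP directly: 24 bytes of overhead.
    print(tunnel_inner_mtu(1500))
    # Assumed combination: key + checksum + inner Ethernet header (46 bytes).
    print(tunnel_inner_mtu(1500, key=True, checksum=True, inner_ethernet=True))
```

Under these assumptions, plain GRE leaves 1476 bytes (the 24-byte overhead commonly quoted for GRE over IPv4), while a tunnel carrying Ethernet frames with both key and checksum fields leaves exactly 1454.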

Do you know of any neat idea that would avoid modifying the VMs and only require changes on the network node?

My TCP/IP knowledge is reaching its limits, so any idea is welcome.


(BTW, maybe my explanation is completely wrong and GRE is not responsible for the 36-byte overhead. If so, please accept my apologies; any other clarification would be great.)

On 11/03/2013 10:07, Sylvain Bauza wrote:
I also forgot to mention: I'm using a typical Openvswitch setup with GRE encapsulation.
I can't prove it, but could it be that GRE is not able to work with Path MTU discovery?


On 11/03/2013 09:40, Sylvain Bauza wrote:
Hi Rick, reply inline.

On 08/03/2013 20:27, Rick Jones wrote:
On 03/08/2013 09:55 AM, Aaron Rosen wrote:
Hi Sylvain,

This seems very odd to me. The reason this would happen is if your
client is sending packets with the DF (don't fragment) bit set in the
IP header of the packets you are sending. I'd confirm that your
version of 'curl' is doing this (which it should definitely not do!).

Why shouldn't a TCP connection initiated by curl (or anything else) have Path MTU discovery enabled (i.e. the DF bit set in the IP datagrams carrying the TCP segments)?
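To make the location of the DF bit concrete, here is a minimal sketch that reads it out of a raw IPv4 header. The header is built by hand from the values in the capture quoted further down (length 1500, id 25918, ttl 48, proto TCP); the addresses are placeholders from the documentation ranges, the checksum is left at zero, and `parse_ipv4_flags` is a hypothetical helper.

```python
import struct

IP_FLAG_DF = 0x4000      # "don't fragment" bit in the flags/fragment-offset word
IP_FLAG_MF = 0x2000      # "more fragments" bit
IP_OFFSET_MASK = 0x1FFF  # low 13 bits: fragment offset in 8-byte units

IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def parse_ipv4_flags(header):
    """Extract length, id and fragmentation flags from a 20-byte IPv4 header."""
    (_ver_ihl, _tos, total_length, ident, flags_frag,
     _ttl, _proto, _checksum, _src, _dst) = struct.unpack(IPV4_HEADER_FORMAT, header[:20])
    return {
        "total_length": total_length,
        "id": ident,
        "df": bool(flags_frag & IP_FLAG_DF),
        "mf": bool(flags_frag & IP_FLAG_MF),
        "fragment_offset": (flags_frag & IP_OFFSET_MASK) * 8,
    }


if __name__ == "__main__":
    sample = struct.pack(
        IPV4_HEADER_FORMAT,
        0x45, 0, 1500, 25918, IP_FLAG_DF, 48, 6, 0,
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 1]),  # placeholder addresses
    )
    print(parse_ipv4_flags(sample))
```

The flag lives in the IP header, so it applies to any protocol carried over IP, not only to TCP.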

[SBA] Thanks for the explanation of the DF flag
What should happen is the router should fragment the packets for you
and if a fragment is lost TCP will just re-transmit the full packet
again and things should eventually work....

Here I thought all the IETF demigods considered IP Fragmentation 'To Be Avoided (tm)' - hence the creation of Path MTU discovery in the first place. :)

FWIW, in the IPv6 world, routers do not fragment. That implies either functioning PathMTU discovery, or lowest common MTU...


On Fri, Mar 8, 2013 at 9:08 AM, Sylvain Bauza
<sylvain.bauza@xxxxxxxxxxxx> wrote:

I recently observed a strange behaviour with L3 Quantum routing (Openvswitch
setup with Provider Router). A simple curl to an external website is
sometimes failing due to packet size:

> X.X.X.X: ICMP unreachable - need to frag (mtu 1454), length 556
IP (tos 0x0, ttl 48, id 25918, offset 0, flags [DF], proto TCP (6), length 1500)

Why is the ICMP Destination Unreachable datagram being sent back so large? I would have expected it to be rather smaller - an Ethernet, IP and ICMP header, and then the original IP header and something like 8 bytes or so of the original IP datagram's payload.
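The sizes in question can be worked out as below. The 56-byte minimum follows the classic ICMP error format (original IP header plus 8 bytes of its payload); the 576-byte cap is the router behaviour recommended by RFC 1812, and reading tcpdump's trailing "length 556" as the ICMP message length without the outer IP header is an assumption. Both function names are hypothetical.

```python
IPV4_HEADER = 20  # IPv4 header without options
ICMP_HEADER = 8   # type, code, checksum, unused/next-hop MTU


def minimal_icmp_error_len(orig_ip_header=IPV4_HEADER):
    """IP length of an ICMP error quoting only the original header + 8 bytes."""
    return IPV4_HEADER + ICMP_HEADER + orig_ip_header + 8


def capped_icmp_error_len(orig_total_len, cap=576):
    """IP length of an ICMP error quoting as much as fits under the cap."""
    return min(cap, IPV4_HEADER + ICMP_HEADER + orig_total_len)


if __name__ == "__main__":
    print(minimal_icmp_error_len())                   # small, classic form
    total = capped_icmp_error_len(1500)               # original datagram was 1500 bytes
    print(total, total - IPV4_HEADER)                 # IP length, ICMP length
```

Under that reading, a 576-byte ICMP datagram minus its 20-byte IP header gives the 556 bytes seen in the capture, so the large size would be expected from a sender that quotes as much of the original datagram as the cap allows.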

I take it that ICMP is not getting back to the original sender? Or is being ignored?

[SBA] I take the point. That means that Path MTU discovery is not working for my Quantum installation. I also had a nova-network (FlatDHCP mode) setup and I didn't notice the issue there. So, I assume something is wrong with my config.

Only changing the VM MTU to 1454 does the trick ('ifconfig eth0 mtu 1454').

For info, is the floating IP bound to (private IP).

I suppose if doesn't explicitly know about it might indeed ignore the ICMP message. Assuming it isn't getting un-NATted on the way back.

[SBA] This *is* un-NAT'd on the way back. By tcpdump'ing with the '-i any' interface, I can see the DNAT mapping on the way back :

Do you have any idea what I should fix (or at least work around) to get Path MTU discovery working? By the way, I did check, and both client ( and server (X.X.X.X) have their MTU set to 1500. I can't understand why the server is asking for a fragment size of 1454.

