
yahoo-eng-team team mailing list archive

[Bug 1622017] Re: OVS agent is not removing VLAN tags before tunnels when configured with native OF interface

 

Reviewed:  https://review.openstack.org/368553
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4361f7543f984cf5f09c0c7070ac6b0f22f3b6b1
Submitter: Jenkins
Branch:    master

commit 4361f7543f984cf5f09c0c7070ac6b0f22f3b6b1
Author: IWAMOTO Toshihiro <iwamoto@xxxxxxxxxxxxx>
Date:   Mon Sep 12 14:36:18 2016 +0900

    of_interface: Use vlan_tci instead of vlan_vid
    
    To pop VLAN tags in learn action generated flows, vlan_tci should
    be used instead of vlan_vid.  Otherwise, VLAN tags with VID=0 are
    left.
    
    Change-Id: Ie38ab860424f6e2e2448abac82c428dae3a8a544
    Closes-bug: #1622017
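The commit message's point can be shown with a simplified model of the 16-bit 802.1Q Tag Control Information (TCI). This sketch assumes Open vSwitch's convention that bit 0x1000 of `vlan_tci` marks an 802.1Q header as present, so `vlan_tci == 0` means "untagged". It models only the two outcomes the commit describes (clearing just the VID leaves a VID=0 tag; clearing the whole TCI removes the tag) and does not reproduce OVS internals. The helper names are hypothetical.

```python
# Simplified model of the 802.1Q TCI as Open vSwitch exposes it via vlan_tci.
# Assumption: bit 0x1000 marks "tag present"; the low 12 bits hold the VLAN ID.
VLAN_PRESENT = 0x1000
VLAN_VID_MASK = 0x0FFF


def load_zero_into_vid(tci: int) -> int:
    """Clear only the 12 VID bits: the tag stays, now with VID=0 (the bug's symptom)."""
    return tci & ~VLAN_VID_MASK


def load_zero_into_tci(_tci: int) -> int:
    """Clear the whole TCI, including the presence bit: no tag remains (the fix)."""
    return 0


def is_tagged(tci: int) -> bool:
    """True if the presence bit says an 802.1Q header is attached."""
    return bool(tci & VLAN_PRESENT)


tci = VLAN_PRESENT | 1  # frame carrying local VLAN 1
print(is_tagged(load_zero_into_vid(tci)), load_zero_into_vid(tci) & VLAN_VID_MASK)  # True 0
print(is_tagged(load_zero_into_tci(tci)))  # False
```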


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1622017

Title:
  OVS agent is not removing VLAN tags before tunnels when configured
  with native OF interface

Status in neutron:
  Fix Released

Bug description:
  While investigating an MTU issue, an unaccounted-for overhead of 4 bytes
  was found. Running tcpdump while connecting to a guest via floating IP
  revealed a spurious 802.1Q header. The tenant network type is VXLAN,
  and the VXLAN endpoints themselves are on a VLAN. For some system
  configurations, this issue effectively breaks communication with guests
  via floating IP.

  The test system is configured with the default global_physnet_mtu of
  1500, and inspection of the router namespace confirms that the tenant
  network's router interface has been automatically configured with an
  MTU of 1450. Ping was used for testing, e.g. ping -M do -s 1422
  192.0.2.58 (1422 is the largest payload that should fit in the 1450 MTU
  without fragmentation).
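The 1450 and 1422 figures follow from header arithmetic. This sketch assumes an IPv4 underlay and a 50-byte VXLAN overhead (20-byte outer IPv4 + 8-byte UDP + 8-byte VXLAN + 14-byte inner Ethernet), which reproduces the 1500 to 1450 reduction described above. The function names are illustrative only.

```python
# Header sizes in bytes (IPv4 underlay, untagged inner frame).
IPV4_HEADER = 20
UDP_HEADER = 8
VXLAN_HEADER = 8
ETHERNET_HEADER = 14
ICMP_HEADER = 8
VLAN_TAG = 4  # one 802.1Q header


def tenant_mtu(physnet_mtu: int) -> int:
    """MTU left for the tenant network after VXLAN encapsulation."""
    overhead = IPV4_HEADER + UDP_HEADER + VXLAN_HEADER + ETHERNET_HEADER
    return physnet_mtu - overhead


def max_ping_payload(mtu: int, extra_tags: int = 0) -> int:
    """Largest `ping -s` size that fits without fragmentation.

    extra_tags counts unexpected 802.1Q headers inside the tunnel; each one
    consumes 4 bytes of the underlay budget.
    """
    return mtu - IPV4_HEADER - ICMP_HEADER - extra_tags * VLAN_TAG


mtu = tenant_mtu(1500)
print(mtu, max_ping_payload(mtu), max_ping_payload(mtu, extra_tags=1))  # 1450 1422 1418
```

With one spurious tag the limit drops to 1418, which matches the observation that `ping -s 1420` fails while `ping -s 1418` succeeds.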

  With the system configured as described, "ping -s 1420 <floating ip>"
  fails.

  tcpdump on the controller reveals:

  [root@overcloud-controller-0 heat-admin]# tcpdump -vvv  -e -i any icmp
  tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
  18:32:49.163223   P 52:54:00:01:09:3c (oui Unknown) ethertype IPv4 (0x0800), length 1464: (tos 0x0, ttl 64, id 37535, offset 0, flags [DF], proto ICMP (1), length 1448)
      192.0.2.1 > 192.0.2.58: ICMP echo request, id 16083, seq 1, length 1428
  18:32:49.163340  In 00:00:00:00:00:00 (oui Ethernet) ethertype IPv4 (0x0800), length 592: (tos 0xc0, ttl 64, id 4395, offset 0, flags [none], proto ICMP (1), length 576)
      overcloud-controller-0.tenant.localdomain > overcloud-controller-0.tenant.localdomain: ICMP overcloud-novacompute-0.tenant.localdomain unreachable - need to frag (mtu 1500), length 556
          (tos 0x0, ttl 64, id 22077, offset 0, flags [DF], proto UDP (17), length 1502)
      overcloud-controller-0.tenant.localdomain.51706 > overcloud-novacompute-0.tenant.localdomain.4789: [no cksum] VXLAN, flags [I] (0x08), vni 36

  
  Reducing the ping size to allow for an extra 4-byte header (e.g. ping -s 1418 <floating ip>) succeeds.

  Capturing the VXLAN (UDP) traffic with an alternate tcpdump command reveals an unusual extra 802.1Q header with a VLAN ID of 0:
  [root@overcloud-controller-0 heat-admin]# tcpdump -vvv -n  -e -i any udp
  tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
  18:36:48.095985 Out 56:13:19:d8:af:27 ethertype IPv4 (0x0800), length 1516: (tos 0x0, ttl 64, id 22088, offset 0, flags [DF], proto UDP (17), length 1500)
      172.16.0.5.51706 > 172.16.0.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 36
  fa:16:3e:99:37:ce > fa:16:3e:06:65:6f, ethertype 802.1Q (0x8100), length 1464: vlan 0, p 0, ethertype IPv4, (tos 0x0, ttl 63, id 37541, offset 0, flags [DF], proto ICMP (1), length 1446)
      192.0.2.1 > 192.168.2.101: ICMP echo request, id 16422, seq 1, length 1426
  18:36:48.097861   P ea:0c:37:f7:69:5e ethertype 802.1Q (0x8100), length 1520: vlan 50, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 22354, offset 0, flags [DF], proto UDP (17), length 1500)
      172.16.0.10.50337 > 172.16.0.5.4789: [no cksum] VXLAN, flags [I] (0x08), vni 36
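The outer IPv4 lengths in these captures account exactly for the extra 4 bytes. A minimal sketch, assuming the outer IP length is the outer IPv4, UDP, and VXLAN headers plus the encapsulated inner Ethernet frame (no FCS); the inner IP lengths are taken from the captures in this report.

```python
# Outer IPv4 total length of a VXLAN packet, computed from the inner IP length.
OUTER_HEADERS = 20 + 8 + 8  # outer IPv4 + UDP + VXLAN
INNER_ETHERNET = 14
VLAN_TAG = 4


def outer_ip_length(inner_ip_length: int, inner_vlan_tagged: bool) -> int:
    """Outer IP length; an inner 802.1Q tag adds 4 bytes to the inner frame."""
    inner_frame = INNER_ETHERNET + (VLAN_TAG if inner_vlan_tagged else 0) + inner_ip_length
    return OUTER_HEADERS + inner_frame


# ping -s 1420 (inner IP length 1448) with the spurious tag: exceeds the 1500 MTU.
print(outer_ip_length(1448, True))   # 1502
# ping -s 1418 (inner IP length 1446) with the spurious tag.
print(outer_ip_length(1446, True))   # 1500
# The same inner packet without the tag, as seen with the ovs-ofctl interface.
print(outer_ip_length(1446, False))  # 1496
```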

  The br-tun flow table looks like the following. (This dump was taken
  from the compute node rather than the controller, but the br-tun flow
  tables on both follow the same form and differ only in the local
  segment IDs.)

  [root@overcloud-novacompute-0 ml2]# ovs-ofctl -O OpenFlow13 dump-flows br-tun
  OFPST_FLOW reply (OF1.3) (xid=0x2):
   cookie=0xb13175655506ca2e, duration=11.785s, table=0, n_packets=0, n_bytes=0, priority=1,in_port=1 actions=goto_table:2
   cookie=0xb13175655506ca2e, duration=10.955s, table=0, n_packets=0, n_bytes=0, priority=1,in_port=2 actions=goto_table:4
   cookie=0xb13175655506ca2e, duration=11.783s, table=0, n_packets=0, n_bytes=0, priority=0 actions=drop
   cookie=0xb13175655506ca2e, duration=11.781s, table=2, n_packets=0, n_bytes=0, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=goto_table:20
   cookie=0xb13175655506ca2e, duration=11.779s, table=2, n_packets=0, n_bytes=0, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=goto_table:22
   cookie=0xb13175655506ca2e, duration=11.778s, table=3, n_packets=0, n_bytes=0, priority=0 actions=drop
   cookie=0xb13175655506ca2e, duration=10.677s, table=4, n_packets=0, n_bytes=0, priority=1,tun_id=0x24 actions=push_vlan:0x8100,set_field:4097->vlan_vid,goto_table:10
   cookie=0xb13175655506ca2e, duration=11.777s, table=4, n_packets=0, n_bytes=0, priority=0 actions=drop
   cookie=0xb13175655506ca2e, duration=11.776s, table=6, n_packets=0, n_bytes=0, priority=0 actions=drop
   cookie=0xb13175655506ca2e, duration=11.774s, table=10, n_packets=0, n_bytes=0, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xb13175655506ca2e,OXM_OF_VLAN_VID[],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->OXM_OF_VLAN_VID[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1
   cookie=0xb13175655506ca2e, duration=11.772s, table=20, n_packets=0, n_bytes=0, priority=0 actions=goto_table:22
   cookie=0xb13175655506ca2e, duration=10.680s, table=22, n_packets=0, n_bytes=0, priority=1,dl_vlan=1 actions=pop_vlan,set_field:0x24->tun_id,output:2
   cookie=0xb13175655506ca2e, duration=11.771s, table=22, n_packets=0, n_bytes=0, priority=0 actions=drop
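In this dump, table 4 tags tunnel traffic with `set_field:4097->vlan_vid`. In OpenFlow 1.3 the `vlan_vid` field carries the OFPVID_PRESENT flag (0x1000) alongside the 12-bit VLAN ID, so 4097 (0x1001) means "tag present, VLAN 1", consistent with the `dl_vlan=1` match in table 22. A small decoder sketch (the function name is illustrative):

```python
from typing import Tuple

OFPVID_PRESENT = 0x1000  # OpenFlow 1.3 flag: a VLAN tag is present
VLAN_VID_MASK = 0x0FFF   # low 12 bits: the VLAN ID itself


def decode_vlan_vid(value: int) -> Tuple[bool, int]:
    """Split an OpenFlow 1.3 vlan_vid value into (tag_present, vlan_id)."""
    return bool(value & OFPVID_PRESENT), value & VLAN_VID_MASK


print(decode_vlan_vid(4097))  # (True, 1)
```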

  On a hunch, the same trials were repeated with the openvswitch agents
  on the controller and compute nodes configured to use the ovs-ofctl OF
  interface. With that interface, ping -s 1422 192.0.2.58, ssh to the
  guests, and copying large amounts of data all succeed. The same tcpdump
  command shows that the extra 802.1Q header is not present:

  #with ofctl instead of native
  [root@overcloud-controller-0 ml2]# tcpdump -vvv -n -e -i any udp 
  tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
  19:10:31.570425 Out 56:13:19:d8:af:27 ethertype IPv4 (0x0800), length 1512: (tos 0x0, ttl 64, id 22104, offset 0, flags [DF], proto UDP (17), length 1496)
      172.16.0.5.51706 > 172.16.0.10.4789: [no cksum] VXLAN, flags [I] (0x08), vni 36
  fa:16:3e:99:37:ce > fa:16:3e:06:65:6f, ethertype IPv4 (0x0800), length 1460: (tos 0x0, ttl 63, id 37549, offset 0, flags [DF], proto ICMP (1), length 1446)
      192.0.2.1 > 192.168.2.101: ICMP echo request, id 19062, seq 1, length 1426
  19:10:31.572143   P ea:0c:37:f7:69:5e ethertype 802.1Q (0x8100), length 1520: vlan 50, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 22370, offset 0, flags [DF], proto UDP (17), length 1500)
      172.16.0.10.50337 > 172.16.0.5.4789: [no cksum] VXLAN, flags [I] (0x08), vni 36

  The flow table is also different, using strip_vlan instead of pop_vlan
  (among other obvious differences):

  [root@overcloud-novacompute-0 ml2]# ovs-ofctl dump-flows br-tun
  NXST_FLOW reply (xid=0x4):
   cookie=0xb4814c0ff5ea6fd4, duration=2095.101s, table=0, n_packets=115156, n_bytes=8744100, idle_age=546, priority=1,in_port=1 actions=resubmit(,2)
   cookie=0xb4814c0ff5ea6fd4, duration=2094.475s, table=0, n_packets=346419, n_bytes=274503223, idle_age=546, priority=1,in_port=2 actions=resubmit(,4)
   cookie=0xb4814c0ff5ea6fd4, duration=2095.100s, table=0, n_packets=0, n_bytes=0, idle_age=2095, priority=0 actions=drop
   cookie=0xb4814c0ff5ea6fd4, duration=2095.099s, table=2, n_packets=115155, n_bytes=8744058, idle_age=546, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
   cookie=0xb4814c0ff5ea6fd4, duration=2095.099s, table=2, n_packets=1, n_bytes=42, idle_age=1263, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
   cookie=0xb4814c0ff5ea6fd4, duration=2095.098s, table=3, n_packets=0, n_bytes=0, idle_age=2095, priority=0 actions=drop
   cookie=0xb4814c0ff5ea6fd4, duration=2094.227s, table=4, n_packets=346419, n_bytes=274503223, idle_age=546, priority=1,tun_id=0x24 actions=mod_vlan_vid:1,resubmit(,10)
   cookie=0xb4814c0ff5ea6fd4, duration=2095.097s, table=4, n_packets=0, n_bytes=0, idle_age=2095, priority=0 actions=drop
   cookie=0xb4814c0ff5ea6fd4, duration=2095.097s, table=6, n_packets=0, n_bytes=0, idle_age=2095, priority=0 actions=drop
   cookie=0xb4814c0ff5ea6fd4, duration=2095.096s, table=10, n_packets=346419, n_bytes=274503223, idle_age=546, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xb4814c0ff5ea6fd4,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
   cookie=0xb4814c0ff5ea6fd4, duration=2095.096s, table=20, n_packets=0, n_bytes=0, idle_age=2095, priority=0 actions=resubmit(,22)
   cookie=0xb4814c0ff5ea6fd4, duration=2094.235s, table=22, n_packets=1, n_bytes=42, idle_age=1263, dl_vlan=1 actions=strip_vlan,set_tunnel:0x24,output:2
   cookie=0xb4814c0ff5ea6fd4, duration=2095.086s, table=22, n_packets=0, n_bytes=0, idle_age=2095, priority=0 actions=drop
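The difference the fix targets is in the table 10 `learn()` action: in the native-interface dump the learned flows run `load:0->OXM_OF_VLAN_VID[]`, while in the ovs-ofctl dump they run `load:0->NXM_OF_VLAN_TCI[]`. As a rough diagnostic, `ovs-ofctl dump-flows br-tun` output can be scanned for the first pattern. This is a hypothetical helper that only string-matches the two action forms shown in this report; it is not part of Neutron or Open vSwitch.

```python
import re
from typing import List

# Matches the learned "load:0->FIELD[]" action inside a learn(...) specification.
LOAD_ZERO = re.compile(r"load:0->(OXM_OF_VLAN_VID|NXM_OF_VLAN_TCI)\[\]")


def vid_only_learn_flows(dump: str) -> List[str]:
    """Return dump-flows lines whose learn() zeroes only the VLAN VID.

    Per the fix for bug 1622017, such learned flows leave a VID=0 tag on
    tunnelled packets; zeroing NXM_OF_VLAN_TCI[] removes the tag instead.
    """
    suspects = []
    for line in dump.splitlines():
        if "learn(" not in line:
            continue
        match = LOAD_ZERO.search(line)
        if match and match.group(1) == "OXM_OF_VLAN_VID":
            suspects.append(line.strip())
    return suspects


# Table 10 actions from the two dumps above (cookies and counters omitted).
native = ("table=10, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,"
          "OXM_OF_VLAN_VID[],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->OXM_OF_VLAN_VID[],"
          "load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1")
ofctl = ("table=10, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,"
         "NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],"
         "load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1")
print(len(vid_only_learn_flows(native)), len(vid_only_learn_flows(ofctl)))  # 1 0
```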

  System details follow:

  System info: CentOS Linux release 7.2.1511 (Core) 
  Kernel version: 3.10.0-327.28.3.el7.x86_6
  The system is a TripleO deployment using a network-isolation network environment (see the TripleO documentation for details).
  Deployment command line:
  openstack overcloud deploy --templates ./tripleo-heat-templates \
     -e ~/tripleo-heat-templates/environments/network-isolation.yaml \
     -e ~/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
     -e ~/for_net_isolation.yaml
  All templates are stock except the last one, which contains:
  parameter_defaults:
    EC2MetadataIp: 192.0.2.1
    ControlPlaneDefaultRoute: 192.0.2.1

  
  OpenStack packages

  openvswitch.x86_64               2.5.0-2.el7            @delorean-newton-testing
  openstack-neutron-openvswitch.noarch
                                   1:9.0.0-0.20160907193737.dc6508a.el7.centos
                                                          @delorean    
  [root@overcloud-controller-0 ~]# ovs-vsctl --version
  ovs-vsctl (Open vSwitch) 2.5.0
  Compiled Mar 18 2016 15:00:11
  DB Schema 7.12.1

  [root@overcloud-controller-0 ~]# ovs-ofctl --version
  ovs-ofctl (Open vSwitch) 2.5.0
  Compiled Mar 18 2016 15:00:11
  OpenFlow versions 0x1:0x4

  python-ryu-common.noarch         4.3-2.el7              @delorean-newton-testing
  python2-ryu.noarch               4.3-2.el7              @delorean-newton-testing
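The `OpenFlow versions 0x1:0x4` line above gives the range of OpenFlow wire-protocol version numbers this ovs-ofctl build supports. Wire version 0x04 is OpenFlow 1.3, the version requested with `-O OpenFlow13` for the native-interface dump. A tiny lookup sketch (the function name is illustrative):

```python
from typing import List

# OpenFlow wire-protocol version byte (ofp_header.version) -> specification name.
OPENFLOW_WIRE_VERSIONS = {
    0x01: "OpenFlow 1.0",
    0x02: "OpenFlow 1.1",
    0x03: "OpenFlow 1.2",
    0x04: "OpenFlow 1.3",
    0x05: "OpenFlow 1.4",
    0x06: "OpenFlow 1.5",
}


def supported_versions(range_text: str) -> List[str]:
    """Expand a 'lo:hi' hex range, as printed by ovs-ofctl --version, into names."""
    lo, hi = (int(part, 16) for part in range_text.split(":"))
    return [OPENFLOW_WIRE_VERSIONS[v] for v in range(lo, hi + 1)]


print(supported_versions("0x1:0x4"))
# ['OpenFlow 1.0', 'OpenFlow 1.1', 'OpenFlow 1.2', 'OpenFlow 1.3']
```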

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1622017/+subscriptions

