
yahoo-eng-team team mailing list archive

[Bug 1179223] Re: Retired GRE and VXLAN tunnels persists in neutron db

 

** Changed in: neutron
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1179223

Title:
  Retired GRE and VXLAN tunnels persists in neutron db

Status in OpenStack Neutron (virtual network service):
  Fix Released

Bug description:
  Setup is multi-node, with per-tenant routers and GRE or VXLAN tunneling;
  both the OVS and ML2 plugins are affected.

  SYMPTOM:

  VMs are reachable from the external network for about 1-2 minutes,
  after which the connection times out and cannot be re-established
  unless traffic is generated from the VM console.  VMs with DHCP
  interface settings periodically and temporarily come back online
  after requesting new leases.

  When I attempt to ping from the external network, I can trace the
  traffic all the way to the tap interface on the compute node, where
  the VM responds to the ARP request sent by the tenant router (which is
  on the separate network node).  However, the ARP reply never makes it
  back to the tenant router.  It seems to die at the GRE terminus on
  bridge br-tun.

  CAUSE:

  * I have three NICs on my network node.  VM traffic goes out the 1st
  NIC on 192.168.239.99/24 to the other compute nodes, while management
  traffic goes out the 2nd NIC on 192.168.241.99.  The 3rd NIC is
  external and has no IP.

  * I have four GRE endpoints on the VM network, one at the network node
  (192.168.239.99) and three on compute nodes
  (192.168.239.{110,114,115}), all with IDs 2-5.

  * I have a fifth GRE endpoint with id 1 to 192.168.241.99, the network
  node's management interface, on each of the compute nodes.  This was
  the first tunnel created when I deployed the network node because that
  is how I set the remote_ip in the ovs plugin ini.  I corrected the
  setting later, but the 192.168.241.99 endpoint persists:

  mysql> select * from ovs_tunnel_endpoints;
  +-----------------+----+
  | ip_address      | id |
  +-----------------+----+
  | 192.168.239.110 |  3 |
  | 192.168.239.114 |  4 |
  | 192.168.239.115 |  5 |
  | 192.168.239.99  |  2 |
  | 192.168.241.99  |  1 |   <======== HERE
  +-----------------+----+
  5 rows in set (0.00 sec)
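  For context, the address each node registers as its endpoint comes
  from its plugin ini.  A minimal sketch of the corrected setting,
  assuming the grizzly-era OVS plugin where the per-node tunnel address
  is local_ip in the [OVS] section (file path and option names may
  differ by release):

```ini
[OVS]
enable_tunneling = True
tenant_network_type = gre
# Must be this node's address on the VM/data network,
# not the management interface:
local_ip = 192.168.239.99
```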

  * Thus, after plugin restarts or reboots, this endpoint is re-created
  every time.

  * The effect is that traffic from the VM has two possible flows from
  which to make a routing/switching decision.  I was unable to determine
  how this decision is made, but obviously this is not a working
  configuration.  Traffic that originates from the VM always seems to
  use the correct flow initially, but traffic that originates from the
  network node is never returned via the right flow unless the
  connection has been active within the previous 1-2 minutes.  In both
  cases, successful connections time out after 1-2 minutes of
  inactivity.

  SOLUTION:

  mysql> delete from ovs_tunnel_endpoints where id = 1;
  Query OK, 1 row affected (0.00 sec)

  mysql> select * from ovs_tunnel_endpoints;
  +-----------------+----+
  | ip_address      | id |
  +-----------------+----+
  | 192.168.239.110 |  3 |
  | 192.168.239.114 |  4 |
  | 192.168.239.115 |  5 |
  | 192.168.239.99  |  2 |
  +-----------------+----+
  4 rows in set (0.00 sec)

  * After doing that, I simply restarted the quantum ovs agents on the
  network and compute nodes.  The old GRE tunnel is not re-created.
  Thereafter, VM network traffic to and from the external network
  proceeds without incident.
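  The manual cleanup above amounts to: delete every
  ovs_tunnel_endpoints row whose address is not one of the nodes'
  configured data-network IPs.  A minimal sketch of that logic in
  Python, using an in-memory sqlite stand-in for the MySQL table (the
  table layout is copied from this bug; the valid-IP set and the helper
  name are illustrative, not a real Neutron tool):

```python
import sqlite3

# Stand-in for the Neutron database; the real table lives in MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ovs_tunnel_endpoints (ip_address TEXT PRIMARY KEY, id INTEGER)"
)
conn.executemany(
    "INSERT INTO ovs_tunnel_endpoints VALUES (?, ?)",
    [
        ("192.168.239.110", 3),
        ("192.168.239.114", 4),
        ("192.168.239.115", 5),
        ("192.168.239.99", 2),
        ("192.168.241.99", 1),  # stale endpoint on the management network
    ],
)

# Addresses actually configured as local_ip on the nodes (hypothetical
# input; in practice you would gather these from each node's plugin ini).
valid_ips = {
    "192.168.239.110",
    "192.168.239.114",
    "192.168.239.115",
    "192.168.239.99",
}

def prune_stale_endpoints(conn, valid_ips):
    """Delete endpoint rows whose address is not a known data-network IP."""
    placeholders = ",".join("?" * len(valid_ips))
    cur = conn.execute(
        "DELETE FROM ovs_tunnel_endpoints "
        "WHERE ip_address NOT IN (%s)" % placeholders,
        tuple(valid_ips),
    )
    return cur.rowcount

removed = prune_stale_endpoints(conn, valid_ips)
remaining = sorted(
    ip for (ip,) in conn.execute("SELECT ip_address FROM ovs_tunnel_endpoints")
)
print(removed)    # 1 (the 192.168.241.99 row)
print(remaining)
```

  As in the bug, the agents would still need a restart afterwards so
  they stop programming flows for the deleted endpoint.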

  * I wonder whether these tables should be cleaned up as well:

  mysql> select * from ovs_network_bindings;
  +--------------------------------------+--------------+------------------+-----------------+
  | network_id                           | network_type | physical_network | segmentation_id |
  +--------------------------------------+--------------+------------------+-----------------+
  | 4e8aacca-8b38-40ac-a628-18cac3168fe6 | gre          | NULL             |               2 |
  | af224f3f-8de6-4e0d-b043-6bcd5cb014c5 | gre          | NULL             |               1 |
  +--------------------------------------+--------------+------------------+-----------------+
  2 rows in set (0.00 sec)

  mysql> select * from ovs_tunnel_allocations where allocated != 0;
  +-----------+-----------+
  | tunnel_id | allocated |
  +-----------+-----------+
  |         1 |         1 |
  |         2 |         1 |
  +-----------+-----------+
  2 rows in set (0.00 sec)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1179223/+subscriptions