yahoo-eng-team team mailing list archive

[Bug 1385234] [NEW] OVS tunneling between multiple neutron nodes misconfigured if amqp is restarted

 

Public bug reported:

After a deployment with multiple controllers completes, inspecting the
GRE tunnels that the neutron ovs-agent created in OVS shows that some
neutron nodes may be missing the tunnels between them.
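
One way to spot the missing tunnels is to compare each node's tunnel
ports against the set one would expect. The sketch below assumes a full
mesh (every node should have a tunnel to every other node) and assumes
the per-node remote IPs were already collected by hand, for example
from the output of `ovs-vsctl show`. The function name and the
addresses are hypothetical and not part of neutron.

```python
from itertools import permutations


def missing_tunnels(observed):
    """Return (local, remote) pairs that lack a tunnel.

    `observed` maps each node's local tunnel IP to the set of remote
    IPs for which that node currently has a tunnel port. A full mesh
    between all listed nodes is assumed.
    """
    nodes = sorted(observed)
    return [(a, b) for a, b in permutations(nodes, 2)
            if b not in observed[a]]


# Hypothetical three-node deployment: the second and third node never
# learned about each other.
observed = {
    "192.0.2.1": {"192.0.2.2", "192.0.2.3"},
    "192.0.2.2": {"192.0.2.1"},
    "192.0.2.3": {"192.0.2.1"},
}

for local, remote in missing_tunnels(observed):
    print("node %s has no tunnel to %s" % (local, remote))
```

Each direction is reported separately, because one agent may have
created its port while its peer has not.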

This happens because ovs-agents get disconnected from the rabbit
cluster without noticing. As a result, they can neither receive updates
from other nodes nor publish their own.
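
To illustrate why an application-level heartbeat helps here: a
connection that carries no traffic can look healthy to the client even
when the peer is gone, unless the client tracks when it last heard from
the peer. The following is only a sketch of that general idea. It is
not how oslo.messaging or Kombu implement heartbeats, and the class and
method names are hypothetical.

```python
import time


class HeartbeatMonitor:
    """Track when the peer was last heard from (illustrative only)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated
        self._clock = clock         # injectable for testing
        self._last_seen = clock()

    def beat(self):
        """Record that a heartbeat (or any frame) arrived from the peer."""
        self._last_seen = self._clock()

    def is_stale(self):
        """Return True if the peer has been silent for longer than timeout."""
        return self._clock() - self._last_seen > self.timeout
```

A client would call `beat()` whenever data arrives and check
`is_stale()` periodically, then tear down and re-establish the
connection once it returns True.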

The disconnection may happen after a rabbit node is reconfigured, after
the VIP moves to a different node, or even _during_ deployment because
of rabbit cluster configuration.

This was observed using Kombu 3.0.33 as well as 2.5.

Using aggressive (low) kernel keepalive probe intervals seems to
improve reliability, but a more appropriate fix seems to be heartbeat
support in oslo.messaging.
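
As a rough illustration of the keepalive workaround, the sketch below
enables TCP keepalive on a single socket and lowers its probe timers.
The report presumably refers to system-wide kernel settings. The
per-socket options used here are the Linux equivalents, the helper name
is hypothetical, and the values are examples only, not the ones used in
the affected deployment.

```python
import socket


def enable_aggressive_keepalive(sock, idle=5, interval=1, count=3):
    """Turn on TCP keepalive with short timers (example values).

    idle:     seconds of inactivity before the first probe
    interval: seconds between probes
    count:    unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket options are platform-specific (available on
    # Linux), so each one is set only if the socket module exposes it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With these example values a dead peer would be detected after roughly
idle + interval * count seconds. This only shortens detection at the
TCP level and does not replace a heartbeat in the messaging layer.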

** Affects: neutron
     Importance: Undecided
         Status: New

** Affects: oslo.messaging
     Importance: Undecided
         Status: New

** Affects: tripleo
     Importance: High
         Status: New

** Summary changed:

- OVS tunneling between multiple neutron nodes breaks if amqp is restarted
+ OVS tunneling between multiple neutron nodes misconfigured if amqp is restarted

** Description changed:

  At completion of a deployment with multiple controllers, by observing
  the gre tunnels created in OVS by the neutron ovs-agent, one will find
  that some neutron nodes may miss the tunnels in between them.
  
  This is due to ovs-agents getting disconnected from the rabbit cluster
  without them noticing and as a result, being unable to receive updates
  from other nodes or publish updates.
  
  The disconnection may happen following a reconfig of a rabbit node, the
  VIP moving over a different node, or even _during_ deployment due to
  rabbit cluster configuration.
  
+ This was observed using Kombu 3.0.33 as well as 2.5.
+ 
  Use of some aggressive (low) kernel keepalive probes interval seems to
  improve the reliability but a more appropriate fix seems to be support
  for heartbeat in oslo.messaging

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1385234

Title:
  OVS tunneling between multiple neutron nodes misconfigured if amqp is
  restarted

Status in OpenStack Neutron (virtual network service):
  New
Status in Messaging API for OpenStack:
  New
Status in tripleo - openstack on openstack:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1385234/+subscriptions

