yahoo-eng-team team mailing list archive

[Bug 1860521] [NEW] L2 pop notifications are not reliable

 

Public bug reported:

Problem: nodes/VMs in an L2 segment can lose connectivity (e.g. missing
vxlan tunnels or OVS flows) because of partial RabbitMQ unavailability,
RPC message loss, or an agent failing to apply FDB entry updates.

Why: currently the neutron server sends FDB entries to L2 agents one-way
(no feedback), so an agent has no way to detect whether all required
tunnels/flows have been built. Likewise, the server has no way to detect
whether all sent FDB entries were delivered and the required flows were
applied. If some messages are lost, only an agent restart fixes the
resulting issues.

Way to address: add a synchronization mechanism on the L2 agent side
that periodically requests the network topology from the server,
compares it with the configuration actually applied on the node, and
applies any missing parts.
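
The comparison step of such a sync could look roughly like the sketch
below. This is only an illustration, not neutron code: the FDB entry
shape (MAC, remote tunnel IP), the function names and the three
callbacks are all hypothetical, and only the "apply missing parts" half
of the proposal is covered.

```python
from typing import Callable, Dict, Set, Tuple

# Simplified, hypothetical FDB entry: (MAC address, remote tunnel endpoint IP).
FdbEntry = Tuple[str, str]
# Network ID -> set of FDB entries for that network.
Topology = Dict[str, Set[FdbEntry]]


def missing_fdb_entries(expected: Topology, applied: Topology) -> Topology:
    """Return the entries the server expects but the node has not applied."""
    missing: Topology = {}
    for net_id, want in expected.items():
        gap = want - applied.get(net_id, set())
        if gap:
            missing[net_id] = gap
    return missing


def sync_once(
    fetch_expected: Callable[[], Topology],
    read_applied: Callable[[], Topology],
    apply_entries: Callable[[Topology], None],
) -> Topology:
    """Run one sync pass: fetch, compare, apply only what is missing.

    The callbacks are placeholders for "ask the server for the topology",
    "read the tunnels/flows configured on this node" and "program the
    missing ones". A periodic task would invoke this at some interval.
    """
    missing = missing_fdb_entries(fetch_expected(), read_applied())
    if missing:
        apply_entries(missing)
    return missing
```

A pass that finds nothing missing returns an empty dict and applies
nothing, so running it periodically is harmless on a healthy node.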

Option 2: move from RPC fanouts and casts to RPC calls, which wait for a
reply and so confirm message delivery. Concerns: scalability and
increased load on the neutron server.
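
To make the cast/call difference concrete, here is a toy model. It is
not the real RPC layer: LossyBus, cast() and call() are made-up names,
and the retry loop is an assumption about how a caller might react to a
missing reply. The point is only that a cast loses a message silently,
while a call lets the sender notice the loss.

```python
import random


class LossyBus:
    """Toy message bus that drops a fraction of messages (not a real client)."""

    def __init__(self, drop_rate: float, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._drop_rate = drop_rate

    def deliver(self, handler, payload) -> bool:
        """Return True if the message was delivered and handled, else False."""
        if self._rng.random() < self._drop_rate:
            return False  # message lost in transit
        handler(payload)
        return True


def cast(bus: LossyBus, handler, payload) -> None:
    """Fire and forget: the sender never learns whether delivery succeeded."""
    bus.deliver(handler, payload)


def call(bus: LossyBus, handler, payload, retries: int = 5) -> None:
    """Wait for an acknowledgement, retry on loss, raise if all attempts fail."""
    for _ in range(retries):
        if bus.deliver(handler, payload):
            return
    raise TimeoutError("no acknowledgement after %d attempts" % retries)
```

With a bus that drops everything, cast() returns normally and the
receiver sees nothing, whereas call() raises TimeoutError. The price is
one round trip (or more) per agent, which is where the scalability
concern comes from.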

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1860521

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1860521/+subscriptions

