yahoo-eng-team team mailing list archive

[Bug 1883071] [NEW] Fix flows l2 population related on br-tun being cleaned after RabbitMQ cluster has experienced a network partition

 

Public bug reported:

Pre-conditions: The RabbitMQ cluster has experienced a network
partition, and neutron-ovs-agent is then restarted.

Results: Normally, when neutron-ovs-agent restarts, the method
add_fdb_entries is called to refresh the l2 pop related flows. However,
after the RabbitMQ cluster has experienced a network partition, the
agent receives only some of the RPC messages that trigger
add_fdb_entries, so only some of the l2 pop related flows are
refreshed. The l2 pop related flows that still carry the old cookie are
then cleaned up. These flows are still needed, and deleting them
disrupts tenant traffic.
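
The failure mode can be sketched as below. This is a simplified model
and not the agent's actual code: the flow dicts and the helper names
refresh and cleanup_stale are hypothetical, and only the idea of
re-stamping refreshed flows with a new cookie and deleting the rest
comes from the report.

```python
# Hypothetical model: each flow is a dict with a "cookie" and a "match".
def refresh(flows, refreshed_matches, new_cookie):
    """Re-stamp only the flows for which an add_fdb_entries RPC arrived."""
    result = []
    for flow in flows:
        if flow["match"] in refreshed_matches:
            flow = dict(flow, cookie=new_cookie)
        result.append(flow)
    return result


def cleanup_stale(flows, new_cookie):
    """Keep only flows carrying the current cookie, dropping the rest."""
    return [flow for flow in flows if flow["cookie"] == new_cookie]


flows = [
    {"cookie": 0x1, "match": "dl_vlan=1"},
    {"cookie": 0x1, "match": "dl_vlan=2"},
]
# Assume only the RPC covering VLAN 1 got through after the partition.
after_refresh = refresh(flows, {"dl_vlan=1"}, new_cookie=0x2)
remaining = cleanup_stale(after_refresh, new_cookie=0x2)
```

In this model the flow for VLAN 2 is dropped even though its network
may still be in use, which matches the symptom described above.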

Our temporary solution is to change the method cleanup_flows. The l2
pop related flows are mainly in table 20, table 21, and table 22. The
lowest-priority flow in them resubmits to table 22, so we only need to
ensure that the flows in table 22 exist. Before cleaning up flows in
table 22, we dump all flows in it and compare the vlan_num of each flow
with LocalVLANMapping to judge whether the network is still in use. If
it is not, the flow is cleaned up. If the network is still in use, the
flow related to it in table 22 is not cleaned up until the agent
receives the RPC that refreshes it.
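
A minimal sketch of this check follows. It is an illustration under
stated assumptions, not the proposed patch: the function name
stale_flows_to_delete is hypothetical, the dump lines are assumed to
resemble ovs-ofctl dump-flows output with a dl_vlan match, and
vlans_in_use stands in for the local VLAN IDs the agent tracks through
LocalVLANMapping.

```python
import re

# Assumed dump line format, modelled on `ovs-ofctl dump-flows` output.
VLAN_RE = re.compile(r"dl_vlan=(\d+)")


def stale_flows_to_delete(table22_dump, stale_cookie, vlans_in_use):
    """Return stale table 22 flows whose VLAN is no longer in use."""
    to_delete = []
    for line in table22_dump:
        # Flows that already carry a different cookie are not stale.
        if f"cookie={stale_cookie:#x}," not in line:
            continue
        match = VLAN_RE.search(line)
        if match and int(match.group(1)) in vlans_in_use:
            # Network still in use: keep the flow until an RPC refreshes it.
            continue
        # Stale flows without a usable VLAN match are also deleted here.
        to_delete.append(line)
    return to_delete


dump = [
    "cookie=0x1, table=22, priority=1,dl_vlan=1 actions=strip_vlan,output:2",
    "cookie=0x1, table=22, priority=1,dl_vlan=2 actions=strip_vlan,output:2",
    "cookie=0x2, table=22, priority=1,dl_vlan=3 actions=strip_vlan,output:2",
]
# Assume only local VLAN 1 still maps to a network on this host.
result = stale_flows_to_delete(dump, 0x1, {1})
```

Here only the stale flow for VLAN 2 is selected for deletion. The stale
flow for VLAN 1 is kept because its network is still in use, and the
flow with the new cookie is untouched.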

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1883071

Title:
  Fix flows l2 population related on br-tun being cleaned after RabbitMQ
  cluster has experienced a network partition

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1883071/+subscriptions

