yahoo-eng-team team mailing list archive

[Bug 2028795] [NEW] Restarting OVS with DVR creates a network loop

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Jakub Libosvar <2028795@xxxxxxxxxxxxxxxxxx>
Date: Wed, 26 Jul 2023 17:57:21 -0000
Reply-to: Bug 2028795 <2028795@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx

Public bug reported:

Restarting the OVS agent with DVR enabled creates a network loop between
the external network and a tunneling network for a very short period of
time. This causes big problems when two agents are restarted at the same
time.

Steps to reproduce:
1) Have ml2/ovs with DVR enabled
2) Have a VM with a FIP on compute node A
3) Have a gateway port for SNAT traffic on network node B
4) Ping the FIP with the -i 0.1 option to send an ICMP request every 0.1 seconds
5) Restart the OVS agents on both compute node A and network node B at the same time

The replies for the FIP traffic are then dropped on compute node A for
about 3-5 minutes, because the loop causes the local OVS on compute node
A to learn that the GW port MAC is on the tunneling interface. All reply
traffic uses that MAC in its destination field, so the OVS NORMAL action
no longer floods such traffic but, per its FDB entry, forwards it to the
patch port between br-int and br-tun, where it is dropped until the FDB
entry expires.
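
The learning behaviour described above can be illustrated with a toy
model. This is only a sketch of generic MAC learning, not the OVS or
neutron code: the class, the port names "vm-port" and "patch-tun", the
MAC addresses and the 300 second ageing time are all assumptions chosen
for illustration.

```python
from dataclasses import dataclass, field

FLOOD = "flood"

# Hypothetical addresses, used only for this illustration.
VM_MAC = "fa:16:3e:00:00:0a"
GW_MAC = "fa:16:3e:00:00:01"
BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningSwitch:
    """Toy model of a MAC-learning bridge (assumed behaviour, simplified)."""
    max_age: float = 300.0                    # assumed FDB ageing time, seconds
    fdb: dict = field(default_factory=dict)   # mac -> (port, time learned)

    def receive(self, src_mac, dst_mac, in_port, now):
        """Learn the source MAC, then return the output port or FLOOD."""
        self.fdb[src_mac] = (in_port, now)
        entry = self.fdb.get(dst_mac)
        if entry is None or now - entry[1] > self.max_age:
            # Unknown or expired destination: drop the stale entry and flood.
            self.fdb.pop(dst_mac, None)
            return FLOOD
        return entry[0]


def demo():
    """Replay the sequence from the report on a model of br-int on node A."""
    br_int = LearningSwitch(max_age=300.0)
    results = []
    # Before the loop: the GW MAC is unknown, so the reply is flooded.
    results.append(br_int.receive(VM_MAC, GW_MAC, "vm-port", now=0.0))
    # During the loop: a frame sourced from the GW MAC arrives via the tunnel.
    br_int.receive(GW_MAC, BROADCAST, "patch-tun", now=1.0)
    # Replies now follow the poisoned entry towards br-tun.
    results.append(br_int.receive(VM_MAC, GW_MAC, "vm-port", now=2.0))
    # Once the entry has aged out, replies are flooded again.
    results.append(br_int.receive(VM_MAC, GW_MAC, "vm-port", now=302.0))
    return results
```

In this model a single looped frame is enough to redirect the replies
until the entry ages out, which would be consistent with the outage of
a few minutes reported above.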

** Affects: neutron
     Importance: Undecided
     Assignee: Jakub Libosvar (libosvar)
         Status: New


** Tags: l3-dvr-backlog

https://bugs.launchpad.net/bugs/2028795

Follow ups

[Bug 2028795] Re: Restarting OVS with DVR creates a network loop
From: Jakub Libosvar, 2024-04-16