
yahoo-eng-team team mailing list archive

[Bug 1267931] [NEW] neutron-l3-agent virtual router SNAT translation doesn't work for traffic happening during iptable rules setup (race condition)

 

Public bug reported:

I found a race condition that happens in the following situation:

 0) We use SNAT to connect an internal network to the external one.
 1) A network node running neutron-l3-agent and carrying live traffic is rebooted.
 2) While it starts up again, a VM is sending traffic (ping is a simple case) to the external network.
 3) As the agent starts, it creates the virtual router qrouter-<ID> namespace, brings up the interfaces (ext+int),
     and sets up the iptables rules.

 4) If traffic hits the rules before the SNAT rule is set, the Linux
    connection tracker will never run those packets through the
    SNAT rule (even if it is set afterwards). As a result, the internal
    IP is forwarded "as is", untranslated, into the external network.

 5) If you restart the ping in the VM (ping seq restarts at 0), it will
    start working.

 6) If you start a different ping while the first one is running, the new ping will work; the old one will
     stay in that "limbo state" where it is untranslated.
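
 The behaviour in steps 4 to 6 can be sketched with a toy model. This is
 only an illustration, not netfilter or neutron code: the ToyRouter class,
 the IP addresses and the ICMP ids are all made up, and it assumes that the
 NAT decision for a flow is fixed when the first packet of that flow is seen
 and that a restarted or second ping shows up as a new flow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical flow key for an ICMP echo stream: (src, dst, icmp_id).
FlowKey = Tuple[str, str, int]


@dataclass
class ToyRouter:
    """Toy model of NAT-on-first-packet behaviour (illustrative only)."""
    external_ip: str
    snat_rule_installed: bool = False
    # Per-flow NAT decision, recorded when the first packet of a flow arrives.
    conntrack: Dict[FlowKey, Optional[str]] = field(default_factory=dict)

    def forward(self, src: str, dst: str, icmp_id: int) -> str:
        """Return the source IP the packet carries on the external side."""
        key = (src, dst, icmp_id)
        if key not in self.conntrack:
            # Only the first packet of a flow consults the NAT rules.
            self.conntrack[key] = (
                self.external_ip if self.snat_rule_installed else None
            )
        translated = self.conntrack[key]
        return translated if translated is not None else src


def demo() -> List[str]:
    router = ToyRouter(external_ip="203.0.113.10")
    seen = []
    # The first ping starts before the SNAT rule exists: recorded untranslated.
    seen.append(router.forward("10.0.0.5", "198.51.100.1", icmp_id=1))
    router.snat_rule_installed = True
    # Same flow after the rule is installed: still untranslated.
    seen.append(router.forward("10.0.0.5", "198.51.100.1", icmp_id=1))
    # A new ping (new ICMP id, so a new flow) gets translated.
    seen.append(router.forward("10.0.0.5", "198.51.100.1", icmp_id=2))
    return seen
```

 In this model the old ping stays untranslated for as long as its flow
 entry exists, which matches the "limbo state" described in step 6.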

 Additional information:

  This is the normal condition, where the race condition did not happen: http://fpaste.org/67388/89372153/
  This is the abnormal condition, where the race condition happened: http://fpaste.org/67389/38937224/ (note the last tcpdump source IP)

  This is the abnormal condition, where we started a new ping to a
  different host: http://fpaste.org/67393/93725511/ (there are two
  tcpdumps in parallel)
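
 A rough way to spot the abnormal condition in a capture from the external
 side is to look for source IPs that still belong to the internal network.
 The sketch below assumes `tcpdump -n` style lines for ICMP; the helper name,
 the sample lines and the 10.0.0.0/24 subnet are made up for illustration
 and are not taken from the pastes above.

```python
import ipaddress
import re
from typing import List

# Matches the "IP <src> > <dst>:" part of a `tcpdump -n` line (IPv4 only).
_LINE = re.compile(r"IP (\d+\.\d+\.\d+\.\d+) > (\d+\.\d+\.\d+\.\d+):")

# Illustrative capture lines, not real output from the bug report.
SAMPLE = [
    "12:00:00.000001 IP 10.0.0.5 > 198.51.100.1: ICMP echo request, id 1, seq 7, length 64",
    "12:00:00.500001 IP 203.0.113.10 > 198.51.100.2: ICMP echo request, id 2, seq 1, length 64",
    "12:00:01.000001 IP 10.0.0.5 > 198.51.100.1: ICMP echo request, id 1, seq 8, length 64",
]


def untranslated_sources(tcpdump_lines: List[str], internal_cidr: str) -> List[str]:
    """Return the distinct source IPs that fall inside the internal network."""
    internal = ipaddress.ip_network(internal_cidr)
    leaked: List[str] = []
    for line in tcpdump_lines:
        match = _LINE.search(line)
        if not match:
            continue
        src = match.group(1)
        # A source inside the internal subnet means SNAT was not applied.
        if ipaddress.ip_address(src) in internal and src not in leaked:
            leaked.append(src)
    return leaked
```

 For example, untranslated_sources(SAMPLE, "10.0.0.0/24") flags only the
 internal address, while the translated ping to the second host is ignored.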

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: condition ha iptables race

** Description changed:

- 
  I found a race condition that happens in the following situation:
  
-  0) we use SNAT to connect an internal network to the external one
-  1) A network node running neutron-l3-agent with actual traffic is rebooted
-  2) While it starts again, an VM is sending traffic (ping is a simple case) to external network
-  3) As it starts, it creates the virtual router qrouter-<ID> namespace, brings up the interfaces (ext+int),
-      and setups the iptable rules.
+  0) we use SNAT to connect an internal network to the external one
+  1) A network node running neutron-l3-agent with actual traffic is rebooted
+  2) While it starts again, an VM is sending traffic (ping is a simple case) to external network
+  3) As it starts, it creates the virtual router qrouter-<ID> namespace, brings up the interfaces (ext+int),
+      and setups the iptable rules.
  
-  4) if traffic hits the rules, before the SNAT rule is set, the linux connection tracker won't ever 
-       toss those packets anymore by the SNAT rule (even if is set after). So it will result from the
-       internal IP being forwarded "as is", untranslated,  into the external network.
+  4) if traffic hits the rules, before the SNAT rule is set, the linux
+     connection tracker won't ever toss those packets anymore by the 
+     SNAT rule (even if is set after). So it will result from the internal
+     IP being forwarded "as is", untranslated,  into the external network.
  
-  5) If you restart the ping in the VM (ping seq restarts to 0), it will start working
-  6) if you start a different ping while the first one is running, the new ping will work, the old will
-      stay in that "limbo state" where it's untranslated.
+  5) If you restart the ping in the VM (ping seq restarts to 0), it will
+ start working
  
-  Aditional information:
+  6) If you start a different ping while the first one is running, the new ping will work, the old will
+      stay in that "limbo state" where it's untranslated.
  
-   This is the normal condition, where a race condition didn't happen:    http://fpaste.org/67388/89372153/
-   This is the abnormal condition, where the race condition happened:  http://fpaste.org/67389/38937224/ (note the last tcpdump source IP)
+  Aditional information:
  
-   This is the abnormal condition, where we started a new ping to a
+   This is the normal condition, where a race condition didn't happen:    http://fpaste.org/67388/89372153/
+   This is the abnormal condition, where the race condition happened:  http://fpaste.org/67389/38937224/ (note the last tcpdump source IP)
+ 
+   This is the abnormal condition, where we started a new ping to a
  different host:   http://fpaste.org/67358/36578913/ (there are two
  tcpdumps in parallel)

** Description changed:

  I found a race condition that happens in the following situation:
  
   0) we use SNAT to connect an internal network to the external one
   1) A network node running neutron-l3-agent with actual traffic is rebooted
   2) While it starts again, an VM is sending traffic (ping is a simple case) to external network
   3) As it starts, it creates the virtual router qrouter-<ID> namespace, brings up the interfaces (ext+int),
       and setups the iptable rules.
  
   4) if traffic hits the rules, before the SNAT rule is set, the linux
-     connection tracker won't ever toss those packets anymore by the 
-     SNAT rule (even if is set after). So it will result from the internal
-     IP being forwarded "as is", untranslated,  into the external network.
+     connection tracker won't ever toss those packets anymore by the
+     SNAT rule (even if is set after). So it will result from the internal
+     IP being forwarded "as is", untranslated,  into the external network.
  
   5) If you restart the ping in the VM (ping seq restarts to 0), it will
  start working
  
   6) If you start a different ping while the first one is running, the new ping will work, the old will
       stay in that "limbo state" where it's untranslated.
  
   Aditional information:
  
    This is the normal condition, where a race condition didn't happen:    http://fpaste.org/67388/89372153/
    This is the abnormal condition, where the race condition happened:  http://fpaste.org/67389/38937224/ (note the last tcpdump source IP)
  
    This is the abnormal condition, where we started a new ping to a
- different host:   http://fpaste.org/67358/36578913/ (there are two
+ different host:   http://fpaste.org/67393/93725511/ (there are two
  tcpdumps in parallel)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1267931

Title:
  neutron-l3-agent virtual router SNAT translation doesn't work for
  traffic happening during iptable rules setup (race condition)

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  I found a race condition that happens in the following situation:

   0) We use SNAT to connect an internal network to the external one.
   1) A network node running neutron-l3-agent and carrying live traffic is rebooted.
   2) While it starts up again, a VM is sending traffic (ping is a simple case) to the external network.
   3) As the agent starts, it creates the virtual router qrouter-<ID> namespace, brings up the interfaces (ext+int),
       and sets up the iptables rules.

   4) If traffic hits the rules before the SNAT rule is set, the Linux
      connection tracker will never run those packets through the
      SNAT rule (even if it is set afterwards). As a result, the internal
      IP is forwarded "as is", untranslated, into the external network.

   5) If you restart the ping in the VM (ping seq restarts at 0), it
      will start working.

   6) If you start a different ping while the first one is running, the new ping will work; the old one will
       stay in that "limbo state" where it is untranslated.

   Additional information:

    This is the normal condition, where the race condition did not happen: http://fpaste.org/67388/89372153/
    This is the abnormal condition, where the race condition happened: http://fpaste.org/67389/38937224/ (note the last tcpdump source IP)

    This is the abnormal condition, where we started a new ping to a
    different host: http://fpaste.org/67393/93725511/ (there are two
    tcpdumps in parallel)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1267931/+subscriptions

