yahoo-eng-team team mailing list archive

[Bug 1629539] [NEW] Broken distributed virtual router

 

Public bug reported:

I wish I could come up with a smarter, more descriptive title for this,
but if someone can after reading my report, feel free to update it.

I installed my second controller the other day (because of resource
constraints, I run ALL my OpenStack control services - APIs, Engines,
Servers etc. - _everything_ but 'nova-compute' and 'nova-console' -
on one physical host), and then one of my LBaaSv1 load balancers (I
haven't gotten around to enabling v2 again; last time I hit some issues,
which were reported elsewhere in the tracker) stopped working.

After almost a day of trying to figure out why only one broke and how to
fix it, I realized it must be the _router_, not the load balancer, that's
at fault (see below).

Broken LBaaSv1 VIP:          10.100.0.16/24
Broken LBaaSv1 Floating IP:  10.0.5.90/24
Working LBaaSv1 Floating IP: 10.0.4.190/24
Router VIF namespace:        10.0.5.100                (not sure exactly what this is, but for some reason it has 'stolen' the incoming "GW functionality" on the router from the .253 interfaces)
Router qrouter namespace:    10.0.4.253 + 10.0.5.253   (these are on the router's 'External Gateway' and are supposed to be the router's GW)
Primary GW/FW/NAT:           eth1:192.168.69.1/24, eth2:10.0.4.254/24, eth2:10.0.5.254/24

=> ==========================================
=> From a physical host outside the OS network(s) (i.e. from the 192.168.69.0/24 network):

traceroute to 10.100.0.16 (10.100.0.16), 30 hops max, 60 byte packets   <= CORRECT
 1  192.168.69.1  0.088 ms  0.077 ms  0.064 ms
 2  10.0.4.253  0.262 ms  0.246 ms  0.258 ms
 3  10.100.0.16  2.365 ms  2.348 ms  2.310 ms

traceroute to 10.0.5.90 (10.0.5.90), 30 hops max, 60 byte packets       <= WRONG, LBaaSv1 doesn't work
 1  192.168.69.1  0.156 ms  0.138 ms  0.123 ms
 2  10.0.5.100  0.834 ms  0.863 ms  0.851 ms
 3  * * *
 4  10.0.5.90  1.487 ms  1.564 ms  1.561 ms

traceroute to 10.0.4.190 (10.0.4.190), 30 hops max, 60 byte packets     <= WRONG, but LBaaSv1 works
 1  192.168.69.1  0.130 ms  0.112 ms  0.097 ms
 2  10.0.5.100  1.595 ms  1.581 ms  1.568 ms
 3  * * *
 4  10.0.4.190  2.265 ms  2.262 ms  2.251 ms

=> ==========================================
=> From an instance (inside the 10.100.0.0/24 subnet - all ICMP open)

traceroute to 10.100.0.16 (10.100.0.16), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  *^C

PING 10.100.0.16 (10.100.0.16) 56(84) bytes of data.
64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.32 ms
64 bytes from 10.100.0.16: icmp_seq=2 ttl=64 time=0.548 ms
64 bytes from 10.100.0.16: icmp_seq=3 ttl=64 time=0.589 ms
^C

PING 10.0.5.90 (10.0.5.90) 56(84) bytes of data.
64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.02 ms
64 bytes from 10.0.5.90: icmp_seq=1 ttl=60 time=1.68 ms (DUP!)
^C

PING 10.0.4.190 (10.0.4.190) 56(84) bytes of data.
64 bytes from 10.100.0.4: icmp_seq=1 ttl=64 time=0.925 ms
64 bytes from 10.0.4.190: icmp_seq=1 ttl=60 time=467 ms (DUP!)
^C
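
To cross-check ping output like the above, here is a minimal Python sketch
(not part of neutron; the function name and regex are only illustrative)
that flags replies arriving from an address other than the one pinged:

```python
import re

# Matches lines such as "64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.02 ms"
REPLY = re.compile(
    r"^\d+ bytes from (?P<src>[\d.]+): icmp_seq=(?P<seq>\d+) ttl=(?P<ttl>\d+)"
)


def unexpected_replies(target, ping_output):
    """Return (seq, source, ttl) for replies whose source is not `target`."""
    hits = []
    for line in ping_output.splitlines():
        match = REPLY.match(line.strip())
        if match and match.group("src") != target:
            hits.append(
                (int(match.group("seq")), match.group("src"), int(match.group("ttl")))
            )
    return hits


# Sample copied from the ping to 10.0.5.90 above.
sample = """PING 10.0.5.90 (10.0.5.90) 56(84) bytes of data.
64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.02 ms
64 bytes from 10.0.5.90: icmp_seq=1 ttl=60 time=1.68 ms (DUP!)
"""

print(unexpected_replies("10.0.5.90", sample))  # [(1, '10.100.0.16', 64)]
```

In the sample, the first reply comes from the fixed address with ttl=64 and
the duplicate from the floating address with ttl=60, which suggests the same
request is being answered over two different paths.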

=> ==========================================
=> The 'actual' problem

=> From a host on the 192.168.69.0/24 network
$ curl --insecure https://10.100.0.16:8140/
curl: (35) Unknown SSL protocol error in connection to 10.100.0.16:8140 <= FAIL, never reaches backend server
$ curl --insecure https://10.0.5.90:8140/
The environment must be purely alphanumeric, not ''                     <= Actually working

=> From an instance
$ curl --insecure https://10.100.0.16:8140/
The environment must be purely alphanumeric, not ''                     <= Actually working
$ curl --insecure https://10.0.5.90:8140/
curl: (35) Unknown SSL protocol error in connection to 10.0.5.90:8140   <= FAIL, never reaches backend server

Testing 10.0.4.190 with curl won't work because it serves "ldaps" on
port 636. An ldapsearch against it works from 192.168.69.0/24 but not
from an instance, so it is broken as well, even though I labeled it
'working' above :(. It's just "broken" in a different way.

=> ==========================================
=> Relevant name spaces on the controllers:

=>
=> Primary Controller
=>

=> ip netns | sort
fip-cd30c1bb-3db6-488c-b448-6cb4454783be
qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7

=> fip-cd30c1bb-3db6-488c-b448-6cb4454783be
66: fg-38e452be-d4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    inet 10.0.5.100/24 brd 10.0.5.255 scope global fg-38e452be-d4

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.5.254      0.0.0.0         UG    0      0        0 fg-38e452be-d4
10.0.4.189      169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
10.0.4.190      169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
10.0.4.195      169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
10.0.5.0        0.0.0.0         255.255.255.0   U     0      0        0 fg-38e452be-d4
10.0.5.90       169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
10.0.5.92       169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
10.0.5.99       169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
169.254.106.114 0.0.0.0         255.255.255.254 U     0      0        0 fpr-4b3639a1-8

=> qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7
2: rfp-4b3639a1-8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.5.90/32 brd 10.0.5.90 scope global rfp-4b3639a1-8
    inet 10.0.4.190/32 brd 10.0.4.190 scope global rfp-4b3639a1-8
71: qr-a2293a4c-51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc noqueue state UNKNOWN group default
    inet 10.100.0.1/24 brd 10.100.0.255 scope global qr-a2293a4c-51

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 qr-a2293a4c-51
169.254.106.114 0.0.0.0         255.255.255.254 U     0      0        0 rfp-4b3639a1-8

=>
=> Secondary Controller
=>

=> ip netns
snat-4b3639a1-880f-4b55-989f-c6f654e562a7
qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7

=> snat-4b3639a1-880f-4b55-989f-c6f654e562a7
62: qg-1d52c5b9-4b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 10.0.4.253/24 brd 10.0.4.255 scope global qg-1d52c5b9-4b
    inet 10.0.5.253/24 brd 10.0.5.255 scope global qg-1d52c5b9-4b

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.4.254      0.0.0.0         UG    0      0        0 qg-1d52c5b9-4b
10.0.4.0        0.0.0.0         255.255.255.0   U     0      0        0 qg-1d52c5b9-4b
10.0.5.0        0.0.0.0         255.255.255.0   U     0      0        0 qg-1d52c5b9-4b
10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 sg-ed603ce2-fe

=> qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7
51: qr-a2293a4c-51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 10.100.0.1/24 brd 10.100.0.255 scope global qr-a2293a4c-51

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 qr-a2293a4c-51

=> ==========================================
=> The iptables rules in the name spaces

=>
=> Primary Controller
=>

=> fip-cd30c1bb-3db6-488c-b448-6cb4454783be
neutron-fwaas-l3-INPUT        all  --  0.0.0.0/0            0.0.0.0/0
neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-FORWARD      all  --  0.0.0.0/0            0.0.0.0/0
neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-local        all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-scope        all  --  0.0.0.0/0            0.0.0.0/0

neutron-fwaas-l3-PREROUTING   all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0
neutron-postrouting-bottom    all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-float-snat   all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-snat         all  --  0.0.0.0/0            0.0.0.0/0            /* Perform source NAT on outgoing traffic. */

=> qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7
neutron-fwaas-l3-INPUT        all  --  0.0.0.0/0            0.0.0.0/0
neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-FORWARD      all  --  0.0.0.0/0            0.0.0.0/0
neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-local        all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-scope        all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT                        all  --  0.0.0.0/0            0.0.0.0/0            mark match 0x1/0xffff
DROP                          tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:9697
DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000

=>
=> Secondary Controller
=>

neutron-fwaas-l3-PREROUTING   all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0
neutron-postrouting-bottom    all  --  0.0.0.0/0            0.0.0.0/0
DNAT                          all  --  0.0.0.0/0            10.0.5.92            to:10.100.0.25
DNAT                          all  --  0.0.0.0/0            10.0.5.90            to:10.100.0.16
DNAT                          all  --  0.0.0.0/0            10.0.5.99            to:10.104.0.44
DNAT                          all  --  0.0.0.0/0            10.0.4.189           to:10.100.0.3
DNAT                          all  --  0.0.0.0/0            10.0.4.195           to:10.104.0.27
DNAT                          all  --  0.0.0.0/0            10.0.4.190           to:10.100.0.4
DNAT                          all  --  0.0.0.0/0            10.0.5.92            to:10.100.0.25
DNAT                          all  --  0.0.0.0/0            10.0.5.90            to:10.100.0.16
DNAT                          all  --  0.0.0.0/0            10.0.5.99            to:10.104.0.44
DNAT                          all  --  0.0.0.0/0            10.0.4.189           to:10.100.0.3
DNAT                          all  --  0.0.0.0/0            10.0.4.195           to:10.104.0.27
DNAT                          all  --  0.0.0.0/0            10.0.4.190           to:10.100.0.4
REDIRECT                      tcp  --  0.0.0.0/0            169.254.169.254      tcp dpt:80 redir ports 9697
SNAT                          all  --  10.100.0.25          0.0.0.0/0            to:10.0.5.92
SNAT                          all  --  10.100.0.16          0.0.0.0/0            to:10.0.5.90
SNAT                          all  --  10.104.0.44          0.0.0.0/0            to:10.0.5.99
SNAT                          all  --  10.100.0.3           0.0.0.0/0            to:10.0.4.189
SNAT                          all  --  10.104.0.27          0.0.0.0/0            to:10.0.4.195
SNAT                          all  --  10.100.0.4           0.0.0.0/0            to:10.0.4.190
neutron-fwaas-l3-float-snat   all  --  0.0.0.0/0            0.0.0.0/0
neutron-fwaas-l3-snat         all  --  0.0.0.0/0            0.0.0.0/0            /* Perform source NAT on outgoing traffic. */
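
One sanity check on the NAT listing above is that every floating IP with a
DNAT rule has the mirror-image SNAT rule. A minimal Python sketch, assuming
the column layout shown above (target, prot, opt, source, destination,
to:address); the helper names are hypothetical and only a few of the listed
rules are included:

```python
# A few rules copied from the listing above (whitespace condensed).
RULES = """\
DNAT  all  --  0.0.0.0/0    10.0.5.90   to:10.100.0.16
DNAT  all  --  0.0.0.0/0    10.0.4.190  to:10.100.0.4
SNAT  all  --  10.100.0.16  0.0.0.0/0   to:10.0.5.90
SNAT  all  --  10.100.0.4   0.0.0.0/0   to:10.0.4.190
"""


def nat_maps(listing):
    """Build {floating: fixed} from DNAT rules and {fixed: floating} from SNAT rules."""
    dnat, snat = {}, {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[5].startswith("to:"):
            continue  # jump to a chain, REDIRECT, or anything else without to:ADDR
        target, source, dest, to_addr = fields[0], fields[3], fields[4], fields[5][3:]
        if target == "DNAT":
            dnat[dest] = to_addr
        elif target == "SNAT":
            snat[source] = to_addr
    return dnat, snat


def unpaired(dnat, snat):
    """Floating IPs whose DNAT target has no SNAT rule mapping back to them."""
    return sorted(fip for fip, fixed in dnat.items() if snat.get(fixed) != fip)


dnat, snat = nat_maps(RULES)
print(dnat["10.0.5.90"])     # 10.100.0.16
print(unpaired(dnat, snat))  # []
```

For the rules pasted here the pairs are symmetric, so a check like this
only rules out one class of misconfiguration; it says nothing about which
namespace or host the rules end up being applied in.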

Because the LBaaSv1 worked just fine before I distributed the router
(and the vif and snat name spaces were created), and from what I can
see all interfaces, routes and iptables rules seem just fine, I can
only deduce that there's something wrong with some of this, and I'm
guessing it's with the iptables rules somehow.

But because I don't know how they (the vif and snat name spaces) are
supposed to work, I'm unsure how to proceed from here.

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: distributed name router snat spaces vif

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1629539

Title:
  Broken distributed virtual router

Status in neutron:
  New

Bug description:
  I wish I could come up with a smarter, more descriptive title for
  this, but if someone can after reading my report, feel free to update
  it.

  I installed my second controller the other day (because of resource
  constraints, I run ALL my OpenStack control services - APIs, Engines,
  Servers etc. - _everything_ but 'nova-compute' and 'nova-console' -
  on one physical host), and then one of my LBaaSv1 load balancers (I
  haven't gotten around to enabling v2 again; last time I hit some
  issues, which were reported elsewhere in the tracker) stopped working.

  After almost a day of trying to figure out why only one broke and how
  to fix it, I realized it must be the _router_, not the load balancer,
  that's at fault (see below).

  Broken LBaaSv1 VIP:          10.100.0.16/24
  Broken LBaaSv1 Floating IP:  10.0.5.90/24
  Working LBaaSv1 Floating IP: 10.0.4.190/24
  Router VIF namespace:        10.0.5.100                (not sure exactly what this is, but for some reason it has 'stolen' the incoming "GW functionality" on the router from the .253 interfaces)
  Router qrouter namespace:    10.0.4.253 + 10.0.5.253   (these are on the router's 'External Gateway' and are supposed to be the router's GW)
  Primary GW/FW/NAT:           eth1:192.168.69.1/24, eth2:10.0.4.254/24, eth2:10.0.5.254/24

  => ==========================================
  => From a physical host outside the OS network(s) (i.e. from the 192.168.69.0/24 network):

  traceroute to 10.100.0.16 (10.100.0.16), 30 hops max, 60 byte packets   <= CORRECT
   1  192.168.69.1  0.088 ms  0.077 ms  0.064 ms
   2  10.0.4.253  0.262 ms  0.246 ms  0.258 ms
   3  10.100.0.16  2.365 ms  2.348 ms  2.310 ms

  traceroute to 10.0.5.90 (10.0.5.90), 30 hops max, 60 byte packets       <= WRONG, LBaaSv1 doesn't work
   1  192.168.69.1  0.156 ms  0.138 ms  0.123 ms
   2  10.0.5.100  0.834 ms  0.863 ms  0.851 ms
   3  * * *
   4  10.0.5.90  1.487 ms  1.564 ms  1.561 ms

  traceroute to 10.0.4.190 (10.0.4.190), 30 hops max, 60 byte packets     <= WRONG, but LBaaSv1 works
   1  192.168.69.1  0.130 ms  0.112 ms  0.097 ms
   2  10.0.5.100  1.595 ms  1.581 ms  1.568 ms
   3  * * *
   4  10.0.4.190  2.265 ms  2.262 ms  2.251 ms

  => ==========================================
  => From an instance (inside the 10.100.0.0/24 subnet - all ICMP open)

  traceroute to 10.100.0.16 (10.100.0.16), 30 hops max, 60 byte packets
   1  * * *
   2  * * *
   3  *^C

  PING 10.100.0.16 (10.100.0.16) 56(84) bytes of data.
  64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.32 ms
  64 bytes from 10.100.0.16: icmp_seq=2 ttl=64 time=0.548 ms
  64 bytes from 10.100.0.16: icmp_seq=3 ttl=64 time=0.589 ms
  ^C

  PING 10.0.5.90 (10.0.5.90) 56(84) bytes of data.
  64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.02 ms
  64 bytes from 10.0.5.90: icmp_seq=1 ttl=60 time=1.68 ms (DUP!)
  ^C

  PING 10.0.4.190 (10.0.4.190) 56(84) bytes of data.
  64 bytes from 10.100.0.4: icmp_seq=1 ttl=64 time=0.925 ms
  64 bytes from 10.0.4.190: icmp_seq=1 ttl=60 time=467 ms (DUP!)
  ^C

  => ==========================================
  => The 'actual' problem

  => From a host on the 192.168.69.0/24 network
  $ curl --insecure https://10.100.0.16:8140/
  curl: (35) Unknown SSL protocol error in connection to 10.100.0.16:8140 <= FAIL, never reaches backend server
  $ curl --insecure https://10.0.5.90:8140/
  The environment must be purely alphanumeric, not ''                     <= Actually working

  => From an instance
  $ curl --insecure https://10.100.0.16:8140/
  The environment must be purely alphanumeric, not ''                     <= Actually working
  $ curl --insecure https://10.0.5.90:8140/
  curl: (35) Unknown SSL protocol error in connection to 10.0.5.90:8140   <= FAIL, never reaches backend server

  Testing 10.0.4.190 with curl won't work, because that service is
  "ldaps" on port 636. An ldapsearch against it works from
  192.168.69.0/24, but not from an instance. So that one is broken as
  well, even though I labeled it 'working' above :(. It is just "broken"
  in a different way.
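
  Since curl cannot talk to the ldaps service, a protocol-independent
  check may be easier to compare across both VIPs. The sketch below only
  separates "TCP connect failed" from "TLS handshake failed" from
  "handshake OK"; it says nothing about what the backend does afterwards.
  The function name and return values are hypothetical, and certificate
  verification is switched off to mirror `curl --insecure`:

```python
import socket
import ssl


def tls_probe(host, port, timeout=5.0):
    """Return 'tcp-failed', 'tls-failed' or 'tls-ok' for host:port."""
    # No certificate checks, comparable to `curl --insecure`.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable or timed out before any TLS was attempted.
        return "tcp-failed"
    try:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.version()  # the handshake has completed at this point
        return "tls-ok"
    except OSError:
        # ssl.SSLError and socket timeouts are both OSError subclasses.
        return "tls-failed"
    finally:
        raw.close()
```

  Running tls_probe("10.0.5.90", 8140) and tls_probe("10.0.4.190", 636)
  from both an outside host and an instance would give the same four
  data points as the curl and ldapsearch tests above.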

  => ==========================================
  => Relevant namespaces on the controllers:

  =>
  => Primary Controller
  =>

  => ip netns | sort
  fip-cd30c1bb-3db6-488c-b448-6cb4454783be
  qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7

  => fip-cd30c1bb-3db6-488c-b448-6cb4454783be
  66: fg-38e452be-d4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
      inet 10.0.5.100/24 brd 10.0.5.255 scope global fg-38e452be-d4

  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  0.0.0.0         10.0.5.254      0.0.0.0         UG    0      0        0 fg-38e452be-d4
  10.0.4.189      169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.4.190      169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.4.195      169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.5.0        0.0.0.0         255.255.255.0   U     0      0        0 fg-38e452be-d4
  10.0.5.90       169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.5.92       169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.5.99       169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  169.254.106.114 0.0.0.0         255.255.255.254 U     0      0        0 fpr-4b3639a1-8

  => qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7
  2: rfp-4b3639a1-8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
      inet 10.0.5.90/32 brd 10.0.5.90 scope global rfp-4b3639a1-8
      inet 10.0.4.190/32 brd 10.0.4.190 scope global rfp-4b3639a1-8
  71: qr-a2293a4c-51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc noqueue state UNKNOWN group default
      inet 10.100.0.1/24 brd 10.100.0.255 scope global qr-a2293a4c-51

  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 qr-a2293a4c-51
  169.254.106.114 0.0.0.0         255.255.255.254 U     0      0        0 rfp-4b3639a1-8

  =>
  => Secondary Controller
  =>

  => ip netns
  snat-4b3639a1-880f-4b55-989f-c6f654e562a7
  qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7

  => snat-4b3639a1-880f-4b55-989f-c6f654e562a7
  62: qg-1d52c5b9-4b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      inet 10.0.4.253/24 brd 10.0.4.255 scope global qg-1d52c5b9-4b
      inet 10.0.5.253/24 brd 10.0.5.255 scope global qg-1d52c5b9-4b

  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  0.0.0.0         10.0.4.254      0.0.0.0         UG    0      0        0 qg-1d52c5b9-4b
  10.0.4.0        0.0.0.0         255.255.255.0   U     0      0        0 qg-1d52c5b9-4b
  10.0.5.0        0.0.0.0         255.255.255.0   U     0      0        0 qg-1d52c5b9-4b
  10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 sg-ed603ce2-fe

  => qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7
  51: qr-a2293a4c-51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc noqueue state UNKNOWN group default qlen 1000
      inet 10.100.0.1/24 brd 10.100.0.255 scope global qr-a2293a4c-51

  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 qr-a2293a4c-51
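
  To read the fip namespace table on the primary controller, it can help
  to replay the kernel's longest-prefix match by hand. This is only a
  sketch: the route list is a subset copied from the table above, and
  the lookup function is my own simplification (it ignores metrics and
  policy routing):

```python
import ipaddress

# (destination, genmask, gateway, interface), copied from the fip namespace.
FIP_ROUTES = [
    ("0.0.0.0", "0.0.0.0", "10.0.5.254", "fg-38e452be-d4"),
    ("10.0.4.190", "255.255.255.255", "169.254.106.114", "fpr-4b3639a1-8"),
    ("10.0.5.0", "255.255.255.0", "0.0.0.0", "fg-38e452be-d4"),
    ("10.0.5.90", "255.255.255.255", "169.254.106.114", "fpr-4b3639a1-8"),
    ("169.254.106.114", "255.255.255.254", "0.0.0.0", "fpr-4b3639a1-8"),
]


def lookup(routes, address):
    """Return (network, gateway, interface) of the longest matching prefix."""
    addr = ipaddress.ip_address(address)
    best = None
    for dest, mask, gateway, iface in routes:
        net = ipaddress.ip_network(f"{dest}/{mask}", strict=False)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    return best


# Floating IPs, the snat gateway address, and an outside client.
for target in ("10.0.5.90", "10.0.4.190", "10.0.5.253", "192.168.69.10"):
    net, gateway, iface = lookup(FIP_ROUTES, target)
    print(f"{target:15} -> {net} via {gateway} dev {iface}")
```

  With these routes, both floating IPs are handed to the qrouter
  namespace over fpr-4b3639a1-8, and anything outside 10.0.5.0/24 leaves
  through 10.0.5.254 on fg-38e452be-d4, including traffic for floating
  IPs in 10.0.4.0/24.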

  => ==========================================
  => The iptables rules in the namespaces

  =>
  => Primary Controller
  =>

  => fip-cd30c1bb-3db6-488c-b448-6cb4454783be
  neutron-fwaas-l3-INPUT        all  --  0.0.0.0/0            0.0.0.0/0
  neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-FORWARD      all  --  0.0.0.0/0            0.0.0.0/0
  neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-local        all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-scope        all  --  0.0.0.0/0            0.0.0.0/0

  neutron-fwaas-l3-PREROUTING   all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0
  neutron-postrouting-bottom    all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-float-snat   all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-snat         all  --  0.0.0.0/0            0.0.0.0/0            /* Perform source NAT on outgoing traffic. */

  => qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7
  neutron-fwaas-l3-INPUT        all  --  0.0.0.0/0            0.0.0.0/0
  neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-FORWARD      all  --  0.0.0.0/0            0.0.0.0/0
  neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-local        all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-scope        all  --  0.0.0.0/0            0.0.0.0/0
  ACCEPT                        all  --  0.0.0.0/0            0.0.0.0/0            mark match 0x1/0xffff
  DROP                          tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:9697
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000

  =>
  => Secondary Controller
  =>

  neutron-fwaas-l3-PREROUTING   all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0
  neutron-postrouting-bottom    all  --  0.0.0.0/0            0.0.0.0/0
  DNAT                          all  --  0.0.0.0/0            10.0.5.92            to:10.100.0.25
  DNAT                          all  --  0.0.0.0/0            10.0.5.90            to:10.100.0.16
  DNAT                          all  --  0.0.0.0/0            10.0.5.99            to:10.104.0.44
  DNAT                          all  --  0.0.0.0/0            10.0.4.189           to:10.100.0.3
  DNAT                          all  --  0.0.0.0/0            10.0.4.195           to:10.104.0.27
  DNAT                          all  --  0.0.0.0/0            10.0.4.190           to:10.100.0.4
  DNAT                          all  --  0.0.0.0/0            10.0.5.92            to:10.100.0.25
  DNAT                          all  --  0.0.0.0/0            10.0.5.90            to:10.100.0.16
  DNAT                          all  --  0.0.0.0/0            10.0.5.99            to:10.104.0.44
  DNAT                          all  --  0.0.0.0/0            10.0.4.189           to:10.100.0.3
  DNAT                          all  --  0.0.0.0/0            10.0.4.195           to:10.104.0.27
  DNAT                          all  --  0.0.0.0/0            10.0.4.190           to:10.100.0.4
  REDIRECT                      tcp  --  0.0.0.0/0            169.254.169.254      tcp dpt:80 redir ports 9697
  SNAT                          all  --  10.100.0.25          0.0.0.0/0            to:10.0.5.92
  SNAT                          all  --  10.100.0.16          0.0.0.0/0            to:10.0.5.90
  SNAT                          all  --  10.104.0.44          0.0.0.0/0            to:10.0.5.99
  SNAT                          all  --  10.100.0.3           0.0.0.0/0            to:10.0.4.189
  SNAT                          all  --  10.104.0.27          0.0.0.0/0            to:10.0.4.195
  SNAT                          all  --  10.100.0.4           0.0.0.0/0            to:10.0.4.190
  neutron-fwaas-l3-float-snat   all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-snat         all  --  0.0.0.0/0            0.0.0.0/0            /* Perform source NAT on outgoing traffic. */
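
  One thing that can be checked mechanically in the NAT listing above is
  whether every floating IP has a matching pair: a DNAT to the fixed
  address and an SNAT from that fixed address back to the same floating
  IP. A rough sketch, assuming the column layout printed above; the
  regex and function names are mine, not anything Neutron provides:

```python
import re

# target, proto "all", opt "--", source, destination, "to:<address>"
RULE_RE = re.compile(r"^(DNAT|SNAT)\s+all\s+--\s+(\S+)\s+(\S+)\s+to:(\S+)$")


def nat_pairs(listing):
    """Collect floating->fixed (DNAT) and fixed->floating (SNAT) mappings."""
    dnat, snat = {}, {}
    for line in listing.splitlines():
        match = RULE_RE.match(line.strip())
        if not match:
            continue
        target, source, dest, to = match.groups()
        if target == "DNAT":
            dnat[dest] = to      # floating IP -> fixed IP
        else:
            snat[source] = to    # fixed IP -> floating IP
    return dnat, snat


def asymmetric(dnat, snat):
    """Return floating IPs whose DNAT target is not SNATed back to them."""
    return sorted(fip for fip, fixed in dnat.items() if snat.get(fixed) != fip)


# Four of the rules from the secondary controller listing above.
sample = """\
DNAT                          all  --  0.0.0.0/0            10.0.5.90            to:10.100.0.16
DNAT                          all  --  0.0.0.0/0            10.0.4.190           to:10.100.0.4
SNAT                          all  --  10.100.0.16          0.0.0.0/0            to:10.0.5.90
SNAT                          all  --  10.100.0.4           0.0.0.0/0            to:10.0.4.190
"""

dnat, snat = nat_pairs(sample)
print("asymmetric floating IPs:", asymmetric(dnat, snat))
```

  An empty list only means the DNAT and SNAT rules mirror each other; it
  does not show which namespace or chain a packet actually traverses.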

  Because LBaaSv1 worked just fine before I distributed the router
  (which is when the vif and snat namespaces were created), and because
  all interfaces, routes and iptables rules look fine as far as I can
  see, I can only deduce that something is wrong in the new setup. My
  guess is that it's the iptables rules somehow.

  But because I don't know how they (the vif and snat namespaces) are
  supposed to work, I'm unsure how to proceed from here.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1629539/+subscriptions


Follow ups