
yahoo-eng-team team mailing list archive

[Bug 1629539] Re: Broken distributed virtual router w/ lbaas v1

 

[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1629539

Title:
  Broken distributed virtual router w/ lbaas v1

Status in neutron:
  Expired

Bug description:
  I could not come up with a smarter, more descriptive title for this;
  if someone can after reading my report, feel free to update it.

  I installed my second controller the other day (because of resource
  constraints, I run ALL my OpenStack control services - APIs, engines,
  servers, etc. - _everything_ but 'nova-compute' and 'nova-console'
  - on one physical host), and then one of my LBaaSv1 load balancers
  stopped working. (I haven't gotten around to trying v2 again; last
  time I hit some issues, which were reported elsewhere in the
  tracker.)

  After almost a day of trying to figure out why only one of them broke
  and how to fix it, I realized it must be the _router_, not the load
  balancer, that's at fault (see below).

  Broken LBaaSv1 VIP:          10.100.0.16/24
  Broken LBaaSv1 Floating IP:  10.0.5.90/24
  Working LBaaSv1 Floating IP: 10.0.4.190/24
  Router fip namespace:        10.0.5.100                (not sure exactly what this is, but for some reason it has 'stolen' the "GW functionality" (incoming) on the router from the .253 interfaces)
  Router qrouter namespace:    10.0.4.253 + 10.0.5.253   (these are on the 'External Gateway' of the router and are supposed to be the router's GW)
  Primary GW/FW/NAT:           eth1:192.168.69.1/24, eth2:10.0.4.254/24, eth2:10.0.5.254/24

  => ==========================================
  => From a physical host outside the OS network(s) (i.e. from the 192.168.69.0/24 network):

  traceroute to 10.100.0.16 (10.100.0.16), 30 hops max, 60 byte packets   <= CORRECT
   1  192.168.69.1  0.088 ms  0.077 ms  0.064 ms
   2  10.0.4.253  0.262 ms  0.246 ms  0.258 ms
   3  10.100.0.16  2.365 ms  2.348 ms  2.310 ms

  traceroute to 10.0.5.90 (10.0.5.90), 30 hops max, 60 byte packets       <= WRONG, LBaaSv1 doesn't work
   1  192.168.69.1  0.156 ms  0.138 ms  0.123 ms
   2  10.0.5.100  0.834 ms  0.863 ms  0.851 ms
   3  * * *
   4  10.0.5.90  1.487 ms  1.564 ms  1.561 ms

  traceroute to 10.0.4.190 (10.0.4.190), 30 hops max, 60 byte packets     <= WRONG, but LBaaSv1 works
   1  192.168.69.1  0.130 ms  0.112 ms  0.097 ms
   2  10.0.5.100  1.595 ms  1.581 ms  1.568 ms
   3  * * *
   4  10.0.4.190  2.265 ms  2.262 ms  2.251 ms
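
The traceroutes above differ in their second hop: the internal VIP is reached via 10.0.4.253, while both floating IPs go via 10.0.5.100. A minimal Python sketch for checking that mechanically is shown here. The helper name `hops` and the `expected_gateways` set are illustrative assumptions, not part of any OpenStack tooling; the sample text is copied from the report.

```python
import re

# A hop line starts with the hop number, followed by an address or "*".
HOP = re.compile(r"^\s*(\d+)\s+(\S+)")

def hops(traceroute_output):
    """Return the responding address per hop, or None for '* * *' hops."""
    result = []
    for line in traceroute_output.splitlines():
        m = HOP.match(line)
        if not m:
            continue  # the header line starts with "traceroute", not a number
        addr = m.group(2)
        result.append(None if addr == "*" else addr)
    return result

sample = """traceroute to 10.0.5.90 (10.0.5.90), 30 hops max, 60 byte packets
 1  192.168.69.1  0.156 ms  0.138 ms  0.123 ms
 2  10.0.5.100  0.834 ms  0.863 ms  0.851 ms
 3  * * *
 4  10.0.5.90  1.487 ms  1.564 ms  1.561 ms
"""

# Assumption taken from the address table: the router's external
# gateway addresses are the two .253 interfaces.
expected_gateways = {"10.0.4.253", "10.0.5.253"}

path = hops(sample)
second_hop_ok = path[1] in expected_gateways
print(path, second_hop_ok)
```

For the sample, the second hop is 10.0.5.100, so `second_hop_ok` is `False`, matching the "WRONG" annotation in the report.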

  => ==========================================
  => From an instance (inside the 10.100.0.0/24 subnet - all ICMP open)

  traceroute to 10.100.0.16 (10.100.0.16), 30 hops max, 60 byte packets
   1  * * *
   2  * * *
   3  *^C

  PING 10.100.0.16 (10.100.0.16) 56(84) bytes of data.
  64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.32 ms
  64 bytes from 10.100.0.16: icmp_seq=2 ttl=64 time=0.548 ms
  64 bytes from 10.100.0.16: icmp_seq=3 ttl=64 time=0.589 ms
  ^C

  PING 10.0.5.90 (10.0.5.90) 56(84) bytes of data.
  64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.02 ms
  64 bytes from 10.0.5.90: icmp_seq=1 ttl=60 time=1.68 ms (DUP!)
  ^C

  PING 10.0.4.190 (10.0.4.190) 56(84) bytes of data.
  64 bytes from 10.100.0.4: icmp_seq=1 ttl=64 time=0.925 ms
  64 bytes from 10.0.4.190: icmp_seq=1 ttl=60 time=467 ms (DUP!)
  ^C
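
In the ping output above, the first reply to each floating IP comes from the fixed address behind it (ttl=64), followed by a duplicate from the floating IP itself (ttl=60). A small Python sketch for flagging such replies follows; the function name `unexpected_repliers` is an assumption made for illustration, and the sample is copied from the report.

```python
import re

# Matches the reply lines printed by ping, e.g.
# "64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.02 ms"
REPLY = re.compile(r"bytes from (\d+\.\d+\.\d+\.\d+): icmp_seq=(\d+) ttl=(\d+)")

def unexpected_repliers(target, ping_output):
    """Return (source, icmp_seq, ttl) for replies not sent from `target`."""
    found = []
    for line in ping_output.splitlines():
        m = REPLY.search(line)
        if m and m.group(1) != target:
            found.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return found

sample = """PING 10.0.5.90 (10.0.5.90) 56(84) bytes of data.
64 bytes from 10.100.0.16: icmp_seq=1 ttl=64 time=1.02 ms
64 bytes from 10.0.5.90: icmp_seq=1 ttl=60 time=1.68 ms (DUP!)
"""

print(unexpected_repliers("10.0.5.90", sample))
# [('10.100.0.16', 1, 64)]
```

A reply from an address other than the one pinged suggests the return traffic is not being translated back to the floating IP on at least one path; the sketch only reports the symptom and says nothing about the cause.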

  => ==========================================
  => The 'actual' problem

  => From a host on the 192.168.69.0/24 network
  $ curl --insecure https://10.100.0.16:8140/
  curl: (35) Unknown SSL protocol error in connection to 10.100.0.16:8140 <= FAIL, never reaches backend server
  $ curl --insecure https://10.0.5.90:8140/
  The environment must be purely alphanumeric, not ''                     <= Actually working

  => From an instance
  $ curl --insecure https://10.100.0.16:8140/
  The environment must be purely alphanumeric, not ''                     <= Actually working
  $ curl --insecure https://10.0.5.90:8140/
  curl: (35) Unknown SSL protocol error in connection to 10.0.5.90:8140   <= FAIL, never reaches backend server

  Testing a connection to 10.0.4.190 with curl won't work, since it's
  "ldaps" on port 636. An ldapsearch against it works from
  192.168.69.0/24 but not from an instance. So that one is broken as
  well, even though I labeled it 'working' above :(. It is just
  "broken" in a different way.

  => ==========================================
  => Relevant namespaces on the controllers:

  =>
  => Primary Controller
  =>

  => ip netns | sort
  fip-cd30c1bb-3db6-488c-b448-6cb4454783be
  qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7

  => fip-cd30c1bb-3db6-488c-b448-6cb4454783be
  66: fg-38e452be-d4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
      inet 10.0.5.100/24 brd 10.0.5.255 scope global fg-38e452be-d4

  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  0.0.0.0         10.0.5.254      0.0.0.0         UG    0      0        0 fg-38e452be-d4
  10.0.4.189      169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.4.190      169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.4.195      169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.5.0        0.0.0.0         255.255.255.0   U     0      0        0 fg-38e452be-d4
  10.0.5.90       169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.5.92       169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  10.0.5.99       169.254.106.114 255.255.255.255 UGH   0      0        0 fpr-4b3639a1-8
  169.254.106.114 0.0.0.0         255.255.255.254 U     0      0        0 fpr-4b3639a1-8

  => qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7
  2: rfp-4b3639a1-8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
      inet 10.0.5.90/32 brd 10.0.5.90 scope global rfp-4b3639a1-8
      inet 10.0.4.190/32 brd 10.0.4.190 scope global rfp-4b3639a1-8
  71: qr-a2293a4c-51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc noqueue state UNKNOWN group default
      inet 10.100.0.1/24 brd 10.100.0.255 scope global qr-a2293a4c-51

  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 qr-a2293a4c-51
  169.254.106.114 0.0.0.0         255.255.255.254 U     0      0        0 rfp-4b3639a1-8

  =>
  => Secondary Controller
  =>

  => ip netns
  snat-4b3639a1-880f-4b55-989f-c6f654e562a7
  qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7

  => snat-4b3639a1-880f-4b55-989f-c6f654e562a7
  62: qg-1d52c5b9-4b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      inet 10.0.4.253/24 brd 10.0.4.255 scope global qg-1d52c5b9-4b
      inet 10.0.5.253/24 brd 10.0.5.255 scope global qg-1d52c5b9-4b

  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  0.0.0.0         10.0.4.254      0.0.0.0         UG    0      0        0 qg-1d52c5b9-4b
  10.0.4.0        0.0.0.0         255.255.255.0   U     0      0        0 qg-1d52c5b9-4b
  10.0.5.0        0.0.0.0         255.255.255.0   U     0      0        0 qg-1d52c5b9-4b
  10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 sg-ed603ce2-fe

  => qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7
  51: qr-a2293a4c-51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc noqueue state UNKNOWN group default qlen 1000
      inet 10.100.0.1/24 brd 10.100.0.255 scope global qr-a2293a4c-51

  Kernel IP routing table
  Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
  10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 qr-a2293a4c-51

  => ==========================================
  => The iptables rules in the namespaces

  =>
  => Primary Controller
  =>

  => fip-cd30c1bb-3db6-488c-b448-6cb4454783be
  neutron-fwaas-l3-INPUT        all  --  0.0.0.0/0            0.0.0.0/0
  neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-FORWARD      all  --  0.0.0.0/0            0.0.0.0/0
  neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-local        all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-scope        all  --  0.0.0.0/0            0.0.0.0/0

  neutron-fwaas-l3-PREROUTING   all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0
  neutron-postrouting-bottom    all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-float-snat   all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-snat         all  --  0.0.0.0/0            0.0.0.0/0            /* Perform source NAT on outgoing traffic. */

  => qrouter-4b3639a1-880f-4b55-989f-c6f654e562a7
  neutron-fwaas-l3-INPUT        all  --  0.0.0.0/0            0.0.0.0/0
  neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-FORWARD      all  --  0.0.0.0/0            0.0.0.0/0
  neutron-filter-top            all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-local        all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-scope        all  --  0.0.0.0/0            0.0.0.0/0
  ACCEPT                        all  --  0.0.0.0/0            0.0.0.0/0            mark match 0x1/0xffff
  DROP                          tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:9697
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000
  DROP                          all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000000/0xffff0000

  =>
  => Secondary Controller
  =>

  neutron-fwaas-l3-PREROUTING   all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-OUTPUT       all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0
  neutron-postrouting-bottom    all  --  0.0.0.0/0            0.0.0.0/0
  DNAT                          all  --  0.0.0.0/0            10.0.5.92            to:10.100.0.25
  DNAT                          all  --  0.0.0.0/0            10.0.5.90            to:10.100.0.16
  DNAT                          all  --  0.0.0.0/0            10.0.5.99            to:10.104.0.44
  DNAT                          all  --  0.0.0.0/0            10.0.4.189           to:10.100.0.3
  DNAT                          all  --  0.0.0.0/0            10.0.4.195           to:10.104.0.27
  DNAT                          all  --  0.0.0.0/0            10.0.4.190           to:10.100.0.4
  DNAT                          all  --  0.0.0.0/0            10.0.5.92            to:10.100.0.25
  DNAT                          all  --  0.0.0.0/0            10.0.5.90            to:10.100.0.16
  DNAT                          all  --  0.0.0.0/0            10.0.5.99            to:10.104.0.44
  DNAT                          all  --  0.0.0.0/0            10.0.4.189           to:10.100.0.3
  DNAT                          all  --  0.0.0.0/0            10.0.4.195           to:10.104.0.27
  DNAT                          all  --  0.0.0.0/0            10.0.4.190           to:10.100.0.4
  REDIRECT                      tcp  --  0.0.0.0/0            169.254.169.254      tcp dpt:80 redir ports 9697
  SNAT                          all  --  10.100.0.25          0.0.0.0/0            to:10.0.5.92
  SNAT                          all  --  10.100.0.16          0.0.0.0/0            to:10.0.5.90
  SNAT                          all  --  10.104.0.44          0.0.0.0/0            to:10.0.5.99
  SNAT                          all  --  10.100.0.3           0.0.0.0/0            to:10.0.4.189
  SNAT                          all  --  10.104.0.27          0.0.0.0/0            to:10.0.4.195
  SNAT                          all  --  10.100.0.4           0.0.0.0/0            to:10.0.4.190
  neutron-fwaas-l3-float-snat   all  --  0.0.0.0/0            0.0.0.0/0
  neutron-fwaas-l3-snat         all  --  0.0.0.0/0            0.0.0.0/0            /* Perform source NAT on outgoing traffic. */
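
The NAT listing above pairs each floating IP with a fixed IP twice: a DNAT rule (floating to fixed) and an SNAT rule (fixed to floating). A rough Python sketch for checking that the two directions agree is given here. It assumes the flattened `iptables -L -n` style shown in the report (target, protocol, `--`, source, destination, `to:` address); the function name `audit_nat` and the result keys are made up for this example. Since the chain headers were stripped from the listing, a repeated DNAT rule may simply be the same rule present in two chains rather than a fault.

```python
import re
from collections import Counter

# Columns: target, protocol, "--", source, destination, "to:<address>"
DNAT = re.compile(r"^\s*DNAT\s+all\s+--\s+\S+\s+(\S+)\s+to:(\S+)")
SNAT = re.compile(r"^\s*SNAT\s+all\s+--\s+(\S+)\s+\S+\s+to:(\S+)")

def audit_nat(listing):
    """Compare DNAT and SNAT rules as (floating_ip, fixed_ip) pairs."""
    dnat = Counter()
    snat = Counter()
    for line in listing.splitlines():
        m = DNAT.match(line)
        if m:
            dnat[(m.group(1), m.group(2))] += 1  # destination -> to:
            continue
        m = SNAT.match(line)
        if m:
            snat[(m.group(2), m.group(1))] += 1  # to: <- source
    return {
        "repeated_dnat": sorted(p for p, n in dnat.items() if n > 1),
        "dnat_without_snat": sorted(set(dnat) - set(snat)),
        "snat_without_dnat": sorted(set(snat) - set(dnat)),
    }

sample = """
DNAT                          all  --  0.0.0.0/0            10.0.5.90            to:10.100.0.16
DNAT                          all  --  0.0.0.0/0            10.0.4.190           to:10.100.0.4
DNAT                          all  --  0.0.0.0/0            10.0.5.90            to:10.100.0.16
DNAT                          all  --  0.0.0.0/0            10.0.4.190           to:10.100.0.4
SNAT                          all  --  10.100.0.16          0.0.0.0/0            to:10.0.5.90
SNAT                          all  --  10.100.0.4           0.0.0.0/0            to:10.0.4.190
"""

report = audit_nat(sample)
print(report)
```

On this excerpt both directions agree for 10.0.5.90 and 10.0.4.190, and each DNAT rule is listed twice. That is consistent with the observation that the rules themselves look fine, so the sketch narrows the search rather than pointing at a culprit.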

  LBaaSv1 worked just fine before I distributed the router (which is
  when the fip and snat namespaces were created). From what I can see,
  all interfaces, routes and iptables rules seem fine, so I can only
  deduce that something in this setup is wrong after all, and I'm
  guessing it's the iptables rules somehow.

  But because I don't know how they (the fip and snat namespaces) are
  supposed to work, I'm unsure how to proceed from here.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1629539/+subscriptions

