[Bug 1796491] Re: DVR Floating IP setup in the SNAT namespace of the network node and also in the qrouter namespace in the compute node

 

Reviewed:  https://review.openstack.org/609924
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cd0cc47a6ab0f7968a8c24e9d477909c45e4ae87
Submitter: Zuul
Branch:    master

commit cd0cc47a6ab0f7968a8c24e9d477909c45e4ae87
Author: Swaminathan Vasudevan <SVasudevan@xxxxxxxx>
Date:   Thu Oct 11 23:25:44 2018 -0700

    DVR: Centralized FloatingIPs are not cleared after migration.
    
    With DVR routers, if a port is associated with a FloatingIP,
    before it is used by a VM, the FloatingIP will be initially
    started at the Network Node SNAT Namespace, since the port
    is not bound to any host.
    
    Then when the port is attached to a VM, the port gets its
    host binding, and then the FloatingIP setup should be migrated
    to the Compute host and the original FloatingIP in the Network
    Node SNAT Namespace should be cleared.
    
    But the original FloatingIP setup in SNAT Namespace was not
    cleared by the agent.
    
    This patch addresses the issue.
    
    Change-Id: I55a16bcc0020087aa1abe76f5bc85cd64ccdaecd
    Closes-Bug: #1796491


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1796491

Title:
  DVR Floating IP setup in the SNAT namespace of the network node and
  also in the qrouter namespace in the compute node

Status in neutron:
  Fix Released

Bug description:
  * We have the following setup:

  - DVR, with HA disabled
  - 3 controller nodes (network nodes)
  - multiple computes
  - public floating IP addresses

  * What we are trying to accomplish:
  - Attach floating IPs to instances (example commands below)
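
  For reference, we attach the FIPs roughly like this (the network and
  server UUIDs are the ones from the example later in this report):

  openstack floating ip create 2f310092-1c75-4cb6-9758-edb13fb96d60
  openstack server add floating ip 6d789f9f-4fc0-4725-9a26-a35f90ab1d2c X.Y.Z.169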

  * Versions:
  - neutron Pike (11.0.3)
  - the same behavior is also seen on neutron Queens (12.0.3)
  - OpenStack environment deployed via kolla-ansible

  
  * The problem is:
  When we create an instance on a VXLAN tenant network and attach a
  floating IP, the floating IP is set up correctly in the qrouter
  namespace as an iptables DNAT rule on the compute node where the
  instance is running. However, sometimes the floating IP is also set up
  in the SNAT namespace of the network node, where the centralized SNAT
  lives.

  This causes access to the instance (ping/ssh) to fail, because request
  traffic goes through the SNAT namespace on the network node instead of
  through the qrouter namespace on the compute node.

  It looks like a race condition: a few times, even though the FIP was
  set up in both the SNAT namespace and the qrouter namespace, we could
  still log in to the instance and traffic went only through the
  fip/qrouter namespaces on the compute node.
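
  When we need to confirm which path the traffic actually takes, we
  usually capture in both namespaces, roughly as follows (router IDs and
  interface names are the ones from the example below):

  # on the network node: should stay silent if the FIP is handled on the compute
  ip netns exec snat-ee69bc58-1347-45de-abf6-4667d974fc9d tcpdump -ni qg-50f7f260-49 host X.Y.Z.169
  # on the compute node: this is where the FIP traffic should show up
  ip netns exec fip-2f310092-1c75-4cb6-9758-edb13fb96d60 tcpdump -ni fg-331574bc-69 host X.Y.Z.169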

  * What we expect:
  FIPs should not be set up in the SNAT namespace on the network node,
  so that we can reach our instances via ssh or ping.

  * This is an example:
  Floating IP is: X.Y.Z.169

  1. Instance is active with a FIP
  root@service001:/opt/kolla-configs# openstack server list --all | grep X.Y.Z.169
  | 6d789f9f-4fc0-4725-9a26-a35f90ab1d2c | APP_server   | ACTIVE            | asdadasd=10.10.10.7, X.Y.Z.169

  2. We use a distributed router without HA
  root@service001:/opt/kolla-configs# openstack router list  --project 6898232aaee84941ab0de4f259771840
  +--------------------------------------+----------------+--------+-------+-------------+-------+----------------------------------+
  | ID                                   | Name           | Status | State | Distributed | HA    | Project                          |
  +--------------------------------------+----------------+--------+-------+-------------+-------+----------------------------------+
  | ee69bc58-1347-45de-abf6-4667d974fc9d | asdadasdad     | ACTIVE | UP    | True        | False | 6898232aaee84941ab0de4f259771840 |

  3. In the qrouter namespace on the compute node we can see that the DNAT/SNAT rules were set up correctly.
  root@compute006:~# ip netns exec qrouter-ee69bc58-1347-45de-abf6-4667d974fc9d iptables -L -t nat -n -v | grep X.Y.Z.169
      0     0 DNAT       all  --  rfp-ee69bc58-1 *       0.0.0.0/0            X.Y.Z.169       to:10.10.10.7
      0     0 SNAT       all  --  *      *       10.10.10.7           0.0.0.0/0            to:X.Y.Z.169

  4. However, in the centralized SNAT namespace on the network node, the FIP is also configured as an address on the interface qg-50f7f260-49:
  root@controller001:~# ip netns exec snat-ee69bc58-1347-45de-abf6-4667d974fc9d ip a
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
         valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host
         valid_lft forever preferred_lft forever
  2658: sg-c2a20a44-1b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1
      link/ether fa:16:3e:ec:47:4b brd ff:ff:ff:ff:ff:ff
      inet 10.10.10.14/24 brd 10.10.10.255 scope global sg-c2a20a44-1b
         valid_lft forever preferred_lft forever
      inet6 fe80::f816:3eff:feec:474b/64 scope link
         valid_lft forever preferred_lft forever
  2663: qg-50f7f260-49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1
      link/ether fa:16:3e:e6:a3:7b brd ff:ff:ff:ff:ff:ff
      inet X.Y.Z.27/24 brd X.Y.Z.255 scope global qg-50f7f260-49
         valid_lft forever preferred_lft forever
      inet X.Y.Z.32/32 brd X.Y.Z.32 scope global qg-50f7f260-49
         valid_lft forever preferred_lft forever
      inet X.Y.Z.169/32 brd X.Y.Z.169 scope global qg-50f7f260-49
         valid_lft forever preferred_lft forever
      inet6 fe80::f816:3eff:fee6:a37b/64 scope link
         valid_lft forever preferred_lft forever
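
  As a temporary workaround in this situation (not a fix), the stale
  address can be removed from the SNAT namespace by hand; router ID,
  interface and address are the ones from this example. The L3 agent may
  re-add it on its next sync, so this only restores connectivity for a
  while:

  ip netns exec snat-ee69bc58-1347-45de-abf6-4667d974fc9d ip addr del X.Y.Z.169/32 dev qg-50f7f260-49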

  5. AFAIK, the FIP shouldn't be set up in the SNAT namespace of the
  network node, yet the logs below show it being set up in both places:

  neutron-server.log (network node)
  2018-10-06 01:43:38.295 42 DEBUG neutron.db.l3_hamode_db [req-ef937e08-7509-4623-9ac2-012734286462 - - - - -] neutron.services.l3_router.l3_router_plugin.L3RouterPlugin method _process_sync_ha_data called
   with arguments
  ...
  {'router_id': u'ee69bc58-1347-45de-abf6-4667d974fc9d', 'status': u'DOWN', 'description': u'', 'tags': [], 'updated_at': '2018-10-06T01:43:35Z', 'dns_domain': '', 'floating_network_id': u'2f310092-1c75-4cb6-9758-edb13fb96d60', 'host': u'compute006', 'fixed_ip_address': u'10.10.10.7', 'floating_ip_address': u'X.Y.Z.169', 'revision_number': 0, 'port_id': u'95c43db6-9da0-4baa-ab78-020f09ce864d', 'id': u'5882da22-aedc-4ffc-8ea7-36119d4841fd', 'dest_host': None, 'dns_name': '', 'created_at': '2018-10-06T01:43:35Z', 'tenant_id': u'6898232aaee84941ab0de4f259771840', 'fixed_ip_address_scope': None, 'project_id': u'6898232aaee84941ab0de4f259771840'}

  neutron-l3-agent.log (network node)
  2018-10-06 01:43:48.207 14 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-ee69bc58-1347-45de-abf6-4667d974fc9d', 'ip', '-4', 'addr', 'add', 'X.Y.Z.169/32', 'scope', 'global', 'dev', 'qg-50f7f260-49', 'brd', 'X.Y.Z.169'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92
  2018-10-06 01:43:50.555 14 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-ee69bc58-1347-45de-abf6-4667d974fc9d', 'arping', '-U', '-I', 'qg-50f7f260-49', '-c', '1', '-w', '1.5', 'X.Y.Z.169'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92

  neutron-l3-agent.log (compute node)
  2018-10-06 01:43:47.850 14 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-2f310092-1c75-4cb6-9758-edb13fb96d60', 'ip', '-4', 'route', 'replace', 'X.Y.Z.169/32', 'via', '169.254.118.42', 'dev', 'fpr-ee69bc58-1'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92
  2018-10-06 01:43:48.516 14 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'fip-2f310092-1c75-4cb6-9758-edb13fb96d60', 'arping', '-U', '-I', 'fg-331574bc-69', '-c', '1', '-w', '1.5', 'X.Y.Z.169'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:92
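
  To check whether other routers on a network node are affected in the
  same way, a quick loop over all SNAT namespaces can help (just a
  sketch; FIPs show up as /32 addresses on the qg- interface):

  for ns in $(ip netns | awk '/^snat-/ {print $1}'); do
      echo "== $ns =="
      ip netns exec "$ns" ip -4 addr show | grep '/32.*qg-'
  done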

  I've noticed this happens especially when we deploy instances and
  attach FIPs via Heat templates. Restarting the L3 agent sometimes
  reduces how often this happens, but eventually we run into the issue
  again.
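
  In our kolla-ansible deployment the L3 agent runs in a container, so
  the restart is roughly the following (the container name may differ
  between releases):

  docker restart neutron_l3_agent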

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1796491/+subscriptions

