
yahoo-eng-team team mailing list archive

[Bug 1865891] Re: Race condition during removal of subnet from the router and removal of subnet

 

Reviewed:  https://review.opendev.org/713045
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a8a2bd7e074bfa952ebf0803d006c3e29c8468b4
Submitter: Zuul
Branch:    master

commit a8a2bd7e074bfa952ebf0803d006c3e29c8468b4
Author: Rodolfo Alonso Hernandez <ralonsoh@xxxxxxxxxx>
Date:   Fri Mar 13 18:31:01 2020 +0000

    Lock subnets during port creation and subnet deletion
    
    The field "in_use" is added to the "subnet" DB definition. This DB
    column is a flag used to mark a record as in use by another
    transaction. When a DB transaction writes any value to this
    field, the record is locked against any other concurrent
    transaction. If two DB transactions try to set this column at the
    same time, one of them will fail.
    
    This DB lock is implemented in "subnet" and is used during
    subnet deletion and port IP assignment, where all the subnets of
    the port's network are retrieved to provide an IP address from the
    subnet CIDR.
    
    As reported in the related bug, it was possible to assign an IP
    to a port and, before the port creation command finished, delete the
    subnet to which the IP belonged. This patch takes the subnet lock
    during IP assignment and at the beginning of the subnet deletion
    process. At the end of both transactions, the DB engine checks
    whether the lock operation (writing the "in_use" column) is possible
    or the subnet record was already claimed by another DB transaction.
    
    Change-Id: I45a724917389814e83400f5854ada175dfce2b7b
    Closes-Bug: #1865891
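
The locking idea described in the commit message can be sketched as
follows. This is only an illustration, not the Neutron code: it uses
Python's sqlite3 module as a stand-in for the real DB engine, and the
table layout, the "subnet-1" id and the function name demo_write_lock
are assumptions made for the example. SQLite locks the whole database
file on write rather than a single row, so it shows only the general
behaviour: the second writer fails while the first transaction holds
the lock.

```python
import os
import sqlite3
import tempfile


def demo_write_lock():
    """Return the error seen by the second writer, or None if it succeeded."""
    path = os.path.join(tempfile.mkdtemp(), "neutron_sketch.db")

    # isolation_level=None: BEGIN/COMMIT are issued explicitly below.
    # timeout=0: fail immediately instead of waiting for the lock.
    port_tx = sqlite3.connect(path, isolation_level=None, timeout=0)
    delete_tx = sqlite3.connect(path, isolation_level=None, timeout=0)

    # Hypothetical, minimal "subnets" table with the "in_use" flag.
    port_tx.execute(
        "CREATE TABLE subnets (id TEXT PRIMARY KEY, in_use INTEGER)")
    port_tx.execute("INSERT INTO subnets VALUES ('subnet-1', 0)")

    # Transaction 1 (port IP assignment) writes the flag and stays open.
    port_tx.execute("BEGIN IMMEDIATE")
    port_tx.execute("UPDATE subnets SET in_use = 1 WHERE id = 'subnet-1'")

    # Transaction 2 (subnet deletion) tries to take the write lock too.
    error = None
    try:
        delete_tx.execute("BEGIN IMMEDIATE")
        delete_tx.execute(
            "UPDATE subnets SET in_use = 1 WHERE id = 'subnet-1'")
        delete_tx.execute("COMMIT")
    except sqlite3.OperationalError as exc:
        error = str(exc)

    port_tx.execute("COMMIT")
    port_tx.close()
    delete_tx.close()
    return error


print(demo_write_lock())
```

With a row-locking engine the failure would typically surface as a lock
wait timeout or a deadlock error on one of the two transactions; how the
caller retries or reports it is outside the scope of this sketch.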


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1865891

Title:
  Race condition during removal of subnet from the router and removal of
  subnet

Status in neutron:
  Fix Released

Bug description:
  Bug originally reported in
  https://bugzilla.redhat.com/show_bug.cgi?id=1806963 but I was also
  able to reproduce it on the master branch.

  Original bug description:

  I tried to perform the following actions in the background:
   1. Create subnet from pool
   2. Attach subnet to router
   3. Detach subnet from router
   4. Delete subnet
   5. Sleep 2 seconds
   6. GOTO 1

  It failed with one of the following errors in l3-agent.log:

  [-] Error while deleting router 9935b2d9-65af-4d5e-b0d4-7988cd638e66:
  KeyError: 'subnets'
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py",
  line 385, in _safe_router_removed
      self._router_removed(router_id)
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py",
  line 404, in _router_removed
      ri.delete()
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py",
  line 459, in delete
      super(HaRouter, self).delete()
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py",
  line 421, in delete
      self.process_delete()
    File "/usr/lib/python2.7/site-packages/neutron/common/utils.py",
  line 165, in call
      self.logger(e)
    File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line
  220, in __exit__
      self.force_reraise()
    File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line
  196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/usr/lib/python2.7/site-packages/neutron/common/utils.py",
  line 162, in call
      return func(*args, **kwargs)
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py",
  line 1164, in process_delete
      self._process_internal_ports()
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py",
  line 575, in _process_internal_ports
      for subnet in p['subnets']:
  KeyError: 'subnets'

  
  [-] Failed to process compatible router:
  9935b2d9-65af-4d5e-b0d4-7988cd638e66: KeyError: 'mtu'
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py",
  line 628, in _process_routers_if_compatible
      self._process_router_if_compatible(router)
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py",
  line 486, in _process_router_if_compatible
      self._process_updated_router(router)
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py",
  line 527, in _process_updated_router
      ri.process()
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py",
  line 474, in process
      super(HaRouter, self).process()
    File "/usr/lib/python2.7/site-packages/neutron/common/utils.py",
  line 165, in call
      self.logger(e)
    File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line
  220, in __exit__
      self.force_reraise()
    File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line
  196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/usr/lib/python2.7/site-packages/neutron/common/utils.py",
  line 162, in call
      return func(*args, **kwargs)
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py",
  line 1181, in process
      self._process_internal_ports()
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py",
  line 567, in _process_internal_ports
      internal_ports)
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py",
  line 515, in _get_updated_ports
      mtu_changed = existing_port['mtu'] != current_port['mtu']
  KeyError: 'mtu'

  
  In the l3-agent.log file I can see that the port carries no subnet or IP address information (fixed_ips is empty):

  (output is formatted)
  2020-02-25 09:59:33.846 875552 DEBUG neutron.agent.l3.router_info [-] appending port 
  {
    u'allowed_address_pairs': [],
    u'extra_dhcp_opts': [],
    u'updated_at': u'2020-02-25T09:59:33Z',
    u'device_owner': u'network:ha_router_replicated_interface',
    u'revision_number': 11,
    u'port_security_enabled': False,
    u'binding:profile': {},
    u'fixed_ips': [],
    u'id': u'30b654b9-0d09-407d-8553-b84c0d36e5ef',
    u'security_groups': [],
    u'binding:vif_details': {
      u'port_filter': True,
      u'datapath_type': u'system',
      u'ovs_hybrid_plug': True
    },
    u'binding:vif_type': u'ovs',
    u'qos_policy_id': None,
    u'mac_address': u'fa:16:3e:6b:13:79',
    u'project_id': u'e364e04c62d845a0ac682782a07712ee',
    u'status': u'DOWN',
    u'binding:host_id': u'controller-0.redhat.local',
    u'description': u'',
    u'tags': [],
    u'device_id': u'6b7a42d0-12ba-4e07-aa4b-3e58f11974f6',
    u'name': u'',
    u'admin_state_up': True,
    u'network_id': u'2506f745-6581-4b9a-8dde-8c11ebf1d7cb',
    u'tenant_id': u'e364e04c62d845a0ac682782a07712ee',
    u'created_at': u'2020-02-25T09:59:28Z',
    u'binding:vnic_type': u'normal',
    u'ip_allocation': u'immediate'
  }
  to internal_ports cache _process_internal_ports /usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py:583
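
The logged port has neither a 'subnets' nor an 'mtu' key, which matches
the two KeyErrors in the tracebacks. A minimal sketch of that failure
mode is shown below. The dict is a trimmed, hypothetical copy of the
logged port, and the helper names are invented for the example. The
tolerant variant only illustrates the difference between the two access
patterns; it is not the change merged for this bug, which locks the
subnet on the server side.

```python
# Hypothetical, trimmed-down copy of the port dict from the log.
port = {
    "id": "30b654b9-0d09-407d-8553-b84c0d36e5ef",
    "fixed_ips": [],
}


def subnets_strict(p):
    # Same access pattern as the traceback: p['subnets'].
    return [subnet for subnet in p["subnets"]]


def subnets_tolerant(p):
    # Illustrative alternative: treat a missing key as "no subnets".
    return list(p.get("subnets", []))


try:
    subnets_strict(port)
    strict_error = None
except KeyError as exc:
    # exc.args[0] holds the name of the missing key.
    strict_error = exc.args[0]

print(strict_error, subnets_tolerant(port))
```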


  Based on openvswitch-agent.log it seems that the subnet is deleted
  before the port configuration is complete:

  2020-02-25 09:59:29.901 107901 DEBUG neutron.agent.resource_cache [req-c5a7718c-c4d6-4dbf-a43b-286f2fb09956 9c16552bff264e21a01ba4b3e8ba0d90 e364e04c62d845a0ac682782a07712ee - - -] Received new resource Port: Port(admin_state_up=True,allowed_address_pairs=[],binding=PortBinding,binding_levels=[],created_at=2020-02-25T09:59:28Z,data_plane_status=<?>,description='',device_id='6b7a42d0-12ba-4e07-aa4b-3e58f11974f6',device_owner='network:ha_router_replicated_interface',dhcp_options=[],distributed_binding=None,dns=None,fixed_ips=[IPAllocation],id=30b654b9-0d09-407d-8553-b84c0d36e5ef,mac_address=fa:16:3e:6b:13:79,name='',network_id=2506f745-6581-4b9a-8dde-8c11ebf1d7cb,project_id='e364e04c62d845a0ac682782a07712ee',qos_policy_id=None,revision_number=5,security=PortSecurity(30b654b9-0d09-407d-8553-b84c0d36e5ef),security_group_ids=set([]),status='DOWN',updated_at=2020-02-25T09:59:29Z) record_resource_update /usr/lib/python2.7/site-packages/neutron/agent/resource_cache.py:187

  2020-02-25 09:59:30.022 107901 DEBUG neutron.agent.resource_cache [req-ee7510d2-69cc-49a6-bdfa-4455d7df47ee 9c16552bff264e21a01ba4b3e8ba0d90 e364e04c62d845a0ac682782a07712ee - - -] Resource Subnet deleted: 5561834a-9bf3-41e7-ac87-d2d0eae65ca7 record_resource_delete /usr/lib/python2.7/site-packages/neutron/agent/resource_cache.py:197

  2020-02-25 09:59:30.436 107901 DEBUG neutron.agent.resource_cache [req-c5a7718c-c4d6-4dbf-a43b-286f2fb09956 9c16552bff264e21a01ba4b3e8ba0d90 e364e04c62d845a0ac682782a07712ee - - -] Resource Port 30b654b9-0d09-407d-8553-b84c0d36e5ef updated (revision_number 5->7). Old fields: {'fixed_ips': [IPAllocation(ip_address=10.108.108.1,network_id=2506f745-6581-4b9a-8dde-8c11ebf1d7cb,port_id=30b654b9-0d09-407d-8553-b84c0d36e5ef,subnet_id=5561834a-9bf3-41e7-ac87-d2d0eae65ca7)]} New fields: {'fixed_ips': []} record_resource_update /usr/lib/python2.7/site-packages/neutron/agent/resource_cache.py:185
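
The ordering of the three openvswitch-agent.log entries can be checked
with a short script. The timestamps are copied from the log lines; the
event labels and the helper name gaps_ms are only illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (label, timestamp) pairs taken from the three log entries.
events = [
    ("port received, fixed_ips=[IPAllocation]", "2020-02-25 09:59:29.901"),
    ("subnet deleted", "2020-02-25 09:59:30.022"),
    ("port updated, fixed_ips=[]", "2020-02-25 09:59:30.436"),
]


def gaps_ms(evts):
    """Return the gap in whole milliseconds between consecutive events."""
    times = [datetime.strptime(ts, FMT) for _, ts in evts]
    step = timedelta(milliseconds=1)
    return [(later - earlier) // step
            for earlier, later in zip(times, times[1:])]


for (label, _), gap in zip(events[1:], gaps_ms(events)):
    print("%s: +%d ms" % (label, gap))
```

The subnet deletion arrives 121 ms after the port is first seen with an
IP allocation, and the port loses its fixed IP 414 ms after that.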


  Version-Release number of selected component (if applicable):
  OpenStack-13.0-RHEL-7-20200214.1

  
  Steps to Reproduce:

  openstack subnet pool create --pool-prefix 10.108.108.0/24 the_new_subnet_pool
  openstack network create the_new_network_1
  openstack router create the_new_router

  for i in {1..10};
  do
      openstack subnet create --subnet-pool the_new_subnet_pool \
          --prefix-length 27 --network the_new_network_1 the_new_subnet_1 &
      openstack router add subnet the_new_router the_new_subnet_1 &
      openstack router remove subnet the_new_router the_new_subnet_1 &
      openstack subnet delete the_new_subnet_1 &
      sleep 2
  done


  The issue has the following consequences:
  1. All the interfaces are removed from router's namespace
  2. Can't assign new subnets/ports to the router
  3. Can't delete router

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1865891/+subscriptions

