yahoo-eng-team team mailing list archive

[Bug 1841967] Re: ML2 mech driver sometimes receives network context without provider attributes in delete_network_postcommit

 

Reviewed:  https://review.opendev.org/679483
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fea2d9091f71a2ec88318121ed9a22180e1ae96f
Submitter: Zuul
Branch:    master

commit fea2d9091f71a2ec88318121ed9a22180e1ae96f
Author: Mark Goddard <mark@xxxxxxxxxxxx>
Date:   Fri Aug 30 16:58:34 2019 +0100

    Create _mech_context before delete to avoid race
    
    When a network is deleted, precommit handlers are notified prior to the
    deletion of the network from the database. One handler exists in the ML2
    plugin - _network_delete_precommit_handler. This handler queries the
    database for the current state of the network and uses it to create a
    NetworkContext which it saves under context._mech_context. When the
    postcommit handler _network_delete_after_delete_handler is triggered
    later, it passes the saved context._mech_context to mechanism drivers.
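The handoff described above can be sketched as a minimal illustration, not Neutron's code. The precommit handler snapshots the network before the row is deleted, and the postcommit handler later reuses that snapshot. `NetworkContext`, `RequestContext`, the in-memory `db` and both handler functions are hypothetical stand-ins that only borrow names from the commit message.

```python
class NetworkContext:
    """Hypothetical stand-in for ML2's NetworkContext: a snapshot of a network."""

    def __init__(self, network):
        # Copy the dict so later changes to the "database" do not leak in.
        self.current = dict(network)


class RequestContext:
    """Hypothetical request context that carries state between handlers."""


# Hypothetical in-memory "database" keyed by network ID.
db = {"net-1": {"id": "net-1",
                "provider:network_type": "vlan",
                "provider:segmentation_id": 100}}

delivered = []  # network dicts handed to mechanism drivers in postcommit


def network_delete_precommit(context, network_id):
    # Runs before the row is deleted: snapshot it and stash the snapshot.
    context._mech_context = NetworkContext(db[network_id])


def network_delete_postcommit(context):
    # Runs after the row is gone: the stashed snapshot is all that is left.
    delivered.append(context._mech_context.current)


ctx = RequestContext()
network_delete_precommit(ctx, "net-1")
del db["net-1"]  # the deletion itself happens between the two phases
network_delete_postcommit(ctx)
print(delivered[0]["provider:segmentation_id"])  # 100
```

The snapshot is only as good as the database state at the moment the precommit handler runs, which is why the ordering of precommit handlers matters.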
    
    A problem can occur with provider networks since the segments service
    also registers a precommit handler - _delete_segments_for_network. Both
    precommit handlers use the default priority, so the order in which they
    are called is random, and determined by dict ordering. If the segment
    precommit handler executes first, it will delete the segments associated
    with the network. When the ML2 plugin precommit handler runs it then
    sees no segments for the network and sets the provider attributes of the
    network in the NetworkContext to None.
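The race can be reproduced with a toy model that makes no claim about how Neutron's callback registry stores subscribers. Here the two handlers simply sit in a dict, and the hypothetical `handler_order` argument plays the role of whatever order the registry happens to iterate them in when their priorities are equal. The handler names echo the commit message, but their bodies are simplified, hypothetical stand-ins.

```python
def run_delete(handler_order):
    """Run the precommit handlers in the given order and return the
    provider attributes the ML2 handler captured (toy model)."""
    segments = {"net-1": [{"network_type": "vlan", "segmentation_id": 100}]}
    captured = {}

    def delete_segments_for_network(network_id):
        # Segments service handler: removes the network's segments.
        segments.pop(network_id, None)

    def ml2_network_delete_precommit(network_id):
        # ML2 handler: snapshots provider attributes from whatever segments
        # it can still see; with none left, the attributes become None.
        segs = segments.get(network_id, [])
        seg = segs[0] if segs else {}
        captured["provider:network_type"] = seg.get("network_type")
        captured["provider:segmentation_id"] = seg.get("segmentation_id")

    handlers = {"segments": delete_segments_for_network,
                "ml2": ml2_network_delete_precommit}
    for name in handler_order:
        handlers[name]("net-1")
    return captured


print(run_delete(["ml2", "segments"]))
# {'provider:network_type': 'vlan', 'provider:segmentation_id': 100}
print(run_delete(["segments", "ml2"]))
# {'provider:network_type': None, 'provider:segmentation_id': None}
```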
    
    A mechanism driver that is passed a NetworkContext without provider
    attributes in its delete_network_postcommit method will not have the
    information to perform the necessary actions.  In the case of the
    networking-generic-switch mechanism driver where this was observed, this
    resulted in the driver ignoring the event, because the network did not
    look like a VLAN.
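To make the consequence concrete, here is a hypothetical driver whose `delete_network_postcommit` only acts on VLAN networks. It is not networking-generic-switch's actual code; it assumes only that the driver reads the standard provider attributes from the network dict exposed as `context.current`.

```python
class FakeNetworkContext:
    """Hypothetical stand-in exposing the network dict as .current."""

    def __init__(self, network):
        self.current = network


class VlanSwitchDriver:
    """Hypothetical driver that removes a VLAN from a switch on delete."""

    def __init__(self):
        self.removed_vlans = []

    def delete_network_postcommit(self, context):
        network = context.current
        if network.get("provider:network_type") != "vlan":
            # Without provider attributes the network does not look like a
            # VLAN, so the event is ignored and the switch config goes stale.
            return
        self.removed_vlans.append(network["provider:segmentation_id"])


driver = VlanSwitchDriver()
driver.delete_network_postcommit(FakeNetworkContext(
    {"provider:network_type": "vlan", "provider:segmentation_id": 100}))
driver.delete_network_postcommit(FakeNetworkContext(
    {"provider:network_type": None, "provider:segmentation_id": None}))
print(driver.removed_vlans)  # [100]
```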
    
    This change uses a priority of zero for the ML2 network delete precommit
    handler, to ensure that it queries the network and stores the
    NetworkContext before the segments service has a chance to delete the
    segments.
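A sketch of why an explicit priority fixes the ordering, assuming (as the commit message implies) that handlers with a lower priority value are called first. The `Registry` class and the `DEFAULT_PRIORITY` value are hypothetical and are not neutron-lib's callback API; for simplicity, this toy breaks ties between equal priorities by registration order, whereas the commit message describes that order as dict-dependent.

```python
DEFAULT_PRIORITY = 1000  # hypothetical value; only its relation to 0 matters


class Registry:
    """Hypothetical callback registry: lower priority values run first."""

    def __init__(self):
        self._callbacks = []  # (priority, registration seq, name, callback)

    def subscribe(self, name, callback, priority=DEFAULT_PRIORITY):
        self._callbacks.append((priority, len(self._callbacks), name, callback))

    def publish(self, *args):
        # Sort by priority; the unique sequence number breaks ties, so the
        # callbacks themselves are never compared.
        order = []
        for _priority, _seq, name, callback in sorted(self._callbacks):
            callback(*args)
            order.append(name)
        return order


segments = {"net-1": ["vlan/100"]}
seen = []

registry = Registry()
# The segments service registers first and keeps the default priority.
registry.subscribe("segments", lambda net: segments.pop(net, None))
# ML2 registers later, but with priority 0 it still runs first.
registry.subscribe("ml2", lambda net: seen.append(list(segments.get(net, []))),
                   priority=0)

order = registry.publish("net-1")
print(order)  # ['ml2', 'segments']
print(seen)   # [['vlan/100']]
```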
    
    A similar change has been made for subnets, both to keep the pattern
    consistent and avoid any similar issues.
    
    Change-Id: I6482223ed2a479de4f5ef4cef056c311c0281408
    Closes-Bug: #1841967
    Depends-On: https://review.opendev.org/680001


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1841967

Title:
  ML2 mech driver sometimes receives network context without provider
  attributes in delete_network_postcommit

Status in neutron:
  Fix Released

Bug description:
  When a network is deleted, sometimes the delete_network_postcommit
  method of my ML2 mechanism driver receives a network object in the
  context that has the provider attributes set to None.

  I am using Rocky (13.0.4), on CentOS 7.5 + RDO, and kolla-ansible. I
  have three controllers running neutron-server.

  Specifically, the mechanism driver is networking-generic-switch. It
  needs the provider information in order to configure VLANs on physical
  switches, and without it I am left with stale switch configuration.

  In my testing I have found that reducing the number of neutron-server
  instances reduces the likelihood of seeing this issue. I did not see
  it with only one instance running, although I only tested ~10 times.

  I have collected logs from a broken case and a working case, and one
  key difference I can see is that in the working case I see two of
  these messages, and in the broken case I see three:

    Network 3ed87da6-0b3a-455a-b813-7d069dc9e112 has no segments _extend_network_dict_provider /usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py:168

  Indeed, _extend_network_dict_provider sets the provider attributes to
  None if there are no segments found in the DB.
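A simplified, hypothetical version of the behaviour described above, covering only networks with zero or one segment. The real `_extend_network_dict_provider` in `managers.py` is not reproduced here; the function below only mirrors the rule that a network with no segments in the DB ends up with `None` provider attributes.

```python
PROVIDER_ATTRS = ("provider:network_type",
                  "provider:physical_network",
                  "provider:segmentation_id")


def extend_network_dict_provider(network, segments):
    """Hypothetical sketch: fill provider attributes from the segments list."""
    if len(segments) > 1:
        raise NotImplementedError("multi-segment networks are out of scope here")
    if not segments:
        # No segments found: every provider attribute is set to None.
        for attr in PROVIDER_ATTRS:
            network[attr] = None
        return network
    seg = segments[0]
    network["provider:network_type"] = seg["network_type"]
    network["provider:physical_network"] = seg["physical_network"]
    network["provider:segmentation_id"] = seg["segmentation_id"]
    return network


print(extend_network_dict_provider({"id": "net-1"}, []))
```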

  It seems to be a race condition between segment deletion and creation
  of the _mech_context in the network precommit.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1841967/+subscriptions

