
yahoo-eng-team team mailing list archive

[Bug 1742401] Re: Fullstack tests neutron.tests.fullstack.test_securitygroup.TestSecurityGroupsSameNetwork fails often

 

Reviewed:  https://review.openstack.org/536367
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=725df3e0382e048391fac109ea57920683eaf4d0
Submitter: Zuul
Branch:    master

commit 725df3e0382e048391fac109ea57920683eaf4d0
Author: Sławek Kapłoński <slawek@xxxxxxxxxxxx>
Date:   Mon Jan 22 14:01:30 2018 +0100

    Fix race condition with enabling SG on many ports at once
    
    When there are many calls to enable security groups on ports, there
    can sometimes be a race condition between refreshing resource_cache
    with data obtained by a "pull" call to neutron server and data received
    in a "push" rpc message from neutron server.
    In such a case, when a "push" message arrives with information about an
    updated port (with port_security enabled), the port is already updated
    in the local cache, so the local AFTER_UPDATE call is not made for
    that port and its rules in the firewall are not updated.
    
    It happened quite often in the fullstack security groups test because
    4 ports are created in this test and all 4 are updated, one by one,
    to apply an SG to them.
    Here is what happens, in detail:
    1. port 1 was updated in neutron-server so it sends push notification
       to L2 agent to update security groups,
    2. port 1 info was saved in resource cache on L2 agent's side and agent
       started to configure security groups for this port,
    3. as one of steps L2 agent called
       SecurityGroupServerAPIShim._select_ips_for_remote_group() method;
       In that method RemoteResourceCache.get_resources() is called and this
       method asks neutron-server for details about ports from given
       security_group,
    4. in the meantime neutron-server got a port update call for the second
       port (with the same security group), so it sends the L2 agent information
       about 2 ports (as a reply to the request sent from the L2 agent in step 3),
    5. resource cache updates information about the two ports in local cache,
       returns its data to
       SecurityGroupServerAPIShim._select_ips_for_remote_group() and all
       looks fine,
    6. but now L2 agent receives push notification with info that port 2 is
       updated (changed security groups), so it checks info about this port
       in local cache,
    7. in local cache info about port 2 is already WITH updated security
       group so RemoteResourceCache doesn't trigger local notification about
       port AFTER UPDATE and L2 agent doesn't know that security groups for this
       port should be changed
    
    This patch fixes it by changing how items are updated in
    the resource_cache.
    Updates are now done with the record_resource_update() method instead of
    writing new values directly to the resource_cache._type_cache dict.
    As a result, if a resource is updated during a "pull" call to neutron
    server, the local AFTER_UPDATE will still be triggered for that resource.
    
    Change-Id: I5a62cc5731c5ba571506a3aa26303a1b0290d37b
    Closes-Bug: #1742401
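
The race described in the commit message above can be reproduced with a toy
model. The sketch below is only an illustration under stated assumptions: the
class ToyResourceCache, its pull() and push() methods, and run_scenario() are
hypothetical names made up for this example and are not neutron's real
RemoteResourceCache code. Only the names record_resource_update and
_type_cache are taken from the commit message, and their behaviour here is a
simplification.

```python
# Toy model of the agent-side cache. Hypothetical and simplified;
# this is NOT neutron's actual RemoteResourceCache implementation.


class ToyResourceCache:
    def __init__(self, direct_write_on_pull):
        self._type_cache = {}         # resource id -> resource dict
        self.after_update_calls = []  # ids for which a local AFTER_UPDATE fired
        self._direct_write_on_pull = direct_write_on_pull

    def record_resource_update(self, resource):
        """Store the resource and fire AFTER_UPDATE only if it changed."""
        if self._type_cache.get(resource["id"]) == resource:
            return  # cache already has this state, so no local notification
        self._type_cache[resource["id"]] = dict(resource)
        self.after_update_calls.append(resource["id"])

    def pull(self, server_resources):
        """Refresh the cache from the reply to a "pull" call."""
        for resource in server_resources:
            if self._direct_write_on_pull:
                # Behaviour before the patch: write straight into the dict,
                # which skips the local notification.
                self._type_cache[resource["id"]] = dict(resource)
            else:
                # Behaviour after the patch: go through record_resource_update().
                self.record_resource_update(resource)

    def push(self, resource):
        """Handle a "push" notification from the server."""
        self.record_resource_update(resource)


def run_scenario(direct_write_on_pull):
    cache = ToyResourceCache(direct_write_on_pull)
    # Both ports already exist in the cache without a security group.
    cache.push({"id": "port1", "security_groups": []})
    cache.push({"id": "port2", "security_groups": []})
    cache.after_update_calls.clear()

    port1 = {"id": "port1", "security_groups": ["sg1"]}
    port2 = {"id": "port2", "security_groups": ["sg1"]}
    cache.push(port1)           # steps 1-2: push for port 1
    cache.pull([port1, port2])  # steps 3-5: pull reply already includes port 2
    cache.push(port2)           # steps 6-7: push for port 2 arrives late
    return cache.after_update_calls


print("direct write on pull:", run_scenario(True))
print("record_resource_update on pull:", run_scenario(False))
```

In this model the direct-write variant never reports an update for port2,
because the late push finds the cache already up to date. The variant that
routes the pull through record_resource_update() reports both ports.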


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1742401

Title:
  Fullstack tests
  neutron.tests.fullstack.test_securitygroup.TestSecurityGroupsSameNetwork
  fails often

Status in neutron:
  Fix Released

Bug description:
  Fullstack tests from the group
  neutron.tests.fullstack.test_securitygroup.TestSecurityGroupsSameNetwork
  often fail in the gate with an error like:

  ft1.1: neutron.tests.fullstack.test_securitygroup.TestSecurityGroupsSameNetwork.test_securitygroup(ovs-hybrid)_StringException: Traceback (most recent call last):
    File "neutron/tests/base.py", line 132, in func
      return f(self, *args, **kwargs)
    File "neutron/tests/fullstack/test_securitygroup.py", line 193, in test_securitygroup
      net_helpers.assert_no_ping(vms[0].namespace, vms[1].ip)
    File "neutron/tests/common/net_helpers.py", line 155, in assert_no_ping
      {'ns': src_namespace, 'destination': dst_ip})
    File "neutron/tests/tools.py", line 144, in fail
      raise unittest2.TestCase.failureException(msg)
  AssertionError: destination ip 20.0.0.9 is replying to ping from namespace test-dbbb4045-363f-44cb-825b-17090f28df11, but it shouldn't

  Example gate logs:
  http://logs.openstack.org/43/529143/3/check/neutron-fullstack/d031a6b/logs/testr_results.html.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1742401/+subscriptions

