yahoo-eng-team team mailing list archive

[Bug 1743579] Re: Concurrent report_state from multiple agents: segment_host_mapping fails - StaleDataError

 

Reviewed:  https://review.openstack.org/534449
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f84781f246004651e0636f8b6507ee1e48bac6b0
Submitter: Zuul
Branch:    master

commit f84781f246004651e0636f8b6507ee1e48bac6b0
Author: Harald Jensas <hjensas@xxxxxxxxxx>
Date:   Tue Jan 16 21:15:22 2018 +0100

    Add retry decorator update_segment_host_mapping()
    
    When multiple agents register at the same time there is
    a possible race condition causing segment host mapping
    updates to fail with a StaleDataError raised by the
    SQLAlchemy ORM.
    
    Adding retry_if_session_inactive() decorator to the method
    fixes the issue.
    
    Also serialize the method with lockutils. It takes 25+
    seconds to update segment host mappings for 10 agents with
    the retry decorator alone. With the method serialized the
    same operation completes in less than 1 second. The retry
    decorator is still required for active/active scenarios.
    
    Closes-Bug: #1743579
    Change-Id: I616457f094d000a4016c610b454be8269d9b4948
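
The combination described in the commit message (retry on a stale-data failure, plus serializing the method with a lock) can be sketched with the standard library alone. This is a minimal illustration, not the neutron change itself: StaleDataError here is a local stand-in for the SQLAlchemy exception, retry_on_stale and synchronized are hypothetical helpers standing in for retry_if_session_inactive() and lockutils, and the in-memory set replaces the segmenthostmappings table.

```python
import functools
import threading


class StaleDataError(Exception):
    """Local stand-in for the SQLAlchemy ORM exception (assumption)."""


def retry_on_stale(max_retries=3):
    """Hypothetical retry decorator: re-run the call when it raises StaleDataError."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except StaleDataError:
                    # Give up once the retry budget is used.
                    if attempt == max_retries:
                        raise
        return wrapper
    return decorator


def synchronized(lock):
    """Hypothetical helper: run the wrapped function while holding `lock`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator


_mapping_lock = threading.Lock()
# In-memory stand-in for the segmenthostmappings table: (segment_id, host) pairs.
segment_host_mappings = set()


@retry_on_stale(max_retries=3)      # outer: retry if the update still goes stale
@synchronized(_mapping_lock)        # inner: only one update runs at a time
def update_segment_host_mapping(host, segment_ids):
    """Replace all mappings for `host` with the given segment ids."""
    stale = {(seg, h) for (seg, h) in segment_host_mappings
             if h == host and seg not in segment_ids}
    segment_host_mappings.difference_update(stale)
    segment_host_mappings.update((seg, host) for seg in segment_ids)
```

The lock sits inside the retry wrapper so that it is released between attempts. A process-local lock only serializes callers within one server process, which is consistent with the commit message's point that the retry decorator is still required for active/active scenarios.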


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1743579

Title:
  Concurrent report_state from multiple agents:  segment_host_mapping
  fails - StaleDataError

Status in neutron:
  Fix Released

Bug description:
  When multiple host agents rapidly report_state for the first time we
  get StaleDataError and _update_segment_host_mapping_for_agent does not
  complete for all hosts.

  Attached is a file with logs as well as reproducer script and
  instruction on how to set up devstack environment similar to the one I
  am using.

  To Reproduce:
  -------------

  Run script with the delay, time.sleep(10), commented out.
  Results:
    * 2x StaleDataError
    * Only 1 attempt to add host to placement/host-aggregate.

  MariaDB [neutron]> SELECT * FROM segmenthostmappings;
  +--------------------------------------+---------------------------------+
  | segment_id                           | host                            |
  +--------------------------------------+---------------------------------+
  | a974ae4c-1389-4e41-9ab9-820165c26acd | host2                           |
  | a974ae4c-1389-4e41-9ab9-820165c26acd | routed-devstack.lab.example.com |
  | bc626d3d-5503-4875-9db8-e1bcfad35979 | host2                           |
  | bc626d3d-5503-4875-9db8-e1bcfad35979 | routed-devstack.lab.example.com |
  | ec7717dd-8533-464f-a3c8-4ecc7dc08d10 | host2                           |
  | ec7717dd-8533-464f-a3c8-4ecc7dc08d10 | routed-devstack.lab.example.com |
  +--------------------------------------+---------------------------------+

  
  Conclusions:
    * 2x StaleDataError
    * 1x successful _update_segment_host_mapping after_create.

  *** We should see 3x attempts to add to placement/host-aggregate, one
  for each host agent. ***
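
  The gap can be made explicit by comparing the rows from the failing run against the hosts that should have a mapping. A small sketch, assuming the expected host set is the one seen in the run with the delay enabled (host0, host1, host2 and the devstack host); missing_hosts is a hypothetical helper, not part of neutron:

```python
from collections import defaultdict

# Rows copied from the segmenthostmappings output of the failing run.
rows = [
    ("a974ae4c-1389-4e41-9ab9-820165c26acd", "host2"),
    ("a974ae4c-1389-4e41-9ab9-820165c26acd", "routed-devstack.lab.example.com"),
    ("bc626d3d-5503-4875-9db8-e1bcfad35979", "host2"),
    ("bc626d3d-5503-4875-9db8-e1bcfad35979", "routed-devstack.lab.example.com"),
    ("ec7717dd-8533-464f-a3c8-4ecc7dc08d10", "host2"),
    ("ec7717dd-8533-464f-a3c8-4ecc7dc08d10", "routed-devstack.lab.example.com"),
]

# Assumption: every agent host should be mapped to every segment.
expected_hosts = {"host0", "host1", "host2", "routed-devstack.lab.example.com"}


def missing_hosts(rows, expected_hosts):
    """Return, per segment, the sorted list of expected hosts with no mapping row."""
    by_segment = defaultdict(set)
    for segment_id, host in rows:
        by_segment[segment_id].add(host)
    return {seg: sorted(expected_hosts - hosts)
            for seg, hosts in by_segment.items()}


for segment_id, hosts in missing_hosts(rows, expected_hosts).items():
    print(segment_id, "missing:", ", ".join(hosts))
```

  With the rows above, every segment lacks host0 and host1, matching the two StaleDataError failures.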

  
  Running the reproducer script with the delay uncommented (No issue):
  --------------------------------------------------------------------

  Run script with the delay, time.sleep(10), enabled.
  Results:
    * No StaleDataError
    * 3 attempts to add the host to placement/host-aggregate.

  MariaDB [neutron]> SELECT * FROM segmenthostmappings;
  +--------------------------------------+---------------------------------+
  | segment_id                           | host                            |
  +--------------------------------------+---------------------------------+
  | 11b9258f-8712-43b7-8f39-3eab627a8c7f | host0                           |
  | 11b9258f-8712-43b7-8f39-3eab627a8c7f | host1                           |
  | 11b9258f-8712-43b7-8f39-3eab627a8c7f | host2                           |
  | 11b9258f-8712-43b7-8f39-3eab627a8c7f | routed-devstack.lab.example.com |
  | 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host0                           |
  | 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host1                           |
  | 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host2                           |
  | 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | routed-devstack.lab.example.com |
  | a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host0                           |
  | a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host1                           |
  | a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host2                           |
  | a7a7d2f4-c809-4ebb-916f-930c97fbec47 | routed-devstack.lab.example.com |
  +--------------------------------------+---------------------------------+

  
  Conclusion:
    * 3x successful _update_segment_host_mapping after_create.

  
  ** NOTE: **
  The RESP BODY: {"itemNotFound": {"message": "Compute host host1 could not be found.", "code": 404}} errors in the logs are expected: the fake host is not in Nova.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1743579/+subscriptions

