yahoo-eng-team team mailing list archive
Message #70455
[Bug 1743579] Re: Concurrent report_state from multiple agents: segment_host_mapping fails - StaleDataError
Reviewed: https://review.openstack.org/534449
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f84781f246004651e0636f8b6507ee1e48bac6b0
Submitter: Zuul
Branch: master
commit f84781f246004651e0636f8b6507ee1e48bac6b0
Author: Harald Jensas <hjensas@xxxxxxxxxx>
Date: Tue Jan 16 21:15:22 2018 +0100
Add retry decorator update_segment_host_mapping()
When multiple agents register at the same time, a race
condition can cause segment host mapping updates to fail
with a StaleDataError raised by the SQLAlchemy ORM.
Adding the retry_if_session_inactive() decorator to the
method fixes the issue.
Also serialize the method with lockutils. It takes 25+
seconds to update segment host mappings for 10 agents with
the retry decorator alone. With the method serialized the
same operation completes in less than 1 second. The retry
decorator is still required for active/active scenarios.
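The two mechanisms in this commit, retry on StaleDataError plus serialization of the method, can be sketched with the standard library alone. This is not the neutron code. `StaleDataError` below is a stand-in class, `retry_on_stale` is a hypothetical substitute for `retry_if_session_inactive()`, a process-local `threading.Lock` stands in for the lockutils lock, and the body of `update_segment_host_mapping` just writes to an in-memory dict. The retry wrapper sits outside the lock, so a failed attempt releases the lock before it backs off.

```python
import functools
import threading
import time


class StaleDataError(Exception):
    """Stand-in for SQLAlchemy's StaleDataError (not the real class)."""


def retry_on_stale(max_retries=5, base_delay=0.01):
    """Hypothetical decorator: re-run the call when it raises StaleDataError."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except StaleDataError:
                    if attempt == max_retries:
                        raise  # out of retries: surface the error
                    # Exponential backoff before the next attempt.
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


# Process-local lock standing in for a lockutils lock (assumption).
_mapping_lock = threading.Lock()


def serialized(func):
    """Allow only one thread at a time into the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _mapping_lock:
            return func(*args, **kwargs)
    return wrapper


@retry_on_stale()
@serialized
def update_segment_host_mapping(mappings, host, segment_ids):
    """Hypothetical body: record which segments `host` is attached to."""
    mappings[host] = set(segment_ids)
    return mappings[host]


# Usage: three "agents" reporting at the same moment.
mappings = {}
agents = [
    threading.Thread(
        target=update_segment_host_mapping,
        args=(mappings, f"host{i}", ["seg-1", "seg-2"]),
    )
    for i in range(3)
]
for agent in agents:
    agent.start()
for agent in agents:
    agent.join()
print(sorted(mappings))  # ['host0', 'host1', 'host2']
```

A `threading.Lock` only serializes callers inside one process, which fits the commit's point that the retry decorator is still needed in active/active deployments, where presumably several server processes can update the same rows.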
Closes-Bug: #1743579
Change-Id: I616457f094d000a4016c610b454be8269d9b4948
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1743579
Title:
Concurrent report_state from multiple agents: segment_host_mapping
fails - StaleDataError
Status in neutron:
Fix Released
Bug description:
When multiple host agents call report_state for the first time in
rapid succession, we get a StaleDataError and
_update_segment_host_mapping_for_agent does not complete for all hosts.
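For context on the error itself: SQLAlchemy's ORM raises StaleDataError when an UPDATE it issues matches a different number of rows than it expected, which is how its version-counter (optimistic concurrency) check reports that another transaction changed the row first. This excerpt does not show which statement tripped, so the sketch below only illustrates the mechanism with stdlib `sqlite3`; the `segments` table with a `version` column, `bump_segment`, and the local `StaleDataError` class are all hypothetical.

```python
import sqlite3


class StaleDataError(Exception):
    """Stand-in for SQLAlchemy's StaleDataError (not the real class)."""


def bump_segment(conn, segment_id, seen_version):
    """Update a row only if its version is still the one this writer read.

    The UPDATE carries the old version in its WHERE clause, so if a
    concurrent writer already bumped it, the statement matches 0 rows.
    """
    cur = conn.execute(
        "UPDATE segments SET version = version + 1 "
        "WHERE id = ? AND version = ?",
        (segment_id, seen_version),
    )
    if cur.rowcount != 1:
        conn.rollback()
        raise StaleDataError(
            f"UPDATE expected to update 1 row(s); {cur.rowcount} were matched"
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, version INTEGER)")
conn.execute("INSERT INTO segments VALUES ('seg-1', 1)")
conn.commit()

# Two agents' requests both read version 1 before either one writes.
version_a = version_b = conn.execute(
    "SELECT version FROM segments WHERE id = 'seg-1'"
).fetchone()[0]

bump_segment(conn, "seg-1", version_a)      # first writer wins: version -> 2
try:
    bump_segment(conn, "seg-1", version_b)  # second writer still expects 1
except StaleDataError as exc:
    print(exc)  # UPDATE expected to update 1 row(s); 0 were matched
```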
Attached is a file with logs, as well as a reproducer script and
instructions on how to set up a devstack environment similar to the one I
am using.
To Reproduce:
-------------
Run the script with the delay, time.sleep(10), commented out.
Results:
* 2x StaleDataError
* Only 1 attempt to add a host to placement/host-aggregate.
MariaDB [neutron]> SELECT * FROM segmenthostmappings;
+--------------------------------------+---------------------------------+
| segment_id                           | host                            |
+--------------------------------------+---------------------------------+
| a974ae4c-1389-4e41-9ab9-820165c26acd | host2                           |
| a974ae4c-1389-4e41-9ab9-820165c26acd | routed-devstack.lab.example.com |
| bc626d3d-5503-4875-9db8-e1bcfad35979 | host2                           |
| bc626d3d-5503-4875-9db8-e1bcfad35979 | routed-devstack.lab.example.com |
| ec7717dd-8533-464f-a3c8-4ecc7dc08d10 | host2                           |
| ec7717dd-8533-464f-a3c8-4ecc7dc08d10 | routed-devstack.lab.example.com |
+--------------------------------------+---------------------------------+
Conclusions:
* 2x StaleDataError
* 1x successful _update_segment_host_mapping after_create.
*** We should see 3x attempts to add a host to placement/host-aggregate,
one for each host agent. ***
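One way to check a run like this for gaps is to compare the rows returned by the query against the set of hosts that should have reported. The helper below is hypothetical, and the expected host list is an assumption taken from the successful run (host0, host1, host2, plus routed-devstack.lab.example.com); the rows are the six from the failing run above.

```python
from collections import defaultdict


def missing_mappings(rows, expected_hosts):
    """Map each segment_id to the expected hosts it has no row for.

    Only segments that appear in `rows` are checked; a segment with no
    mappings at all would need to be supplied separately.
    """
    expected = set(expected_hosts)
    hosts_by_segment = defaultdict(set)
    for segment_id, host in rows:
        hosts_by_segment[segment_id].add(host)
    return {
        segment_id: sorted(expected - hosts)
        for segment_id, hosts in hosts_by_segment.items()
        if expected - hosts
    }


# Rows from the failing run (SELECT * FROM segmenthostmappings).
failing_rows = [
    ("a974ae4c-1389-4e41-9ab9-820165c26acd", "host2"),
    ("a974ae4c-1389-4e41-9ab9-820165c26acd", "routed-devstack.lab.example.com"),
    ("bc626d3d-5503-4875-9db8-e1bcfad35979", "host2"),
    ("bc626d3d-5503-4875-9db8-e1bcfad35979", "routed-devstack.lab.example.com"),
    ("ec7717dd-8533-464f-a3c8-4ecc7dc08d10", "host2"),
    ("ec7717dd-8533-464f-a3c8-4ecc7dc08d10", "routed-devstack.lab.example.com"),
]
# Assumption: the same three fake agents as in the successful run.
all_hosts = ["host0", "host1", "host2", "routed-devstack.lab.example.com"]

# Each of the three segments is reported as missing ['host0', 'host1'].
for segment_id, hosts in missing_mappings(failing_rows, all_hosts).items():
    print(segment_id, hosts)
```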
Running the reproducer script with the delay uncommented (No issue):
--------------------------------------------------------------------
Run the script with the delay, time.sleep(10), enabled.
Results:
* No StaleDataError
* 3 attempts to add a host to placement/host-aggregate.
MariaDB [neutron]> SELECT * FROM segmenthostmappings;
+--------------------------------------+---------------------------------+
| segment_id                           | host                            |
+--------------------------------------+---------------------------------+
| 11b9258f-8712-43b7-8f39-3eab627a8c7f | host0                           |
| 11b9258f-8712-43b7-8f39-3eab627a8c7f | host1                           |
| 11b9258f-8712-43b7-8f39-3eab627a8c7f | host2                           |
| 11b9258f-8712-43b7-8f39-3eab627a8c7f | routed-devstack.lab.example.com |
| 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host0                           |
| 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host1                           |
| 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | host2                           |
| 89f96bee-424c-4ee2-8639-2ca8e07a70e6 | routed-devstack.lab.example.com |
| a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host0                           |
| a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host1                           |
| a7a7d2f4-c809-4ebb-916f-930c97fbec47 | host2                           |
| a7a7d2f4-c809-4ebb-916f-930c97fbec47 | routed-devstack.lab.example.com |
+--------------------------------------+---------------------------------+
Conclusion:
* 3x successful _update_segment_host_mapping after_create.
** NOTE: **
The RESP BODY: {"itemNotFound": {"message": "Compute host host1 could not be found.", "code": 404}} errors in the logs are expected: the fake host does not exist in Nova.