yahoo-eng-team team mailing list archive
Message #87056
[Bug 1939432] Re: Concurrent DHCP agent updates can result in a DB lock
Reviewed: https://review.opendev.org/c/openstack/neutron/+/804218
Committed: https://opendev.org/openstack/neutron/commit/668b1cc652f076e555ef1fc1289684367159186a
Submitter: "Zuul (22348)"
Branch: master
commit 668b1cc652f076e555ef1fc1289684367159186a
Author: Rodolfo Alonso Hernandez <ralonsoh@xxxxxxxxxx>
Date: Wed Aug 11 09:13:55 2021 +0000
Do not fail if the agent load is not bumped
When a new network and its first subnet are created, the DHCP agent
bumps the "load" parameter to reflect the number of networks handled.
This "load" parameter is modified when:
- The first subnet of a network is created; the "load" value is
bumped.
- The DHCP agent periodically reports its status, including the
current number of networks handled.
If the "load" value is not updated during the subnet creation, it will
be updated in the next periodic report from the agent.
This "load" value is used by the scheduler to distribute evenly the
objects to be managed by any agent type (DHCP agents manage networks).
The bug refers to DHCP agents but applies to any other agent type.
Change-Id: Ief402048d99d40b64d81fcf58eb2e39b1ba7ebbb
Closes-Bug: #1939432
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1939432
Title:
Concurrent DHCP agent updates can result in a DB lock
Status in neutron:
Fix Released
Bug description:
Bugzilla reference:
https://bugzilla.redhat.com/show_bug.cgi?id=1982981
When a new network and its first subnet are created, the DHCP agent is
updated. The agent scheduler increases the "load" field [1] of the DHCP
agent DB record; this field is used later when scheduling new networks
to this agent.
If multiple networks (each with its first subnet) are created
concurrently, the agent "load" will be modified concurrently. The DB
guarantees that only one transaction can increase the agent "load"
parameter at once; the other transactions will fail and be retried.
E.g.:
https://paste.opendev.org/show/807984/
NOTE: I say "a network and its first subnet" because that is what
triggers the spawn of a new dnsmasq process. This is the event that
increases the "load" value by 1. Any other new subnet added to this
network will modify the dnsmasq config but won't increase the "load"
value.
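The rule in the note above can be illustrated with a small toy model.
This is only a sketch and not Neutron code: the class ToyDhcpAgent and
its add_subnet method are hypothetical names invented for this
example, and the agent is kept in memory instead of in the DB.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDhcpAgent:
    """Hypothetical in-memory stand-in for a DHCP agent record."""

    load: int = 0
    # Maps a network ID to the list of subnet IDs served for it.
    networks: dict = field(default_factory=dict)

    def add_subnet(self, network_id: str, subnet_id: str) -> None:
        subnets = self.networks.setdefault(network_id, [])
        if not subnets:
            # First subnet of the network: a new dnsmasq process would
            # be spawned, so the load grows by one.
            self.load += 1
        # Any later subnet only changes the dnsmasq config; the load
        # stays the same.
        subnets.append(subnet_id)


agent = ToyDhcpAgent()
agent.add_subnet("net-a", "subnet-1")  # first subnet of net-a
agent.add_subnet("net-a", "subnet-2")  # second subnet, no bump
agent.add_subnet("net-b", "subnet-3")  # first subnet of net-b
print(agent.load)
```

In this toy model the load ends up equal to the number of networks
handled, not the number of subnets.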
As commented in the "BaseResourceFilter.bind" method [2], "the
resource being bound might or might not be of the same type which is
accounted for the load. It isn't a problem because "+ 1" here does not
meant to predict precisely what the load of the agent will be. The
value will be corrected by the agent on the next report interval." In
other words, when the DHCP agent reports its status, it accurately
updates the number of resources (networks) that it is handling.
This bug proposes to catch the DB errors in the
"BaseResourceFilter.bind" method [2] to avoid the DB retry action. The
retry is unnecessary because the DHCP agent, as noted above, will
update the "load" value in its next status report. Avoiding this retry
saves unnecessary Neutron server and DB operations and reduces command
delays (for example when creating a subnet).
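A minimal sketch of the idea, assuming a best-effort "load" bump that
swallows the DB error instead of letting it trigger a retry. This is
not the actual Neutron change: ToyDBError, ToyAgentRow,
bind_best_effort and periodic_report are hypothetical names, and the
concurrent transaction is simulated with a flag.

```python
class ToyDBError(Exception):
    """Hypothetical stand-in for a DB lock error on the agent row."""


class ToyAgentRow:
    """Hypothetical agent record holding only the "load" value."""

    def __init__(self):
        self.load = 0
        # When True, the next bump fails, simulating a row locked by a
        # concurrent transaction.
        self.fail_next_update = False

    def bump_load(self):
        if self.fail_next_update:
            self.fail_next_update = False
            raise ToyDBError("row locked by a concurrent transaction")
        self.load += 1


def bind_best_effort(agent_row):
    """Try to bump the load; report failure instead of raising."""
    try:
        agent_row.bump_load()
        return True
    except ToyDBError:
        # The bump is only an estimate, so the error is not propagated
        # and no retry of the whole operation is triggered.
        return False


def periodic_report(agent_row, networks_handled):
    """The agent report overwrites the load with the real count."""
    agent_row.load = networks_handled


row = ToyAgentRow()
row.fail_next_update = True
bumped = bind_best_effort(row)  # the bump is skipped, load stays at 0
periodic_report(row, 1)         # the next report corrects the load
print(bumped, row.load)
```

The trade-off is that the "load" value can be stale for up to one
report interval, which the quoted code comment already accepts.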
[1] https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/db/models/agent.py#L55
[2] https://github.com/openstack/neutron/blob/0ccfed0ae13182f820e6a8c11a2fa801506f3a3a/neutron/scheduler/base_resource_filter.py#L35-L39
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1939432/+subscriptions