yahoo-eng-team team mailing list archive
Message #47302
[Bug 1552680] [NEW] [RFE] Add support for DLM
Public bug reported:
Neutron has many code paths that can collide and race with each other.
Ongoing work can mitigate and minimize these races, but that work is
slow, and it is very hard to fight against what you don't know (i.e.,
there can always be more races you're not aware of). A DLM (Distributed
Lock Manager) such as tooz [1] can help mitigate this greatly.
An excellent example of such races in Neutron is the L3
auto_schedule_routers functionality. When a tenant's first HA router is
created, additional resources must also be created (such as an HA
network and HA ports). This resource-creation flow can be invoked
simultaneously by two code paths: the original create_router (invoked
from the REST API) and the L3 agent's get_router_ids/sync_routers. These
simultaneous runs can produce many races, such as creating two HA
networks (where only one should exist), accidentally deleting valid port
bindings, and more. Instead of hunting down these races (which can be a
long and error-prone task, since more races can always exist), this can
be solved much more easily by locking the operations done on a single
router_id.
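To illustrate the idea of locking per router_id, here is a minimal
in-process sketch. It uses Python's standard threading module as a
stand-in for a real distributed lock, so it only serializes threads
inside one process. All names in it (lock_for, ensure_ha_network,
ha_networks, simulate) are hypothetical and are not Neutron code.

```python
import threading
from collections import defaultdict

# Hypothetical in-process stand-ins; none of these names come from Neutron.
_router_locks = defaultdict(threading.Lock)  # router_id -> lock
_registry_guard = threading.Lock()           # protects _router_locks itself
ha_networks = {}                             # tenant_id -> created HA networks


def lock_for(router_id):
    """Return the single lock object associated with this router_id."""
    with _registry_guard:
        return _router_locks[router_id]


def ensure_ha_network(tenant_id, router_id):
    """Create the tenant's HA network only if it does not exist yet."""
    with lock_for(router_id):
        nets = ha_networks.setdefault(tenant_id, [])
        if not nets:
            # The existence check and the creation happen under the same
            # lock, so a second caller cannot slip in between them.
            nets.append("ha-net-for-" + tenant_id)
        return nets[0]


def simulate(callers=8):
    """Run concurrent callers on one router, as two code paths might."""
    threads = [
        threading.Thread(target=ensure_ha_network,
                         args=("tenant-a", "router-1"))
        for _ in range(callers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(ha_networks["tenant-a"])
```

However many callers run concurrently, simulate() should leave exactly
one HA network for the tenant, because every caller working on the same
router_id goes through the same lock.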
Using tooz [1] provides a distributed lock that spans all the API/RPC
workers on a single server and even spans multiple neutron-servers.
This will also help mitigate all sorts of races across different
resources (a lock can be associated with a UUID, so it won't matter
whether the UUID is a router_id, a network_id, etc.).
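A rough sketch of what a UUID-keyed lock could look like with the tooz
coordination API follows. It is not a proposed implementation: the
helper names (lock_name, run_locked), the "neutron-resource-" prefix,
and the backend URL and member id passed by the caller are all
assumptions made for illustration. Only get_coordinator, start,
get_lock and stop are tooz calls.

```python
def lock_name(resource_id):
    """Build a lock name from any resource UUID (router, network, ...)."""
    # The prefix is an arbitrary choice for this sketch; tooz lock names
    # are byte strings.
    return ("neutron-resource-" + resource_id).encode("ascii")


def run_locked(backend_url, member_id, resource_id, operation):
    """Run operation() while holding the distributed lock for resource_id."""
    # tooz is a third-party package; it is imported here so that
    # lock_name() stays usable without it installed.
    from tooz import coordination

    coordinator = coordination.get_coordinator(backend_url, member_id)
    coordinator.start()
    try:
        # The lock is identified only by its name, so every worker on
        # every server that asks for the same UUID contends for the
        # same lock.
        with coordinator.get_lock(lock_name(resource_id)):
            return operation()
    finally:
        coordinator.stop()
```

A caller might use it as run_locked("zookeeper://127.0.0.1:2181",
b"neutron-server-1", router_id, create_ha_resources), where the URL,
the member id and create_ha_resources are placeholders. A long-running
service would more likely start one coordinator per process and reuse
it, rather than start and stop one per call as this sketch does.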
[1]: https://github.com/openstack/tooz/tree/master/
** Affects: neutron
Importance: Undecided
Status: New
** Tags: rfe
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1552680
Title:
[RFE] Add support for DLM
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1552680/+subscriptions