yahoo-eng-team team mailing list archive

[Bug 1552680] [NEW] [RFE] Add support for DLM

 

Public bug reported:

Neutron has many code paths that can collide and race with each
other. Ongoing work can mitigate and minimize these races, but that
work is slow, and it's very hard to fight against what you don't know
(i.e. there can always be more races you're not aware of). A DLM
(Distributed Lock Mechanism) such as tooz [1] can help mitigate this
greatly.

An excellent example of this raciness in Neutron is the L3
auto_schedule_routers functionality. When creating a tenant's first HA
router, more resources must also be created (such as an HA network and
HA ports). This flow of creating the resources can be invoked
simultaneously by 2 code paths: the original create_router (invoked from
the REST API) and the L3 agent's get_router_ids/sync_routers. These
simultaneous runs can produce many races, such as creating 2 HA networks
(where only one should exist), accidentally deleting valid port bindings
and more. Instead of hunting down these races (which can be a long and
incomplete task, since more races can always exist), this can be solved
much more easily by locking the operations done on a single router_id.
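
As a rough illustration of the idea (not Neutron code), the sketch below
serializes a "create the HA network if it is missing" step behind one
lock per router_id. Everything in it is hypothetical: the in-memory
ha_networks dict stands in for the database, and threading.Lock is only
an in-process stand-in for a distributed lock.

```python
import threading
from collections import defaultdict

# Hypothetical stand-in for the database: tenant_id -> list of HA networks.
ha_networks = defaultdict(list)

# One lock per resource ID; the guard protects the lock table itself.
_locks = {}
_locks_guard = threading.Lock()


def lock_for(resource_id):
    """Return the lock tied to resource_id, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(resource_id, threading.Lock())


def ensure_ha_network(router_id, tenant_id):
    """Create the tenant's HA network only if none exists yet."""
    # Both code paths act on the same router, so they contend for the
    # same lock and the check-then-create step cannot interleave.
    with lock_for(router_id):
        if not ha_networks[tenant_id]:
            ha_networks[tenant_id].append("ha-net-" + tenant_id)
        return ha_networks[tenant_id][0]


def run_code_paths(router_id, tenant_id, callers=2):
    """Simulate several code paths hitting the same router at once."""
    threads = [
        threading.Thread(target=ensure_ha_network, args=(router_id, tenant_id))
        for _ in range(callers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return list(ha_networks[tenant_id])
```

With the lock in place, any number of concurrent callers for the same
router leaves exactly one HA network for the tenant. A threading.Lock
only covers one process, which is why the request asks for a
distributed lock instead.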

Using tooz [1] allows for a distributed lock that spans all the
API/RPC workers on a single server and even spans multiple
neutron-servers. This will also help mitigate all sorts of races with
different resources (a lock can be associated with a UUID, so it won't
matter whether the UUID is a router_id, a network_id, etc.).
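
A minimal sketch of what a UUID-keyed lock could look like with tooz,
assuming its coordination API (get_coordinator, start, get_lock, and
locks usable as context managers). The helper names make_coordinator
and resource_lock, and the "neutron-" name prefix, are made up for
illustration and are not part of Neutron or tooz.

```python
import contextlib


def make_coordinator(backend_url, member_id):
    """Start and return a tooz coordinator for the given backend."""
    # Imported lazily so resource_lock() below can be exercised with any
    # object that offers get_lock(). tooz is a third-party package.
    from tooz import coordination

    coordinator = coordination.get_coordinator(backend_url, member_id)
    coordinator.start()
    return coordinator


@contextlib.contextmanager
def resource_lock(coordinator, resource_id):
    """Hold a lock named after a resource UUID for the enclosed block."""
    # The lock name is passed as bytes; the prefix is an arbitrary choice
    # to keep these names apart from other users of the same backend.
    name = ("neutron-" + resource_id).encode("utf-8")
    lock = coordinator.get_lock(name)
    with lock:
        yield
```

A caller might then wrap the racy section in
"with resource_lock(coordinator, router_id):", with the same helper
working unchanged for a network_id. The backend URL is a deployment
choice: a file-based backend only coordinates processes on one host,
so locking across multiple neutron-servers would need a networked
backend.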


[1]: https://github.com/openstack/tooz/tree/master/

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1552680

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1552680/+subscriptions

