yahoo-eng-team team mailing list archive

[Bug 1455439] [NEW] l3 agent: race may lead to creation of deleted routers

 

Public bug reported:

During startup (or after any sync failure) the l3 agent initiates a full resync with the neutron server.
That means fetching all router info and adding an update event for each router to the agent's processing queue.
The important thing is that such events are added with SYNC priority, which is lower than the RPC priority of events resulting from users adding/updating/deleting routers. Another important thing is that the agent won't ask the server for router info later when processing SYNC events, as all info was already received at sync start.

The race occurs when a router is deleted during l3 agent resync: while the
router update event with SYNC priority may still be waiting in the queue,
a router deleted event with RPC priority is added to the queue and processed.
The SYNC event is processed later, thus recreating the router that was
already deleted. Such routers will be deleted on the agent node only in case
of an agent restart or another resync.
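
To make the ordering concrete, here is a minimal sketch of the race using Python's standard queue.PriorityQueue. It is a toy model, not the agent's code: the constants PRIORITY_RPC and PRIORITY_SYNC, the tuple-shaped events and the helper names are all assumptions made for illustration, and a plain counter stands in for the event timestamp.

```python
import itertools
import queue

# Assumed values: a lower number is served first, so RPC beats SYNC.
PRIORITY_RPC = 0
PRIORITY_SYNC = 1

# Monotonic counter standing in for the event timestamp.
_clock = itertools.count()


def make_event(priority, router_id, action):
    """Build a hypothetical queue entry ordered by (priority, timestamp)."""
    return (priority, next(_clock), router_id, action)


def drain(q):
    """Process every queued event and return the routers left on the node."""
    routers = set()
    while not q.empty():
        _priority, _timestamp, router_id, action = q.get()
        if action == "update":
            # The SYNC update carries router info fetched at sync start,
            # so it is applied without asking the server again.
            routers.add(router_id)
        else:
            routers.discard(router_id)
    return routers


def simulate_race():
    q = queue.PriorityQueue()
    # Resync queues an update for the router with SYNC priority.
    q.put(make_event(PRIORITY_SYNC, "router-1", "update"))
    # The user deletes the router; the notification arrives with RPC priority.
    q.put(make_event(PRIORITY_RPC, "router-1", "delete"))
    return drain(q)
```

In this model simulate_race() leaves "router-1" on the node: the delete is served first because of its higher priority, and the stale update that was queued earlier is applied afterwards.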

One way to fix this is to fetch only router ids on resync start, not full router info, and get the full router info when processing the update for a particular router. The downside is that it increases rpc communication between agent and server and thus slows down both of them.
Another way would be to delete all events (for all priorities) related to a particular router when receiving a router_deleted notification, but it seems the PriorityQueue used by the agent does not allow search and pop by parameters (that may also slow down processing).

So I'm going to propose adding two events (one for each priority) to the
queue on a router deleted notification, so the "deleted" event will be the latest
by timestamp for both priorities and the router won't be recreated. In case
no resync is happening during router deletion (the normal case), the additional
router deleted event should not put much burden on the agent, as it's a
pretty cheap call for an unknown (deleted) router.
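
A rough sketch of the proposal, again as a toy model rather than the agent's actual code. The names PRIORITY_RPC, PRIORITY_SYNC, process and on_router_deleted are hypothetical, and the enqueue order stands in for the timestamp. Note that in this simplified model the stale SYNC update is still applied and then undone by the SYNC-priority delete; any timestamp-based skipping of stale updates in the real agent is not modelled here.

```python
import queue

# Assumed values: a lower number is served first, so RPC beats SYNC.
PRIORITY_RPC = 0
PRIORITY_SYNC = 1


def process(events):
    """Queue (priority, router_id, action) events in arrival order, then drain.

    Returns the set of routers left on the node and the order in which
    events were served.
    """
    q = queue.PriorityQueue()
    for seq, (priority, router_id, action) in enumerate(events):
        # seq stands in for the timestamp: later arrivals sort later
        # within the same priority.
        q.put((priority, seq, router_id, action))

    routers = set()
    order = []
    while not q.empty():
        priority, _seq, router_id, action = q.get()
        order.append((priority, action))
        if action == "update":
            routers.add(router_id)
        else:
            # Deleting an unknown router is a cheap no-op.
            routers.discard(router_id)
    return routers, order


def on_router_deleted(router_id):
    """Proposed behaviour: emit a delete event for both priorities."""
    return [
        (PRIORITY_RPC, router_id, "delete"),
        (PRIORITY_SYNC, router_id, "delete"),
    ]


# Resync queued an update, then the router was deleted during the resync.
events = [(PRIORITY_SYNC, "router-1", "update")] + on_router_deleted("router-1")
```

Calling process(events) serves the RPC delete first, then the SYNC update, then the SYNC delete, so the router is not left behind on the node. In the normal case with no resync in flight, the second delete simply hits an unknown router.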

** Affects: neutron
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
         Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1455439

Title:
  l3 agent: race may lead to creation of deleted routers

Status in OpenStack Neutron (virtual network service):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1455439/+subscriptions

