yahoo-eng-team team mailing list archive
Message #32924
[Bug 1455439] [NEW] l3 agent: race may lead to creation of deleted routers
Public bug reported:
During startup (or after any sync failure), the l3 agent initiates a full resync with the neutron server.
This means fetching info for all routers and adding an update event for each router to the agent's processing queue.
Importantly, these events are added with SYNC priority, which is lower than the RPC priority of events resulting from users adding, updating, or deleting routers. Also, when processing SYNC events the agent does not ask the server for router info again, because all of it was already received at the start of the sync.
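The ordering described above can be sketched as a binary heap keyed on (priority, timestamp), where a lower number is popped first. This is a minimal sketch under that assumption. The names PRIORITY_RPC, PRIORITY_SYNC and enqueue, and the logical counter standing in for a timestamp, are hypothetical illustrations and not the agent's actual code.

```python
import heapq
import itertools

# Hypothetical priority values: a lower number is popped first, so RPC
# (user-driven) events overtake SYNC (resync) events.
PRIORITY_RPC = 0
PRIORITY_SYNC = 1

_clock = itertools.count()  # logical timestamp, increases on every enqueue


def enqueue(queue, priority, router_id, action, router_info=None):
    """Push an event ordered by (priority, timestamp)."""
    heapq.heappush(queue, (priority, next(_clock), router_id, action, router_info))


queue = []
# Resync: full router info is fetched up front and queued at SYNC priority.
for rid in ("r1", "r2"):
    enqueue(queue, PRIORITY_SYNC, rid, "update", {"id": rid})
# A user later updates r2 through the API; this arrives at RPC priority.
enqueue(queue, PRIORITY_RPC, "r2", "update")

order = [heapq.heappop(queue)[2:4] for _ in range(len(queue))]
print(order)  # → [('r2', 'update'), ('r1', 'update'), ('r2', 'update')]
```

Although r2's RPC event was queued last, it is popped before both SYNC events.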
The race occurs when a router is deleted during an l3 agent resync:
while the router's update event with SYNC priority is still waiting in
the queue, a router deleted event with RPC priority is added to the
queue and processed first. The SYNC event is processed later and
recreates the router that was already deleted. Such a router is removed
from the agent node only on an agent restart or another resync.
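The race can be reproduced in a toy model, assuming (as above) a heap ordered by (priority, timestamp) and SYNC events that carry the router info fetched at resync start. All names here (PRIORITY_RPC, PRIORITY_SYNC, enqueue, process_all, routers_on_node) are hypothetical; this is a sketch of the described behavior, not the agent's code.

```python
import heapq
import itertools

PRIORITY_RPC = 0   # hypothetical: user-driven events, popped first
PRIORITY_SYNC = 1  # hypothetical: resync events, popped later

clock = itertools.count()  # logical timestamp
queue = []
routers_on_node = {}  # routers the agent has actually configured


def enqueue(priority, router_id, action, info=None):
    heapq.heappush(queue, (priority, next(clock), router_id, action, info))


def process_all():
    while queue:
        _prio, _ts, router_id, action, info = heapq.heappop(queue)
        if action == "delete":
            routers_on_node.pop(router_id, None)
        else:
            # SYNC events carry the info fetched at resync start and do
            # not re-ask the server, so stale info is applied as-is.
            routers_on_node[router_id] = info


routers_on_node["r1"] = {"id": "r1"}  # r1 exists before the resync
# 1. Resync starts: r1's full info is fetched and queued at SYNC priority.
enqueue(PRIORITY_SYNC, "r1", "update", {"id": "r1"})
# 2. The user deletes r1 before the SYNC event is processed.
enqueue(PRIORITY_RPC, "r1", "delete")
process_all()
print(routers_on_node)  # → {'r1': {'id': 'r1'}}: r1 is back after deletion
```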
One way to fix this is to fetch only router ids (not full router info) at resync start, and to get the full info for a particular router when processing its update. The downside is that this increases RPC traffic between the agent and the server and thus slows down both of them.
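A minimal sketch of this first option, assuming a dict stands in for the server's router table and a hypothetical get_router_info() stands in for a per-router RPC that returns None once the router is gone. Neither name is taken from neutron; the counter only illustrates that the number of calls now grows with the number of routers.

```python
import heapq
import itertools

PRIORITY_RPC = 0   # hypothetical priority values
PRIORITY_SYNC = 1

server_routers = {"r1": {"id": "r1"}, "r2": {"id": "r2"}}  # stand-in for the server DB
rpc_calls = 0


def get_router_info(router_id):
    """Hypothetical per-router RPC to the server; returns None if deleted."""
    global rpc_calls
    rpc_calls += 1
    return server_routers.get(router_id)


clock = itertools.count()  # logical timestamp
queue = []
routers_on_node = {}

# Resync start: only router ids are queued, no router info.
for rid in list(server_routers):
    heapq.heappush(queue, (PRIORITY_SYNC, next(clock), rid, "update"))

# r1 is deleted while the resync is in progress.
del server_routers["r1"]
heapq.heappush(queue, (PRIORITY_RPC, next(clock), "r1", "delete"))

while queue:
    _prio, _ts, rid, action = heapq.heappop(queue)
    # Updates fetch fresh info at processing time instead of using a snapshot.
    info = None if action == "delete" else get_router_info(rid)
    if info is None:
        routers_on_node.pop(rid, None)  # deleted on the server: drop locally
    else:
        routers_on_node[rid] = info

print(sorted(routers_on_node), rpc_calls)  # → ['r2'] 2
```

The stale SYNC update for r1 no longer recreates it, but every SYNC update now costs one round trip to the server.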
Another way would be to delete all queued events (of all priorities) related to a particular router when receiving a router_deleted notification, but the PriorityQueue used by the agent does not seem to allow searching and popping by parameters (and doing so might also slow down processing).
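To illustrate why this second option is awkward, here is a sketch assuming the queue is a plain binary heap, which only supports popping its minimum. Python's queue.PriorityQueue, for example, offers no public remove-by-key operation. The helper purge_router_events() is hypothetical; removing by router id means a full scan plus a re-heapify on every deletion.

```python
import heapq
import itertools

PRIORITY_RPC = 0   # hypothetical priority values
PRIORITY_SYNC = 1

clock = itertools.count()  # logical timestamp
queue = []


def enqueue(priority, router_id, action):
    heapq.heappush(queue, (priority, next(clock), router_id, action))


def purge_router_events(router_id):
    """Drop every queued event for router_id, whatever its priority.

    A binary heap only supports popping the minimum, so removing by key
    means an O(n) scan plus an O(n) re-heapify on every deletion.
    """
    queue[:] = [event for event in queue if event[2] != router_id]
    heapq.heapify(queue)


for rid in ("r1", "r2", "r3"):
    enqueue(PRIORITY_SYNC, rid, "update")
purge_router_events("r2")  # router_deleted notification for r2
enqueue(PRIORITY_RPC, "r2", "delete")

remaining = [heapq.heappop(queue)[2:] for _ in range(len(queue))]
print(remaining)  # → [('r2', 'delete'), ('r1', 'update'), ('r3', 'update')]
```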
So I'm going to propose adding two events (one for each priority) to
the queue on a router deleted notification, so that the "deleted" event
is the latest by timestamp at both priorities and the router won't be
recreated. When no resync is happening during router deletion (the
normal case), the additional router deleted event should not add much
burden to the agent, as processing a delete for an unknown (already
deleted) router is a pretty cheap call.
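The proposed fix can be checked in the same toy model, again assuming a heap ordered by (priority, timestamp). The names router_deleted, enqueue and process_all are hypothetical and only illustrate enqueuing the delete once per priority level.

```python
import heapq
import itertools

PRIORITY_RPC = 0   # hypothetical priority values
PRIORITY_SYNC = 1

clock = itertools.count()  # logical timestamp
queue = []
routers_on_node = {"r1": {"id": "r1"}}  # r1 exists before the resync


def enqueue(priority, router_id, action, info=None):
    heapq.heappush(queue, (priority, next(clock), router_id, action, info))


def router_deleted(router_id):
    """Proposed handling: queue the delete once per priority level."""
    for priority in (PRIORITY_RPC, PRIORITY_SYNC):
        enqueue(priority, router_id, "delete")


def process_all():
    while queue:
        _prio, _ts, router_id, action, info = heapq.heappop(queue)
        if action == "delete":
            # Cheap no-op if the router is already gone.
            routers_on_node.pop(router_id, None)
        else:
            routers_on_node[router_id] = info


enqueue(PRIORITY_SYNC, "r1", "update", {"id": "r1"})  # resync in progress
router_deleted("r1")                                   # user deletes r1
process_all()
print(routers_on_node)  # → {}
```

In this simplified model the stale SYNC update still runs, but the SYNC-priority delete carries a later timestamp, so it is popped after the update and removes the router again.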
** Affects: neutron
Importance: Undecided
Assignee: Oleg Bondarev (obondarev)
Status: New
** Tags: l3-ipam-dhcp
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1455439
Title:
l3 agent: race may lead to creation of deleted routers
Status in OpenStack Neutron (virtual network service):
New