yahoo-eng-team team mailing list archive
Message #49421
[Bug 1550886] Re: L3 Agent's fullsync is raceful with creation of HA router
Reviewed: https://review.openstack.org/257059
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9c3c19f07ce52e139d431aec54341c38a183f0b7
Submitter: Jenkins
Branch: master
commit 9c3c19f07ce52e139d431aec54341c38a183f0b7
Author: Kevin Benton <kevin@xxxxxxxxxx>
Date: Thu Feb 18 03:48:29 2016 -0800
Add ALLOCATING state to routers
This patch adds a new ALLOCATING status to routers
to indicate that the routers are still being built on the
Neutron server. Any routers in this state are excluded in
router retrievals by the L3 agent since they are not yet
ready to be wired up.
This is necessary when a router is made up of several
distinct Neutron resources that cannot all be put
into a single transaction. This patch applies this new
state to HA routers while their internal HA ports and
networks are being created/deleted so the L3 HA agent
will never retrieve a partially formed HA router. It's
important to note that the ALLOCATING status carries over
until after the scheduling is done, which ensures that
routers that weren't fully scheduled will not be sent to
the agents.
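As a rough illustration of the idea above (not the actual Neutron code), the sketch below keeps a router in ALLOCATING until both the HA resources and the scheduling are finished, and filters such routers out of what an agent would retrieve. The names Router, routers_for_agent, create_ha_router and the ACTIVE value are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

# ALLOCATING is the status named in the commit message; ACTIVE is an
# assumed "ready" value used only for this illustration.
ALLOCATING = "ALLOCATING"
ACTIVE = "ACTIVE"


@dataclass
class Router:
    id: str
    status: str


def routers_for_agent(routers):
    """Return only routers that are no longer being built (hypothetical helper)."""
    return [r for r in routers if r.status != ALLOCATING]


def create_ha_router(db, router_id, build_ha_resources, schedule):
    """Create a router that stays hidden from agents until fully scheduled."""
    router = Router(router_id, ALLOCATING)
    db.append(router)
    # The HA network and ports cannot share one transaction with the router,
    # so an agent sync may run while these steps are in progress.
    build_ha_resources(router)
    schedule(router)
    # Only after scheduling completes does the router become visible.
    router.status = ACTIVE
    return router
```

Any retrieval that runs while build_ha_resources or schedule is still in progress sees an empty list for this router, which is the property the patch relies on.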
An HA router is placed in this state only when it is being
created or converted to/from the HA state since this is
disruptive to the dataplane.
This patch also reverts the changes introduced in
Iadb5a69d4cbc2515fb112867c525676cadea002b since they will
be handled by the ALLOCATING logic instead.
Co-Authored-By: Ann Kamyshnikova <akamyshnikova@xxxxxxxxxxxx>
Co-Authored-By: John Schwarz <jschwarz@xxxxxxxxxx>
APIImpact
Closes-Bug: #1550886
Related-bug: #1499647
Change-Id: I22ff5a5a74527366da8f82982232d4e70e455570
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1550886
Title:
L3 Agent's fullsync is raceful with creation of HA router
Status in neutron:
Fix Released
Bug description:
When creating an HA router, after the server creates all the DB
objects (including the HA network and ports if it's the first one),
the server goes on to schedule the router to (some of) the
available agents.
The race occurs when an L3 agent issues a sync_router
request, which later down the line ends up in an
auto_schedule_routers() call. If this happens before the above
scheduling (by create_router()) is complete, the server will refuse
to schedule the router to the other intended L3 agents, resulting in
fewer agents being scheduled.
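The ordering problem can be shown with a small toy model. This is only a sketch under stated assumptions: ToyServer and schedule_on_create are hypothetical, and the rule that create-time scheduling backs off when a binding already exists is a simplification of the refusal described above, not the server's real logic. The hosts agent-b and agent-c are made up; vpn-6-12 is taken from the example output in this report.

```python
class ToyServer:
    """Toy model of router-to-agent bindings (not Neutron code)."""

    def __init__(self, agents):
        self.agents = list(agents)
        self.bindings = {}  # router id -> list of agent hosts

    def auto_schedule_routers(self, host, router_id):
        # Triggered by an agent sync: binds the router to the calling agent only.
        hosts = self.bindings.setdefault(router_id, [])
        if host not in hosts:
            hosts.append(host)

    def schedule_on_create(self, router_id):
        # Assumption for this sketch: create-time scheduling is refused
        # when the router already has a binding.
        if self.bindings.get(router_id):
            return
        self.bindings[router_id] = list(self.agents)


AGENTS = ["vpn-6-12", "agent-b", "agent-c"]

# Normal ordering: create-time scheduling finishes before any agent sync.
normal = ToyServer(AGENTS)
normal.schedule_on_create("router2")
normal.auto_schedule_routers("vpn-6-12", "router2")

# Racy ordering: an agent sync slips in before create-time scheduling.
racy = ToyServer(AGENTS)
racy.auto_schedule_routers("vpn-6-12", "router2")
racy.schedule_on_create("router2")
```

In the normal ordering all three agents end up hosting router2; in the racy ordering only vpn-6-12 does, matching the single-row listing in the example state.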
The only way to fix this is either restarting one of the L3 agents
which didn't get scheduled, or recreating the router. Either is a bad
option.
An example of the state:
$ neutron l3-agent-list-hosting-router router2
+--------------------------------------+-------------------------+----------------+-------+----------+
| id                                   | host                    | admin_state_up | alive | ha_state |
+--------------------------------------+-------------------------+----------------+-------+----------+
| d05da32b-34e7-4c7f-b0dd-938328a0c0ed | vpn-6-12                | True           | :-)   | active   |
+--------------------------------------+-------------------------+----------------+-------+----------+
(Only 1 of the agents got scheduled with the router, even though there are 3 suitable agents that normally get scheduled without the race.)
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1550886/+subscriptions