yahoo-eng-team team mailing list archive
Message #50395
[Bug 1533454] Re: L3 agent unable to update HA router state after race between HA router creating and deleting
** Also affects: neutron/kilo
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1533454
Title:
L3 agent unable to update HA router state after race between HA router
creating and deleting
Status in neutron:
Fix Released
Status in neutron kilo series:
New
Bug description:
The router L3 HA binding process does not take into account the fact
that the port it is binding to the agent can be concurrently deleted.
Details:
When the neutron server deletes all the resources of an
HA router, the L3 agents are not immediately aware of it,
so a race can occur in a sequence like this:
1. The neutron server deletes all resources of an HA router.
2. The RPC fanout reaches L3 agent 1, on which
the HA router is in the master state.
3. On L3 agent 2, the 'backup' router promotes itself to master
and sends the neutron server an HA router state change notification.
4. PortNotFound is raised in the function that updates HA router states.
(The DB error seems to no longer occur.)
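The four steps above can be modelled with a small, self-contained sketch. This is not neutron code: FakeServer, its methods, and the local PortNotFound class are hypothetical stand-ins, and the "tolerant" variant is only one possible way to handle the missing port, not necessarily the fix that was released.

```python
# Hypothetical model of the race; none of these names are neutron's real API.


class PortNotFound(Exception):
    """Stand-in for the error raised when a port row no longer exists."""


class FakeServer:
    def __init__(self):
        # (router_id, agent host) -> HA state recorded for that binding
        self.ha_ports = {}

    def create_ha_router(self, router_id, hosts):
        for host in hosts:
            self.ha_ports[(router_id, host)] = "standby"

    def delete_ha_router(self, router_id):
        # Step 1: remove every resource that belongs to the router.
        for key in [k for k in self.ha_ports if k[0] == router_id]:
            del self.ha_ports[key]

    def update_state_naive(self, router_id, host, state):
        # Step 4 as reported: the update assumes the port still exists.
        if (router_id, host) not in self.ha_ports:
            raise PortNotFound(f"{router_id} on {host}")
        self.ha_ports[(router_id, host)] = state

    def update_state_tolerant(self, router_id, host, state):
        # Assumed mitigation: treat a missing port as "router already
        # deleted" and skip the update instead of failing.
        try:
            self.update_state_naive(router_id, host, state)
            return True
        except PortNotFound:
            return False


server = FakeServer()
server.create_ha_router("r1", ["agent1", "agent2"])
server.delete_ha_router("r1")  # step 1

# Steps 2-3: agent2's backup instance promotes itself and reports it.
try:
    server.update_state_naive("r1", "agent2", "active")
    naive_raised = False
except PortNotFound:
    naive_raised = True

tolerant_applied = server.update_state_tolerant("r1", "agent2", "active")
```

In this model the naive update raises, while the tolerant one reports that nothing was applied and leaves the already-deleted router alone.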
How do steps 2 and 3 happen?
Suppose L3 agent 2 hosts many more HA routers than L3 agent 1,
or for any other reason receives or processes the delete RPC
later than L3 agent 1. When L3 agent 1 removes the HA router's
keepalived process, the backup router on L3 agent 2 soon detects
this via VRRP. At that point the router delete RPC is still
waiting in the RouterUpdate queue, or the HA router deletion is
only partway through, so router_info still holds the router's info.
L3 agent 2 therefore runs the state change procedure, that is, it
notifies the neutron server to update the router state.
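The timing on L3 agent 2 can be illustrated with a toy queue. This is a sketch under stated assumptions: FakeAgent, its queue, and on_vrrp_transition are hypothetical names and do not mirror the real agent classes. It only shows that a VRRP transition arriving while the delete event is still queued finds the router in the local cache and therefore triggers a notification.

```python
from collections import deque


class FakeAgent:
    """Hypothetical L3 agent: a local router cache plus an event queue."""

    def __init__(self, name):
        self.name = name
        self.router_info = {}    # router_id -> local HA state
        self.queue = deque()     # pending (action, router_id) events
        self.notifications = []  # state changes reported to the server

    def add_router(self, router_id, state):
        self.router_info[router_id] = state

    def enqueue(self, action, router_id):
        self.queue.append((action, router_id))

    def process_one(self):
        action, router_id = self.queue.popleft()
        if action == "delete":
            self.router_info.pop(router_id, None)

    def on_vrrp_transition(self, router_id, new_state):
        # Only the local cache is consulted, so a router whose delete
        # event is still queued is treated as alive.
        if router_id in self.router_info:
            self.router_info[router_id] = new_state
            self.notifications.append((router_id, new_state))


agent2 = FakeAgent("agent2")
agent2.add_router("r1", "backup")
for i in range(3):
    agent2.add_router(f"other{i}", "backup")
    agent2.enqueue("update", f"other{i}")
agent2.enqueue("delete", "r1")  # the delete waits behind three events

agent2.process_one()  # agent 2 is still busy with earlier events
agent2.on_vrrp_transition("r1", "master")  # agent 1 stopped keepalived
sent_before_delete = list(agent2.notifications)

while agent2.queue:
    agent2.process_one()
agent2.on_vrrp_transition("r1", "master")  # after the delete: ignored
```

Only the transition that arrives before the delete is processed produces a notification, which is the window the description above refers to.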
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1533454/+subscriptions