
yahoo-eng-team team mailing list archive

[Bug 1533454] Re: L3 agent unable to update HA router state after race between HA router creating and deleting

 

** Also affects: neutron/kilo
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1533454

Title:
  L3 agent unable to update HA router state after race between HA router
  creating and deleting

Status in neutron:
  Fix Released
Status in neutron kilo series:
  New

Bug description:
  The router L3 HA binding process does not take into account the fact
  that the port it is binding to the agent can be concurrently deleted.

  Details:

  When the neutron server deletes all the resources of an HA router,
  the L3 agents are not aware of it right away, so a race can happen
  in a sequence like this:
  1. The neutron server deletes all resources of an HA router.
  2. The delete RPC fans out and reaches L3 agent 1, where the HA
     router was in the master state.
  3. In L3 agent 2, the 'backup' router sets itself to master and
     sends the neutron server an HA router state change notification.
  4. PortNotFound is raised in the function that updates HA router
     states.
  (It seems the DB error no longer occurs.)

  How do steps 2 and 3 happen?
  Suppose L3 agent 2 hosts many more HA routers than L3 agent 1, or
  for any other reason receives or processes the delete RPC later
  than L3 agent 1. When L3 agent 1 removes the HA router's keepalived
  process, the backup router on L3 agent 2 soon detects this via
  VRRP. At that point the router delete RPC is still in the
  RouterUpdate queue on L3 agent 2, or the HA router deletion is only
  partway through, so router_info still holds the router's entry.
  L3 agent 2 therefore runs the state change procedure, that is, it
  notifies the neutron server to update the router state.
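
  The sequence above can be replayed with a small in-memory model. This
  is only a sketch: FakeServer, its methods, and the local PortNotFound
  class are hypothetical stand-ins and not neutron's real API. The
  "tolerant" update shows one possible way a server could treat a
  missing HA port as "router already deleted"; it is not claimed to be
  the fix that was merged.

```python
# Illustrative model only: none of these names are neutron's real API.


class PortNotFound(Exception):
    """Stand-in for the error raised when an HA port no longer exists."""


class FakeServer:
    """Hypothetical in-memory stand-in for the server's HA port table."""

    def __init__(self):
        # (router_id, agent) -> HA state recorded for that agent's HA port
        self.ha_ports = {}

    def create_ha_router(self, router_id, agents):
        for agent in agents:
            self.ha_ports[(router_id, agent)] = "backup"

    def delete_router(self, router_id):
        # Step 1: the server drops every resource of the HA router.
        for key in [k for k in self.ha_ports if k[0] == router_id]:
            del self.ha_ports[key]

    def update_router_state(self, router_id, agent, state):
        # Step 4: a state report that arrives after the delete finds no port.
        key = (router_id, agent)
        if key not in self.ha_ports:
            raise PortNotFound(f"HA port for {router_id} on {agent} not found")
        self.ha_ports[key] = state

    def update_router_state_tolerant(self, router_id, agent, state):
        # Assumed mitigation: a missing port means the router is already gone,
        # so the report is dropped instead of failing the whole update.
        try:
            self.update_router_state(router_id, agent, state)
        except PortNotFound:
            return False
        return True


def replay_race(server):
    """Replay steps 1-4; return True only if the late report was applied."""
    server.create_ha_router("r1", ["agent-1", "agent-2"])
    server.update_router_state("r1", "agent-1", "master")
    # Steps 1-2: the server deletes the router and agent 1 tears it down.
    server.delete_router("r1")
    # Step 3: agent 2 has not processed the delete yet, notices the VRRP
    # silence, promotes itself, and reports the transition to the server.
    return server.update_router_state_tolerant("r1", "agent-2", "master")
```

  Calling replay_race(FakeServer()) returns False in this model: the
  late report from agent 2 is ignored rather than raising, while the
  plain update_router_state would raise PortNotFound at the same point.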

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1533454/+subscriptions

