
yahoo-eng-team team mailing list archive

[Bug 1550886] Re: L3 Agent's fullsync is raceful with creation of HA router

 

Reviewed:  https://review.openstack.org/257059
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9c3c19f07ce52e139d431aec54341c38a183f0b7
Submitter: Jenkins
Branch:    master

commit 9c3c19f07ce52e139d431aec54341c38a183f0b7
Author: Kevin Benton <kevin@xxxxxxxxxx>
Date:   Thu Feb 18 03:48:29 2016 -0800

    Add ALLOCATING state to routers
    
    This patch adds a new ALLOCATING status to routers
    to indicate that the routers are still being built on the
    Neutron server. Any routers in this state are excluded in
    router retrievals by the L3 agent since they are not yet
    ready to be wired up.
    
    This is necessary when a router is made up of several
    distinct Neutron resources that cannot all be put
    into a single transaction. This patch applies this new
    state to HA routers while their internal HA ports and
    networks are being created/deleted so the L3 HA agent
    will never retrieve a partially formed HA router. It's
    important to note that the ALLOCATING status carries over
    until after the scheduling is done, which ensures that
    routers that weren't fully scheduled will not be sent to
    the agents.
    
    An HA router is placed in this state only when it is being
    created or converted to/from the HA state since this is
    disruptive to the dataplane.
    
    This patch also reverts the changes introduced in
    Iadb5a69d4cbc2515fb112867c525676cadea002b since they will
    be handled by the ALLOCATING logic instead.
    
    Co-Authored-By: Ann Kamyshnikova <akamyshnikova@xxxxxxxxxxxx>
    Co-Authored-By: John Schwarz <jschwarz@xxxxxxxxxx>
    
    APIImpact
    Closes-Bug: #1550886
    Related-bug: #1499647
    Change-Id: I22ff5a5a74527366da8f82982232d4e70e455570


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1550886

Title:
  L3 Agent's fullsync is raceful with creation of HA router

Status in neutron:
  Fix Released

Bug description:
  When creating an HA router, after the server creates all the DB
  objects (including the HA network and ports if it's the first one),
  the server goes on to schedule the router to (some of) the
  available agents.

  The race occurs when an L3 agent issues a sync_router
  request, which later down the line ends up in an
  auto_schedule_routers() call. If this happens before the above
  scheduling (done by create_router()) is complete, the server will
  refuse to schedule the router to the other intended L3 agents,
  resulting in fewer agents being scheduled.

  The only ways to recover are to restart one of the L3 agents
  that didn't get scheduled or to recreate the router. Neither is a
  good option.

  An example of the state:
  $ neutron l3-agent-list-hosting-router router2
  +--------------------------------------+-------------------------+----------------+-------+----------+
  | id                                   | host                    | admin_state_up | alive | ha_state |
  +--------------------------------------+-------------------------+----------------+-------+----------+
  | d05da32b-34e7-4c7f-b0dd-938328a0c0ed | vpn-6-12                | True           | :-)   | active   |
  +--------------------------------------+-------------------------+----------------+-------+----------+
  (Only 1 of the agents got scheduled with the router, even though there are 3 suitable agents that normally get scheduled without the race.)
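
As a rough illustration of the fix described in the commit message above, the following is a minimal Python sketch of the idea, not Neutron's actual code. A router stays in the ALLOCATING status until scheduling is done, and routers in that status are left out of what an agent sync retrieves. The `Router` class, the `ACTIVE` status name, and the helpers `routers_for_agent_sync()` and `finish_creation()` are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field

ALLOCATING = "ALLOCATING"  # status name taken from the commit message
ACTIVE = "ACTIVE"          # assumed "ready" status, for this sketch only


@dataclass
class Router:
    """Toy stand-in for a router DB object (hypothetical)."""
    name: str
    status: str = ALLOCATING
    agents: list = field(default_factory=list)


def routers_for_agent_sync(routers):
    """Return only the routers an agent sync may see (hypothetical helper)."""
    return [r for r in routers if r.status != ALLOCATING]


def finish_creation(router, agents):
    """Schedule to every intended agent, then leave ALLOCATING."""
    router.agents = list(agents)
    router.status = ACTIVE


router2 = Router("router2")
db = [router2]

# A sync that arrives while the router is still being built finds
# nothing, so it cannot schedule the half-created router early.
during = routers_for_agent_sync(db)

finish_creation(router2, ["agent-1", "agent-2", "agent-3"])

# Once scheduling is complete the router becomes visible to agents.
after = routers_for_agent_sync(db)
```

In this sketch the status change is the last step of creation, which mirrors the commit message's note that ALLOCATING carries over until after scheduling is done.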

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1550886/+subscriptions

