
yahoo-eng-team team mailing list archive

[Bug 1597461] Re: L3 HA: 2 masters after reboot of controller

 

Reviewed:  https://review.openstack.org/470905
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d730b1010277138136512eb6efb12ab893ca6793
Submitter: Jenkins
Branch:    master

commit d730b1010277138136512eb6efb12ab893ca6793
Author: venkata anil <anilvenkata@xxxxxxxxxx>
Date:   Mon Jun 5 09:56:18 2017 +0000

    Set HA network port to DOWN when l3 agent starts
    
    When an l3 agent node is rebooted, if the HA network port status is
    already ACTIVE in the DB, the agent will get this status from the
    server and then spawn keepalived (though the l2 agent might not have
    wired the port yet), resulting in multiple HA masters active at the
    same time.
    
    To fix this, when the L3 agent starts up we can have it explicitly
    set the port status to DOWN for all of the HA ports on that node.
    Then we are guaranteed that when they go to ACTIVE it will be because
    the L2 agent has wired the ports.
    
    Closes-bug: #1597461
    Change-Id: Ib0c8a71b6ff97e43a414f3db4882914b12170d53
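
The startup sequence described in the commit message can be sketched roughly
as follows. This is a toy in-memory model and not neutron code: FakePortDB,
L3AgentSketch, l2_agent_wires_port and all of their methods are hypothetical
names made up for illustration, and the real change may be structured quite
differently.

```python
"""Toy model of the fix. All names are hypothetical, not neutron APIs."""

ACTIVE = "ACTIVE"
DOWN = "DOWN"


class FakePortDB:
    """Stands in for the server-side port table (port_id -> record)."""

    def __init__(self):
        self.ports = {}

    def add_ha_port(self, port_id, host, status):
        self.ports[port_id] = {"host": host, "status": status}

    def set_status(self, port_id, status):
        self.ports[port_id]["status"] = status

    def ha_ports_on_host(self, host):
        return [pid for pid, p in self.ports.items() if p["host"] == host]


class L3AgentSketch:
    """Hypothetical stand-in for the L3 agent on one node."""

    def __init__(self, host, db):
        self.host = host
        self.db = db
        self.keepalived_running = set()

    def start(self):
        # The idea of the fix: drop any stale ACTIVE status left over from
        # before the reboot for every HA port bound to this host.
        for port_id in self.db.ha_ports_on_host(self.host):
            self.db.set_status(port_id, DOWN)

    def process_router(self, port_id):
        # Spawn keepalived only once the HA port is reported ACTIVE.
        if self.db.ports[port_id]["status"] == ACTIVE:
            self.keepalived_running.add(port_id)
            return True
        return False


def l2_agent_wires_port(db, port_id):
    """Models the L2 agent finishing its wiring and reporting the port up."""
    db.set_status(port_id, ACTIVE)


if __name__ == "__main__":
    db = FakePortDB()
    # Stale status left in the DB from before the reboot.
    db.add_ha_port("ha-port-1", "controller-1", ACTIVE)

    agent = L3AgentSketch("controller-1", db)
    agent.start()
    print(agent.process_router("ha-port-1"))  # keepalived not spawned yet

    l2_agent_wires_port(db, "ha-port-1")
    print(agent.process_router("ha-port-1"))  # now it may be spawned
```

In this model, skipping the start() step would let process_router() succeed
on the stale ACTIVE status, which is the situation the commit is meant to
prevent.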


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1597461

Title:
  L3 HA: 2 masters after reboot of controller

Status in neutron:
  Fix Released

Bug description:
  ENV: Mitaka, 3 controllers, 45 computes, DVR + L3 HA (plain L3 HA is
  affected as well)

  After a reboot of the controller on which the l3 agent is active,
  another l3 agent becomes active. When the rebooted node recovers, its
  l3 agent becomes active as well, which leads to additional loss of
  external connectivity in the tenant network. After some time only one
  agent remains active: the one on the rebooted node. Sometimes
  connectivity does not come back, because the snat port ends up on the
  wrong host.

  The root cause of this problem is that routers are processed by the l3
  agent before the openvswitch agent sets up the appropriate ha ports, so
  for some time the recovered ha routers are isolated from the ha routers
  on other hosts and become active.

  A possible solution is proper serialization: the l3 agent should do its
  ha setup only after the ha network is set up on the controller.

  With 100 routers and networks this issue has been reproduced on every
  reboot.

  Actually this is an L3 HA problem; it is just aggravated by DVR, as the
  number of ports that the openvswitch agent has to handle is higher.
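
  The root cause described above can be illustrated with a small toy model.
  It assumes a heavily simplified election in which an instance that cannot
  hear its peers promotes itself, and the connected instances agree on one
  master between them. The function elect_masters and the node names are
  hypothetical, and real keepalived/VRRP behaviour has more nuance than this.

```python
def elect_masters(instances):
    """Return the names of instances that would act as master.

    instances: list of (name, priority, wired) tuples, where wired says
    whether the HA port has been plugged in by the L2 agent. This is a
    simplified stand-in for a VRRP election, not keepalived's algorithm.
    """
    # An instance with an unwired HA port hears no advertisements from its
    # peers, so it considers itself alone and becomes master.
    masters = [name for name, _prio, wired in instances if not wired]

    # Instances that can see each other settle on a single master.
    connected = [(prio, name) for name, prio, wired in instances if wired]
    if connected:
        masters.append(max(connected)[1])
    return sorted(masters)


if __name__ == "__main__":
    # node-b has just rebooted; its HA port is not wired yet.
    print(elect_masters([("node-a", 50, True), ("node-b", 50, False)]))
    # Once the port is wired, only one master remains.
    print(elect_masters([("node-a", 50, True), ("node-b", 50, True)]))
```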

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1597461/+subscriptions

