yahoo-eng-team team mailing list archive
Message #64786
[Bug 1597461] Re: L3 HA: 2 masters after reboot of controller
Reviewed: https://review.openstack.org/470905
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d730b1010277138136512eb6efb12ab893ca6793
Submitter: Jenkins
Branch: master
commit d730b1010277138136512eb6efb12ab893ca6793
Author: venkata anil <anilvenkata@xxxxxxxxxx>
Date: Mon Jun 5 09:56:18 2017 +0000
Set HA network port to DOWN when l3 agent starts
When l3 agent node is rebooted, if HA network port status is already
ACTIVE in DB, agent will get this status from server and then spawn
the keepalived (though l2 agent might not have wired the port),
resulting in multiple HA masters active at the same time.
To fix this, when the L3 agent starts up we can have it explicitly
set the port status to DOWN for all of the HA ports on that node.
Then we are guaranteed that when they go to ACTIVE it will be because
the L2 agent has wired the ports.
Closes-bug: #1597461
Change-Id: Ib0c8a71b6ff97e43a414f3db4882914b12170d53
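The idea in the commit message can be sketched on a small in-memory model. This is only an illustration of the technique, not the code from the change itself: the Port class, the reset_ha_ports_on_agent_start helper and the host names are hypothetical, and the device owner string is assumed to be the label used for HA router interfaces.

```python
from dataclasses import dataclass
from typing import List

ACTIVE = "ACTIVE"
DOWN = "DOWN"

# Assumption: the label that marks a port as an HA router interface.
HA_OWNER = "network:router_ha_interface"


@dataclass
class Port:
    """Hypothetical stand-in for a port row in the server DB."""
    port_id: str
    host: str
    device_owner: str
    status: str


def reset_ha_ports_on_agent_start(ports: List[Port], host: str) -> List[str]:
    """Set every HA port bound to `host` to DOWN and return the ids changed.

    After this reset, a later DOWN -> ACTIVE transition can only come from
    the L2 agent reporting that it has wired the port.
    """
    changed = []
    for port in ports:
        is_ha_port_on_host = port.host == host and port.device_owner == HA_OWNER
        if is_ha_port_on_host and port.status != DOWN:
            port.status = DOWN
            changed.append(port.port_id)
    return changed
```

Ports on other hosts and non-HA ports are left untouched, so only the restarting node loses its stale ACTIVE state.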
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1597461
Title:
L3 HA: 2 masters after reboot of controller
Status in neutron:
Fix Released
Bug description:
ENV: Mitaka, 3 controllers, 45 computes, DVR + L3 HA (plain L3 HA is
affected as well)
After a reboot of the controller on which the l3 agent is active, another
l3 agent becomes active. When the rebooted node recovers, its l3 agent
becomes active as well. This leads to extra loss of external
connectivity in the tenant network. After some time only one agent
remains active: the one on the rebooted node. Sometimes connectivity
does not come back, as the snat port ends up on the wrong host.
The root cause of this problem is that routers are processed by the l3
agent before the openvswitch agent sets up the appropriate ha ports, so
for some time the recovered ha routers are isolated from the ha routers
on other hosts and become active.
A possible solution is proper serialization: the l3 agent should set up
ha routers only after the ha network is set up on the controller.
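As a rough sketch of that ordering, the agent could defer keepalived for any router whose HA port is not yet reported ACTIVE. The function name, the status dictionary and the spawn callback below are all hypothetical and only illustrate the gating; they are not taken from the l3 agent.

```python
from typing import Callable, Dict, List


def process_ha_routers(
    ha_port_status: Dict[str, str],
    spawn_keepalived: Callable[[str], None],
) -> List[str]:
    """Spawn keepalived only for routers whose HA port is ACTIVE.

    `ha_port_status` maps a router id to the status of its HA port on this
    host. Routers whose port is not ACTIVE are returned so the caller can
    retry them once the L2 agent has wired the port.
    """
    deferred = []
    for router_id, status in ha_port_status.items():
        if status == "ACTIVE":
            spawn_keepalived(router_id)
        else:
            deferred.append(router_id)
    return deferred
```

A deferred router never starts VRRP while it is still isolated from its peers, which is the condition that produced the second master.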
With 100 routers and networks this issue has been reproduced with
every reboot.
Actually this is an L3 HA problem; it is just amplified with DVR, as the
number of ports that the openvswitch agent has to handle is higher.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1597461/+subscriptions