yahoo-eng-team team mailing list archive
Message #56223
[Bug 1602320] Re: ha + distributed router: keepalived process kill vrrp child process
Reviewed: https://review.openstack.org/366493
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2b148c3f9299642e0bb068983de68ec6441a23be
Submitter: Jenkins
Branch: master
commit 2b148c3f9299642e0bb068983de68ec6441a23be
Author: He Qing <heqing@xxxxxxxxxxxxx>
Date: Wed Sep 7 05:07:25 2016 +0000
Fix wrong HA router state
When we add or remove a router interface on an HA router, the L3 agent
sends a SIGHUP signal to keepalived so that it reloads its configuration.
But for a DVR+HA router, the L3 agent sends SIGHUP TWICE, which
causes the VRRP sub-process to terminate and leaves the VIP addresses
and routes behind. Keepalived then restarts the VRRP process, and a
re-election takes place between the VRRP peers. If the former master is
still master after the election, the state shown by Neutron is correct.
But if the former master has transitioned to backup, the new VRRP process
will NOT delete the VIPs and routes, because it is not the one that
configured them. Neutron then shows two active agents.
HaRouter.enable_keepalived() sends the SIGHUP signal to keepalived.
DvrEdgeHaRouter.process() should not call enable_keepalived() itself,
because it inherits that call from class HaRouter.
Closes-Bug: 1602320
Change-Id: I647269665a22b4becb3e326e1f4b03ddd961d6b1
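To make the double-SIGHUP mechanism concrete, here is a minimal sketch. It is NOT Neutron's code: the classes are simplified, single-inheritance stand-ins that only reuse the names HaRouter, DvrEdgeHaRouter, enable_keepalived() and process() from the commit message, and FakeKeepalived, BuggyDvrEdgeHaRouter, FixedDvrEdgeHaRouter and sighups_for are hypothetical helpers that count reloads instead of signalling a real keepalived process.

```python
class FakeKeepalived:
    """Hypothetical stand-in: counts SIGHUPs instead of signalling a process."""

    def __init__(self):
        self.sighup_count = 0

    def reload(self):
        self.sighup_count += 1


class HaRouter:
    """Simplified HA router: processing ends with one keepalived reload."""

    def __init__(self, keepalived):
        self.keepalived = keepalived

    def enable_keepalived(self):
        # In this sketch, "enabling" keepalived means sending it one SIGHUP.
        self.keepalived.reload()

    def process(self):
        # ... interface/address processing would happen here ...
        self.enable_keepalived()


class BuggyDvrEdgeHaRouter(HaRouter):
    """Mimics the pre-fix behaviour: reloads again after the inherited reload."""

    def process(self):
        super().process()         # first SIGHUP, from HaRouter.process()
        self.enable_keepalived()  # second SIGHUP: the redundant call


class FixedDvrEdgeHaRouter(HaRouter):
    """Mimics the fix: relies solely on the inherited process() reload."""


def sighups_for(router_cls):
    """Run one processing pass and report how many SIGHUPs were sent."""
    keepalived = FakeKeepalived()
    router_cls(keepalived).process()
    return keepalived.sighup_count


print(sighups_for(BuggyDvrEdgeHaRouter))  # 2
print(sighups_for(FixedDvrEdgeHaRouter))  # 1
```

Leaving the override out entirely, rather than overriding process() and calling enable_keepalived() again, is what keeps the subclass from duplicating behaviour it already gets from its parent.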
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1602320
Title:
ha + distributed router: keepalived process kill vrrp child process
Status in neutron:
Fix Released
Bug description:
Code Repo: mitaka
keepalived version: 1.2.13
node mode: 4 nodes (containers), dvr_snat (l3 agent_mode)
OS: CentOS 7
I configured both router_distributed and l3_ha to True. Then I created a
router; the neutron l3-agent-list-hosting-router command showed 1 active
and 3 standby agents.
After I added a router interface, more than 1 agent was active.
Tracing /var/log/messages on the originally active L3 agent node shows:
2016-07-12T16:33:32.083140+08:00 localhost Keepalived[1320437]: VRRP child process(1320438) died: Respawning
2016-07-12T16:33:32.083613+08:00 localhost Keepalived[1320437]: Starting VRRP child process, pid=1340135
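To check other nodes for the same symptom, a small log scan can pick out the respawn events. This is a hedged sketch: the regular expression is written only against the two keepalived 1.2.13 lines quoted above, other keepalived versions may word the message differently, and the function name find_vrrp_respawns is hypothetical.

```python
import re

# Pattern derived only from the syslog excerpt in this report (keepalived 1.2.13).
RESPAWN_RE = re.compile(
    r"Keepalived\[(?P<parent>\d+)\]: "
    r"VRRP child process\((?P<child>\d+)\) died: Respawning"
)


def find_vrrp_respawns(lines):
    """Return (parent_pid, dead_child_pid) tuples, one per respawn event."""
    events = []
    for line in lines:
        match = RESPAWN_RE.search(line)
        if match:
            events.append((int(match.group("parent")), int(match.group("child"))))
    return events


sample = [
    "2016-07-12T16:33:32.083140+08:00 localhost Keepalived[1320437]: "
    "VRRP child process(1320438) died: Respawning",
    "2016-07-12T16:33:32.083613+08:00 localhost Keepalived[1320437]: "
    "Starting VRRP child process, pid=1340135",
]
print(find_vrrp_respawns(sample))  # [(1320437, 1320438)]
```

On a node, the same function can be fed an open file object, for example `find_vrrp_respawns(open("/var/log/messages"))`, since iterating a text file yields its lines.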
Strace info:
http://paste.openstack.org/show/530791/
It does not always fail; sometimes there is only 1 active agent. This may
be related to the environment, because I cannot reproduce it in VMs.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1602320/+subscriptions