
yahoo-eng-team team mailing list archive

[Bug 1602320] Re: ha + distributed router: keepalived process kill vrrp child process

 

Reviewed:  https://review.openstack.org/366493
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2b148c3f9299642e0bb068983de68ec6441a23be
Submitter: Jenkins
Branch:    master

commit 2b148c3f9299642e0bb068983de68ec6441a23be
Author: He Qing <heqing@xxxxxxxxxxxxx>
Date:   Wed Sep 7 05:07:25 2016 +0000

    Fix wrong HA router state
    
    When a router interface is added to or removed from an HA router, the
    L3 agent sends a SIGHUP signal to keepalived to reload its configuration.
    
    For a DVR+HA router, however, the L3 agent sends SIGHUP TWICE, which
    terminates the VRRP child process and leaves the VIP addresses and
    routes behind. Keepalived then restarts the VRRP process, and the VRRP
    peers hold a re-election. If the former master wins again, the state
    shown by Neutron is correct. But if the former master transitions to
    backup, its new VRRP process does NOT delete the VIPs and routes,
    because it is not the process that configured them. Neutron then shows
    two active agents.
    
    HaRouter.enable_keepalived() sends SIGHUP to keepalived.
    DvrEdgeHaRouter inherits from HaRouter, so its process() already
    reaches enable_keepalived() through the inherited code path and
    should not call it again itself.
    
    Closes-Bug: 1602320
    Change-Id: I647269665a22b4becb3e326e1f4b03ddd961d6b1
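The failure sequence described in the commit message above can be replayed in a toy model. This Python sketch is purely illustrative: Node, simulate, reported_active and the VIP value are hypothetical, and it assumes, as a simplification, that a node is reported active whenever the VIP is present on its HA interface and that a VRRP process only removes addresses it added itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of one L3 agent node (hypothetical, for illustration)."""
    name: str
    vrrp_state: str = "BACKUP"
    # VIPs currently configured on this node's HA interface.
    vips: set = field(default_factory=set)
    # VIPs that this node's *current* VRRP process added itself.
    owned: set = field(default_factory=set)

    def become_master(self, vip):
        self.vrrp_state = "MASTER"
        self.vips.add(vip)
        self.owned.add(vip)

    def become_backup(self):
        self.vrrp_state = "BACKUP"
        # Assumption: only VIPs this process configured are removed.
        self.vips -= self.owned
        self.owned.clear()

    def restart_vrrp_child(self):
        # The respawned child has no record of what the old child
        # configured, but the addresses stay on the interface.
        self.owned.clear()


def reported_active(nodes):
    # Assumption: the reported state follows VIP presence on each node.
    return sorted(n.name for n in nodes if n.vips)


def simulate(old_master_wins, vip="169.254.0.1/24"):
    a, b = Node("a"), Node("b")
    a.become_master(vip)
    a.restart_vrrp_child()      # double SIGHUP: VRRP child killed and respawned
    if old_master_wins:
        a.become_master(vip)    # re-election keeps a as master
    else:
        b.become_master(vip)    # re-election picks b
        a.become_backup()       # a's new child never owned the VIP, so it stays
    return reported_active([a, b])
```

In this model, simulate(True) reports only "a" as active, while simulate(False) reports both "a" and "b", matching the two outcomes the commit message describes.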

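The fix relies on Python's method resolution order. In this simplified Python sketch, only the class names HaRouter, DvrEdgeRouter, DvrEdgeHaRouter and the methods process() and enable_keepalived() come from the commit message; the Router base, the method bodies, the Buggy/Fixed class names and the base-class order (DvrEdgeRouter, HaRouter) are hypothetical stand-ins. A counter replaces the real SIGHUP so the double call is visible:

```python
class Router:
    """Hypothetical common base standing in for shared router code."""

    def process(self):
        pass


class HaRouter(Router):
    def __init__(self):
        self.sighups = 0

    def enable_keepalived(self):
        # Stand-in for sending SIGHUP to keepalived.
        self.sighups += 1

    def process(self):
        super().process()
        # In this sketch, HaRouter.process() reloads keepalived once.
        self.enable_keepalived()


class DvrEdgeRouter(Router):
    def process(self):
        super().process()


class BuggyDvrEdgeHaRouter(DvrEdgeRouter, HaRouter):
    def process(self):
        super().process()          # MRO reaches HaRouter.process(): 1st SIGHUP
        self.enable_keepalived()   # redundant explicit call: 2nd SIGHUP


class FixedDvrEdgeHaRouter(DvrEdgeRouter, HaRouter):
    def process(self):
        super().process()          # rely on the inherited call only
```

Because super().process() walks the MRO (subclass, DvrEdgeRouter, HaRouter, Router), the explicit call in the buggy variant adds a second reload. In the fixed variant the override could be dropped entirely.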

** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1602320

Title:
  ha + distributed router: keepalived process kill vrrp child process

Status in neutron:
  Fix Released

Bug description:
  Code Repo: mitaka
  keepalived version: 1.2.13
  node mode: 4 nodes (containers), dvr_snat (l3 agent_mode)
  OS: Centos 7

  I set both router_distributed and l3_ha to True, then created a
  router. The neutron l3-agent-list-hosting-router command showed
  1 active and 3 standby agents.

  Then I added a router interface, and more than one agent became active.
  Tracing /var/log/messages on the originally active L3 agent node shows:
  2016-07-12T16:33:32.083140+08:00 localhost Keepalived[1320437]: VRRP child process(1320438) died: Respawning
  2016-07-12T16:33:32.083613+08:00 localhost Keepalived[1320437]: Starting VRRP child process, pid=1340135

  Strace info:
  http://paste.openstack.org/show/530791/

  The failure is intermittent: sometimes only one agent was active. It
  may be related to the environment, because I could not reproduce it in
  VMs.
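The symptom in the report is a violation of a simple invariant: an HA router should have exactly one active agent. A minimal Python sketch of that check, assuming the per-host ha_state values have already been collected from neutron l3-agent-list-hosting-router into a dict (the helper name check_single_master and the host names are hypothetical):

```python
def check_single_master(ha_states):
    """Return the single host reporting 'active'; raise otherwise.

    `ha_states` maps host name -> ha_state value. The values are assumed
    here to be "active" or "standby".
    """
    active = sorted(host for host, state in ha_states.items() if state == "active")
    if len(active) != 1:
        raise RuntimeError(
            f"expected exactly 1 active agent, found {len(active)}: {active}"
        )
    return active[0]
```

With the healthy state from the report (1 active, 3 standby) this returns the active host; after the interface is added and two agents report active, it raises.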

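The two syslog lines quoted in the report are the signature of the problem: the keepalived parent (PID 1320437) logs that its VRRP child died and then starts a new one. A small Python sketch that pairs those lines up, assuming the message format shown in the report (the regular expressions and the vrrp_respawns helper are illustrative, not part of keepalived or Neutron):

```python
import re

# Patterns modelled on the keepalived syslog lines quoted in the report.
DIED = re.compile(r"Keepalived\[(\d+)\]: VRRP child process\((\d+)\) died: Respawning")
STARTED = re.compile(r"Keepalived\[(\d+)\]: Starting VRRP child process, pid=(\d+)")


def vrrp_respawns(lines):
    """Yield (parent_pid, old_child_pid, new_child_pid) for each respawn."""
    pending = {}  # parent pid -> pid of the child that just died
    for line in lines:
        m = DIED.search(line)
        if m:
            pending[m.group(1)] = m.group(2)
            continue
        m = STARTED.search(line)
        # A "Starting" line only counts as a respawn if a death preceded it;
        # the initial start of keepalived is ignored.
        if m and m.group(1) in pending:
            yield int(m.group(1)), int(pending.pop(m.group(1))), int(m.group(2))
```

Fed the two lines from the report, this yields a single respawn event for parent 1320437, with old child 1320438 replaced by 1340135.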

