yahoo-eng-team team mailing list archive

[Bug 1597461] [NEW] L3 HA + DVR: 2 masters after reboot of controller

 

Public bug reported:

ENV: Mitaka, 3 controllers, 45 computes, DVR + L3 HA

After a reboot of the controller on which the l3 agent is active,
another l3 agent becomes active. When the rebooted node recovers, its l3
agent becomes active as well, which leads to additional loss of external
connectivity in the tenant network. After some time only one agent
remains active: the one on the rebooted node. Sometimes connectivity
does not come back, as the snat port ends up on the wrong host.

The root cause of this problem is that routers are processed by the l3
agent before the openvswitch agent sets up the appropriate ha ports, so
for some time the recovered ha routers are isolated from the ha routers
on other hosts and become active.
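
To illustrate why isolation produces a second master, here is a toy
model (not neutron or keepalived code). It assumes a priority-based
election in which an instance stays backup only while it can hear a
better peer. The function name, the node names and the tie-break by name
are all hypothetical, and the model is only meant for the two cases
shown: full connectivity, and one node whose ha port is not wired yet.

```python
def elect_masters(nodes, links):
    """Return the names of instances that consider themselves master.

    nodes: dict mapping instance name -> priority
    links: set of frozenset({a, b}) pairs that can hear each other
    An instance stays backup only if it hears a peer that ranks higher
    (higher priority, ties broken by name).
    """
    masters = set()
    for name, prio in nodes.items():
        hears_better = any(
            frozenset({name, other}) in links and (oprio, other) > (prio, name)
            for other, oprio in nodes.items()
            if other != name
        )
        if not hears_better:
            masters.add(name)
    return masters


nodes = {"ctl1": 50, "ctl2": 50, "ctl3": 50}
full_mesh = {
    frozenset({"ctl1", "ctl2"}),
    frozenset({"ctl1", "ctl3"}),
    frozenset({"ctl2", "ctl3"}),
}
# ctl1 has just rebooted and its ha port is not plugged in yet.
ctl1_isolated = {frozenset({"ctl2", "ctl3"})}

healthy = elect_masters(nodes, full_mesh)
after_reboot = elect_masters(nodes, ctl1_isolated)
```

With full connectivity the model yields a single master. With ctl1
isolated it yields two, which matches the behaviour described above.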

A possible solution is to serialize these steps properly, so that the
l3 agent processes ha routers only after the ha network is set up on
the controller.
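
A minimal sketch of that kind of serialization, assuming the l3 agent
has some way to ask whether the ha port is wired. All names here
(wait_for_ha_port, process_ha_router, is_port_ready, start_router) are
hypothetical and are not neutron APIs. The sketch only shows the
ordering: poll for readiness first, start the router afterwards.

```python
import time


def wait_for_ha_port(is_port_ready, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_port_ready() until it returns True or timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_port_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def process_ha_router(router_id, is_port_ready, start_router, **wait_kwargs):
    """Start the router only once its ha port is reported ready.

    Returns False (and does not start the router) if the port never
    became ready, so the caller can retry later instead of letting an
    isolated instance elect itself master.
    """
    if not wait_for_ha_port(is_port_ready, **wait_kwargs):
        return False
    start_router(router_id)
    return True
```

Whether the agent should poll like this or react to a notification from
the openvswitch agent is a design choice this sketch does not settle.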

With 100 routers and networks this issue has been reproduced on every
reboot.

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: l3-dvr-backlog l3-ha

** Attachment added: "openvswitch agent logs"
   https://bugs.launchpad.net/bugs/1597461/+attachment/4692415/+files/neutron-openvswitch-agent.log.3.gz

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1597461
