yahoo-eng-team team mailing list archive

[Bug 2083237] [NEW] Initial router state is not set correctly

 

Public bug reported:

Context
=======

OpenStack Antelope (but master seems affected).

When a router is created in HA mode, multiple L3 agents (3 by default)
each spawn a keepalived process to monitor the state of the router.

The initial state of the router is supposed to be saved in the
'initial_state' variable when the initial_state_change() function is
called.

This initial_state is kept to prevent false bounces while keepalived
is transitioning.

Problem
=======

The initial_state is set only when the state of the router is primary.
So in a scenario with 3 L3 agents, we could have:

t0:
agent-1 initial state: primary
agent-2 initial state: (unset)
agent-3 initial state: (unset)

t1:
agent-1 failure
agent-2 transition to primary
agent-3 transition to primary

Both agent-2 and agent-3 are transitioning to primary, and for each of them neutron will send a port binding update to the server.
The last one to send the request wins the binding.
Let's imagine the binding is now on agent-3.

t2:
agent-1 failure
agent-2 primary
agent-3 transition to backup

agent-2 wins and stays primary; agent-3 transitions to backup.


So now the port binding is recorded on agent-3, but agent-2 is actually primary.
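
The t0/t1/t2 timeline above can be replayed with a small toy model. This
is only a sketch: BindingServer, update_port_binding() and
replay_scenario() are hypothetical names made up for illustration, not
neutron APIs, and the model assumes that a transition to backup does not
move the binding.

```python
class BindingServer:
    """Stand-in for the neutron server: the last update received wins."""

    def __init__(self):
        self.bound_host = None

    def update_port_binding(self, host):
        self.bound_host = host


def replay_scenario():
    server = BindingServer()
    # t0: only agent-1 is primary.
    vrrp_state = {"agent-1": "primary", "agent-2": "backup", "agent-3": "backup"}

    # t1: agent-1 fails; both remaining agents transition to primary and
    # report immediately. agent-3's request happens to arrive last.
    vrrp_state["agent-1"] = "failed"
    for host in ("agent-2", "agent-3"):
        vrrp_state[host] = "primary"
        server.update_port_binding(host)

    # t2: keepalived settles and agent-3 falls back to backup. In this
    # model (an assumption) going to backup does not move the binding.
    vrrp_state["agent-3"] = "backup"

    actual_primary = [h for h, s in vrrp_state.items() if s == "primary"]
    return server.bound_host, actual_primary


if __name__ == "__main__":
    bound, primary = replay_scenario()
    print(f"binding recorded on: {bound}, actual primary: {primary}")
```

The model ends with the binding on agent-3 while agent-2 is the only
primary, which is the mismatch described above.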


Solution
========

Neutron code is supposed to handle false bounces by setting the initial state correctly.
The code then sleeps (eventlet.sleep(self.conf.ha_vrrp_advert_int)) until the keepalived state has stabilized,
so that only one agent grabs the binding.

To make sure this code works, the initial state needs to be set
correctly from the beginning.
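
As a rough illustration of the intended behaviour, here is a minimal
sketch of "record the initial state whatever it is, then wait one
advertisement interval and re-check before claiming the binding". It is
not the neutron implementation or the proposed patch: HaStateTracker and
its methods are hypothetical, advert_int stands in for
ha_vrrp_advert_int, and the sleep function is injectable (time.sleep
instead of eventlet.sleep) so the example runs without waiting.

```python
import time


class HaStateTracker:
    """Hypothetical per-router state holder; not neutron's actual class."""

    def __init__(self, advert_int, send_binding_update, sleep=time.sleep):
        self.advert_int = advert_int  # stands in for ha_vrrp_advert_int
        self._send_binding_update = send_binding_update
        self._sleep = sleep  # injectable so tests need not wait
        self.initial_state = None
        self.current_state = None

    def set_initial_state(self, state):
        # Record the state whether it is 'primary' or 'backup'. The bug
        # report describes it being recorded only for 'primary'.
        self.initial_state = state
        self.current_state = state

    def record_state(self, state):
        # Called for every state keepalived reports, also during the wait.
        self.current_state = state

    def on_transition(self, state):
        """Return True if a binding update was sent for this transition."""
        self.record_state(state)
        if state != "primary":
            return False
        # Give keepalived one advertisement interval to settle.
        self._sleep(self.advert_int)
        if self.current_state != "primary":
            return False  # false bounce: another agent won the election
        self._send_binding_update()
        return True


def demo():
    # Simulate agent-3: keepalived flips it back to backup during the wait.
    sent = []
    tracker = HaStateTracker(
        2,
        lambda: sent.append("agent-3"),
        sleep=lambda _seconds: tracker.record_state("backup"),
    )
    tracker.set_initial_state("backup")
    claimed = tracker.on_transition("primary")
    return claimed, sent
```

In demo(), the agent that bounces back to backup during the wait never
sends a binding update, so only the agent that is still primary after
the interval would claim the binding.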

** Affects: neutron
     Importance: Undecided
     Assignee: Arnaud Morin (arnaud-morin)
         Status: In Progress

** Changed in: neutron
     Assignee: (unassigned) => Arnaud Morin (arnaud-morin)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2083237

Title:
  Initial router state is not set correctly

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2083237/+subscriptions