yahoo-eng-team team mailing list archive
Message #94656
[Bug 2083237] [NEW] Initial router state is not set correctly
Public bug reported:
Context
=======
OpenStack Antelope (but master seems affected).
When a router is created in HA mode, multiple L3 agents (3 by default)
each spawn a keepalived process to monitor the state of the router.
The initial state of the router is supposed to be saved in the
'initial_state' variable when initial_state_change() is called.
This initial_state is kept so that it prevents false bounces while
keepalived is transitioning.
Problem
=======
The initial_state is set only when the state of the router is primary.
So in a scenario with 3 L3 agents, we could have:
t0:
agent-1 initial state: primary
agent-2 initial state: (unset)
agent-3 initial state: (unset)
t1:
agent-1 failure
agent-2 transition to primary
agent-3 transition to primary
Both agent-2 and agent-3 are transitioning to primary, and neutron on each agent will send a port binding update to the server.
The last one to send the request wins the binding.
Let's imagine the binding is now on agent-3.
t2:
agent-1 failure
agent-2 primary
agent-3 transition to backup
agent-2 wins and stays primary, agent-3 transitions to backup.
So now the port binding is recorded on agent-3, but agent-2 is actually primary.
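The timeline above can be replayed with a small toy model. This is only a sketch to illustrate the race, not Neutron code: ToyServer, ToyAgent, on_keepalived_transition() and replay_scenario() are hypothetical names, and the only behavior assumed is the one described above (every transition to primary sends a binding update, and the last update wins).

```python
class ToyServer:
    """Stand-in for the Neutron server: the last binding update wins."""

    def __init__(self):
        self.bound_host = None

    def update_port_binding(self, host):
        self.bound_host = host


class ToyAgent:
    """Stand-in for an L3 agent that reports every transition to primary."""

    def __init__(self, host, server):
        self.host = host
        self.server = server
        self.state = None

    def on_keepalived_transition(self, new_state):
        self.state = new_state
        # Assumption: no stabilization wait, the update is sent right away.
        if new_state == "primary":
            self.server.update_port_binding(self.host)


def replay_scenario():
    server = ToyServer()
    a1, a2, a3 = (ToyAgent(f"agent-{i}", server) for i in (1, 2, 3))

    # t0: agent-1 is primary.
    a1.on_keepalived_transition("primary")

    # t1: agent-1 fails; agent-2 and agent-3 both claim primary,
    # and agent-3 happens to send its update last.
    a1.state = "failed"
    a2.on_keepalived_transition("primary")
    a3.on_keepalived_transition("primary")

    # t2: keepalived settles, agent-3 falls back to backup.
    a3.on_keepalived_transition("backup")

    actual_primary = [a.host for a in (a1, a2, a3) if a.state == "primary"]
    return server.bound_host, actual_primary
```

Calling replay_scenario() returns the host recorded in the binding and the list of agents that are really primary; in this replay the two disagree, which is the inconsistency reported here.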
Solution
========
Neutron code is supposed to handle false bounces by setting the initial state correctly.
The code then sleeps (eventlet.sleep(self.conf.ha_vrrp_advert_int)) until the keepalived state has stabilized,
so that only one agent grabs the binding.
For this to work, the initial state needs to be set
correctly from the beginning, not only when the router is primary.
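A minimal sketch of the intended behavior, under stated assumptions: ToyHaRouter and should_claim_binding() are hypothetical and do not mirror the real Neutron classes, and time.sleep stands in for eventlet.sleep so the block has no external dependency. It only shows the two ideas above: record the initial state whatever it is, and wait one advertisement interval before claiming the binding.

```python
import time


class ToyHaRouter:
    """Hypothetical model of an HA router as seen by one L3 agent."""

    def __init__(self, advert_int=2.0, sleep=time.sleep):
        self.advert_int = advert_int  # plays the role of ha_vrrp_advert_int
        self._sleep = sleep           # injectable so the wait can be faked
        self.initial_state = None
        self.current_state = None     # latest state reported by keepalived

    def initial_state_change(self, state):
        # Record the first observed state for any value, not only "primary".
        if self.initial_state is None:
            self.initial_state = state
        self.current_state = state

    def should_claim_binding(self, state):
        """Return True only if the router is still primary after the wait."""
        self.current_state = state
        if state != "primary":
            return False
        # Give keepalived one advertisement interval to settle.
        self._sleep(self.advert_int)
        # current_state may have been updated while we were waiting.
        return self.current_state == "primary"
```

With this shape, an agent that bounces to primary and back to backup within the interval never sends a binding update, so only the agent that stays primary does.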
** Affects: neutron
Importance: Undecided
Assignee: Arnaud Morin (arnaud-morin)
Status: In Progress
** Changed in: neutron
Assignee: (unassigned) => Arnaud Morin (arnaud-morin)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2083237
Title:
Initial router state is not set correctly
Status in neutron:
In Progress
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2083237/+subscriptions