yahoo-eng-team team mailing list archive
Message #50828
[Bug 1525901] Re: Agents report as started before neutron recognizes as active
** Changed in: neutron/kilo
Status: New => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1525901
Title:
Agents report as started before neutron recognizes as active
Status in neutron:
Fix Released
Status in neutron kilo series:
Fix Released
Bug description:
In HA, there is a potential race condition between the openvswitch
agent and other agents that "own", depend on or manipulate ports. As
the neutron server resumes on a failover, it will not immediately be
aware of openvswitch agents that have also been activated on failover,
and it will act as though there are no active openvswitch agents (this
is one example; other L2 agents are most likely affected too). If an
agent such as the L3 agent starts and begins its resync before the
neutron server is aware of the active openvswitch agent, ports for the
routers on that agent will be marked as "binding_failed". Currently
this is a "terminal" state for the port, as neutron does not attempt to
rebind failed bindings on the same host.
Unfortunately, the neutron agents do not provide even a best-effort
deterministic indication to the outside service manager (systemd,
pacemaker, etc.) that they have fully initialized and that the neutron
server should be aware that they are active. Agents should follow the
same pattern as wsgi-based services and notify systemd once it can be
reasonably assumed that the neutron server is aware that they are
alive. That way, service startup ordering logic or constraints can
properly start an agent that is dependent on other agents *after*
neutron should be aware that the required agents are active.
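As a rough illustration of the pattern requested above, the sketch
below sends the systemd readiness datagram ("READY=1" to the socket
named by the NOTIFY_SOCKET environment variable) only after the first
state report to the server has succeeded. It is a minimal sketch under
stated assumptions, not the fix that was released: AgentReadiness,
report_once and the report_state callable are hypothetical names
invented for this example, and sd_notify here is a small stdlib
stand-in rather than any neutron or oslo API.

```python
import os
import socket


def sd_notify(message: str = "READY=1") -> bool:
    """Send one datagram to the systemd notify socket.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was not
    started by a service manager that expects a notification.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), path)
    return True


class AgentReadiness:
    """Hypothetical helper: signal readiness once, after the first
    state report that completes without raising."""

    def __init__(self, report_state, notify=sd_notify):
        # report_state stands in for the agent's state-report call to
        # the neutron server; it is assumed to raise on failure.
        self._report_state = report_state
        self._notify = notify
        self.notified = False

    def report_once(self) -> bool:
        try:
            self._report_state()
        except Exception:
            # Server did not acknowledge the report; do not signal yet.
            return False
        if not self.notified:
            self._notify("READY=1")
            self.notified = True
        return True
```

With a unit configured as Type=notify, a service manager would then
treat the agent as started only after this datagram arrives, so
dependent agents ordered after it start later than the first
successful report.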
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1525901/+subscriptions