yahoo-eng-team team mailing list archive

[Bug 1412542] [NEW] L3 agent restart does not SIGHUP running keepalived processes

 

Public bug reported:

Per
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/keepalived.py#L405:

When the L3 agent starts, it invokes the keepalived_manager spawn method,
which spawns the underlying keepalived process unless it's already
running. This issue only manifests on L3 agent restarts, because when an
already-running agent reconfigures keepalived due to an RPC update call,
it does successfully send a SIGHUP signal to the process.

The effect is that restarting an L3 agent does not SIGHUP any running
keepalived processes. So, for example, if the L3 agent crashes and is
started again a minute or two later (this depends on timers configured
for external tools such as Pacemaker), the L3 agent resyncs with the
controller but doesn't SIGHUP any existing keepalived processes. This
means that any updates that happened during the L3 agent downtime will
be picked up during that initial resync, but the agent won't actually
reconfigure keepalived.
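
The difference between the two code paths can be pictured with a
minimal sketch. This is not Neutron's actual code: the class
KeepalivedSketch, its methods, and the fake PID handling are all
hypothetical, and the signal sender is injected so the example runs
without a real keepalived process. It only assumes that keepalived
reloads its configuration on SIGHUP, as described above.

```python
import signal
from typing import Callable, List, Optional, Tuple


class KeepalivedSketch:
    """Hypothetical stand-in for a keepalived manager (not Neutron's class)."""

    def __init__(self, kill: Callable[[int, int], None]) -> None:
        self.pid: Optional[int] = None  # PID of the running keepalived, if any
        self._kill = kill               # injected; os.kill would be the real sender
        self._next_pid = 1000           # fake PID counter, for illustration only

    def _start_process(self) -> None:
        # Placeholder for launching keepalived; only records a fake PID.
        self.pid = self._next_pid
        self._next_pid += 1

    def spawn(self) -> None:
        """Behaviour described in this report: start only if not running."""
        if self.pid is None:
            self._start_process()
        # Already running: nothing is sent, so new config is never re-read.

    def spawn_or_reload(self) -> None:
        """One possible remedy (an assumption): SIGHUP a running process."""
        if self.pid is None:
            self._start_process()
        else:
            self._kill(self.pid, signal.SIGHUP)


# Record signals instead of delivering them.
sent: List[Tuple[int, int]] = []
manager = KeepalivedSketch(kill=lambda pid, sig: sent.append((pid, sig)))

manager.spawn()            # first agent start: keepalived is launched
manager.spawn()            # agent restart: process exists, no SIGHUP sent
manager.spawn_or_reload()  # restart with reload: SIGHUP is sent
```

With spawn alone, the second call is a no-op, which matches the restart
case in this report. Whether the eventual fix takes the shape of
spawn_or_reload is not stated here; it is shown only to make the
missing signal concrete.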

It is also an issue during upgrades, for the same reason: the flow is
identical. Fixing this bug is a precondition for a couple of other fixes
if we want their backports to actually fix their respective issues on
Juno.

** Affects: neutron
     Importance: Undecided
     Assignee: Assaf Muller (amuller)
         Status: In Progress


** Tags: juno-backport-potential l3-ha

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1412542

Title:
  L3 agent restart does not SIGHUP running keepalived processes

Status in OpenStack Neutron (virtual network service):
  In Progress
