yahoo-eng-team team mailing list archive

[Bug 1453855] [NEW] HA routers may fail to send out GARPs when node boots

 

Public bug reported:

When a node boots, it starts the OVS and L3 agents. As an example, in
RDO systemd unit files, these services have no ordering dependency on
each other. This means that the L3 agent can start before the OVS agent
and begin configuring routers before the OVS agent has finished syncing
with the server and started processing ovsdb monitor updates. The result
is that when the L3 agent finishes configuring an HA router, it starts
keepalived, which under certain conditions will transition to master and
send out gratuitous ARPs before the OVS agent finishes plugging its
ports. Those gratuitous ARPs are lost, yet the router keeps acting as
master, so the rest of the network is never told where its addresses
now live, which can cause black holes.
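
The race above can be illustrated with a toy timeline model. This is only
a sketch: the function names and all timings are made up for illustration
and are not taken from keepalived or Neutron.

```python
def garp_send_times(master_at, refresh=0, horizon=60):
    """Return the times (seconds since boot) at which GARPs are sent.

    refresh=0 models a single GARP at the MASTER transition; a positive
    refresh models periodic re-sending while in MASTER state.
    """
    if refresh <= 0:
        return [master_at]
    return list(range(master_at, horizon + 1, refresh))


def first_delivered_garp(send_times, port_plugged_at):
    """Return the first send time at which the port is already plugged.

    A GARP sent before the OVS agent has plugged the port is lost, so it
    is skipped. Returns None if every GARP was sent too early.
    """
    for sent_at in send_times:
        if sent_at >= port_plugged_at:
            return sent_at
    return None


# Hypothetical boot: keepalived becomes master at t=5s, but the OVS agent
# only plugs the router port at t=12s.
single = first_delivered_garp(garp_send_times(5), port_plugged_at=12)
periodic = first_delivered_garp(garp_send_times(5, refresh=10), port_plugged_at=12)
```

With a single GARP nothing is ever delivered; with periodic re-sending
the first GARP sent after the port is plugged gets through.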

Possible solutions:
* Introduce systemd dependencies. This has its own set of intricacies, though, and it's hard to solve the above problem comprehensively with this approach alone.
* Regardless, it's a good idea to use the new keepalived flags:
    garp_master_repeat <INTEGER>    # how often the gratuitous ARP after MASTER
                                    #  state transition should be repeated?
    garp_master_refresh <INTEGER>   # Periodic delay in seconds sending
                                    #  gratuitous ARP while in MASTER state
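
As a rough sketch of the second suggestion, the snippet below renders the
two options as lines for a keepalived vrrp_instance block. The function
name and the values 5 and 10 are placeholders chosen for illustration;
this is not Neutron's implementation, and the values are not recommended
settings.

```python
def render_garp_options(repeat, refresh, indent="    "):
    """Render the two GARP options as keepalived config lines.

    repeat  -> garp_master_repeat: how often the GARP is repeated after
               the MASTER state transition
    refresh -> garp_master_refresh: periodic delay in seconds between
               GARPs while in MASTER state
    """
    for name, value in (("repeat", repeat), ("refresh", refresh)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer")
    lines = [
        f"garp_master_repeat {repeat}",
        f"garp_master_refresh {refresh}",
    ]
    return "\n".join(indent + line for line in lines)


# Placeholder values, for illustration only.
garp_config = render_garp_options(repeat=5, refresh=10)
print(garp_config)
```

With periodic GARPs, an announcement lost during boot is eventually
followed by one sent after the OVS agent has plugged the ports.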

** Affects: neutron
     Importance: Medium
         Status: New


** Tags: l3-ha

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1453855

Title:
  HA routers may fail to send out GARPs when node boots

Status in OpenStack Neutron (virtual network service):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1453855/+subscriptions

