yahoo-eng-team team mailing list archive

[Bug 1952907] Re: Gratuitous ARPs are not sent during master transition

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/839671
Committed: https://opendev.org/openstack/neutron/commit/5288593fafe6636fc14b8873465866d20de26935
Submitter: "Zuul (22348)"
Branch:    master

commit 5288593fafe6636fc14b8873465866d20de26935
Author: Damian Dabrowski <damian@dabrowski.cloud>
Date:   Thu Apr 28 02:54:25 2022 +0200

    [L3-HA] Disable automatic link-local address assignment for HA routers
    
    To fix both [1] and [2], we set
    `net.ipv6.conf.all.addr_gen_mode=1` in the HA router namespace to
    prevent link-local addresses (LLAs) from being auto-assigned to the
    interfaces. We don't need LLA auto-assignment as keepalived manages them.
    With this change, link-local addresses exist only on the active
    router, which prevents 'dadfailed' and ensures MLD packets are not
    sent from the standby router.
    
    Previously we also reverted [3] to always keep the qg-* interface up on
    both the active and standby router instances, regardless of whether
    keepalived is started.
    Without a link-local address assigned, the backup router instance won't
    send any packets, so I see no reason to keep the qg-* interface down.
    
    [1] https://bugs.launchpad.net/neutron/+bug/1952907
    [2] https://bugs.launchpad.net/neutron/+bug/1859832
    [3] https://review.opendev.org/c/openstack/neutron/+/834162
    
    Closes-Bug: #1952907
    Related-Bug: #1859832
    Depends-On: https://review.opendev.org/c/openstack/neutron/+/834162
    Change-Id: I306f14aa6b7e8bb69a81f441be337bc1a584d3b2
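
A minimal sketch of how the sysctl named in the commit message could be applied inside a router namespace. The helper name `build_addr_gen_mode_cmd` and the namespace name are hypothetical; this is not the code from the change itself, only an illustration of the resulting command line.

```python
def build_addr_gen_mode_cmd(namespace, mode=1):
    """Build the argv that sets addr_gen_mode for all interfaces in a netns.

    Per the kernel's ip-sysctl documentation, mode 1 means "do not
    generate a link-local address", which is the value the commit uses.
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError("addr_gen_mode must be 0, 1, 2 or 3")
    return [
        "ip", "netns", "exec", namespace,
        "sysctl", "-w", f"net.ipv6.conf.all.addr_gen_mode={mode}",
    ]


# Hypothetical namespace name, used only for illustration.
cmd = build_addr_gen_mode_cmd("qrouter-example")
print(" ".join(cmd))
```

Running the printed command requires root and an existing namespace; the sketch only builds it.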


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1952907

Title:
  Gratuitous ARPs are not sent during master transition

Status in neutron:
  Fix Released

Bug description:
  * High level description:

  When a router transitions to MASTER state, keepalived should send GARPs, but it fails because the qg-* interface is down (it comes up about 1 second later, so this looks like a race condition).
  Keepalived should also send a second round of GARPs after 60 seconds (garp_master_delay), but it doesn't (probably because the first ones fail, but I'm not 100% sure).

  When I add a random port to this router to trigger a keepalived reload,
  all GARPs are sent properly (because the netns is already configured
  and the qg-* interface stays up the whole time).

  * Pre-conditions:

  Operating System: Ubuntu 20.04
  Keepalived version: 2.0.19
  Affected neutron releases:
    - my AIO env: Xena (master/106fa3e6d3f0b1c32ef28fe9dd6b125b9317e9cf # HEAD as of 29.09.2021)
    - my prod env: Victoria
    - (most likely all versions after this change https://review.opendev.org/c/openstack/neutron/+/707406)

  * Step-by-step reproduction:

  Simply perform a failover on an HA router.
  The same result can also be achieved by removing all l3 agents from the router and then adding one back:

  # openstack router create neutron-bug --ha
  # openstack router set --external-gateway public neutron-bug
  # neutron l3-agent-list-hosting-router neutron-bug
  # (for all l3 agents): neutron l3-agent-router-remove L3_AGENT_ID neutron-bug
  # (for a single l3 agent): neutron l3-agent-router-add L3_AGENT_ID neutron-bug
  (GARPs are not sent)
  # openstack router add port neutron-bug test-port
  (GARPs are sent properly)

  * Expected output:

  Gratuitous ARPs should be sent from router's namespace during MASTER
  transition.

  * Actual output:

  Gratuitous ARPs are not sent.
  Keepalived complains about: Error 100 (Network is down) sending gratuitous ARP on qg-4a2f0239-5c for 172.29.249.194
  The qg-* interface comes up about 1 second after keepalived tries to send GARPs.
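
  To spot this failure in keepalived logs, a small parser along the following lines may help. It is only a sketch: the regular expression is written against the single error line quoted above, and the name `parse_garp_error` is hypothetical.

```python
import re

# Pattern modelled on the one keepalived message quoted in this report.
GARP_ERROR_RE = re.compile(
    r"Error (?P<errno>\d+) \((?P<reason>[^)]+)\) sending gratuitous ARP "
    r"on (?P<iface>\S+) for (?P<ip>\S+)"
)


def parse_garp_error(line):
    """Return the fields of a failed-GARP log line, or None if no match."""
    match = GARP_ERROR_RE.search(line)
    if match is None:
        return None
    return {
        "errno": int(match["errno"]),
        "reason": match["reason"],
        "iface": match["iface"],
        "ip": match["ip"],
    }


line = ("Error 100 (Network is down) sending gratuitous ARP "
        "on qg-4a2f0239-5c for 172.29.249.194")
print(parse_garp_error(line))
```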

  * Root cause

  Currently neutron keeps the qg- interface down on BACKUP agents: https://review.opendev.org/c/openstack/neutron/+/707406
  Keepalived's MASTER transition takes place before keepalived-state-change notifies neutron-l3-agent about the state change.
  As a result, neutron-l3-agent brings the qg- interface up only after keepalived's MASTER transition, so keepalived can't send GARPs during the transition because the qg- interface is still down.
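
  The ordering problem can be modelled with a toy timeline. This is a sketch with illustrative timings only (the roughly 1 second gap comes from the observation above); `garp_outcomes` is a hypothetical helper and not part of neutron or keepalived.

```python
def garp_outcomes(garp_times, link_up_time):
    """Pair each GARP attempt time with whether the link was already up.

    A GARP attempted before `link_up_time` fails, which models the
    "Network is down" error seen during the MASTER transition.
    """
    return [(t, t >= link_up_time) for t in garp_times]


# MASTER transition at t=0.0, interface brought up about 1 second later.
print(garp_outcomes([0.0], link_up_time=1.0))

# Interface up before the transition: the same GARP attempt succeeds.
print(garp_outcomes([0.0], link_up_time=-5.0))
```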

  
  * Proposed solutions

  1. Revert https://review.opendev.org/c/openstack/neutron/+/707406 and always keep qg- interfaces up
  I'm not sure, but we may no longer need the above change because the underlying issue was fixed in keepalived: https://github.com/acassen/keepalived/commit/b10bbfc2a2b216487cea5a586c55765275e41253

  2. Send delayed GARPs by keepalived_state_change.py
  Change proposal: https://review.opendev.org/c/openstack/neutron/+/821433

  3. Send GARPs for FIPs as well (as is done for non-HA routers in ./agent/l3/legacy_router.py)
  Change proposal: https://review.opendev.org/c/openstack/neutron/+/821434
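
  As a rough illustration of solution 2, delayed GARPs could be sent with iputils `arping` once the interface has had time to come up. This is a sketch under assumptions, not the code from the linked change proposals: the function names, the default delay and the count are hypothetical, and `-U`, `-I` and `-c` are the iputils arping options for unsolicited ARP mode, interface and packet count.

```python
import subprocess
import time


def build_garp_cmd(namespace, iface, ip, count=3):
    """Build the argv for sending `count` unsolicited ARPs from a netns."""
    return [
        "ip", "netns", "exec", namespace,
        "arping", "-U", "-I", iface, "-c", str(count), ip,
    ]


def send_delayed_garps(namespace, iface, ips, delay=1.0,
                       run=subprocess.run, sleep=time.sleep):
    """Wait `delay` seconds, then send GARPs for every IP in `ips`.

    `run` and `sleep` are injectable so the logic can be exercised
    without root privileges or a real namespace.
    Returns the list of commands that were executed.
    """
    sleep(delay)
    cmds = [build_garp_cmd(namespace, iface, ip) for ip in ips]
    for cmd in cmds:
        run(cmd, check=False)
    return cmds
```

  Calling `send_delayed_garps` with the defaults needs root and an existing namespace; passing stub `run` and `sleep` callables allows a dry run.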

  
  P.S. As solutions 2 and 3 only send GARPs, we may also need to fix IPv6 NDP. Besides ARPs, keepalived also fails to send unsolicited neighbor advertisements. I'm not sure about this though, as I don't know much about IPv6.

  
  * Attachments:

  Keepalived logs: https://paste.openstack.org/raw/811372/
  Interfaces inside router's netns + tcpdump from master transition: https://paste.openstack.org/raw/811373/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1952907/+subscriptions


