yahoo-eng-team team mailing list archive

[Bug 1952907] [NEW] Gratuitous ARPs are not sent during master transition

 

Public bug reported:

* High level description:

When a router transitions to MASTER state, keepalived should send GARPs, but it fails because the qg-* interface is down (it comes up about 1 second later, so this might be a race condition).
Keepalived should also send another round of GARPs after 60 seconds (garp_master_delay), but it doesn't (probably because the first ones fail, but I'm not 100% sure).

When I add a random port to this router to trigger a keepalived reload,
all GARPs are sent properly (because the netns is already configured and
the qg-* interface stays up the whole time).
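
To illustrate the suspected race, here is a minimal sketch of the kind of check that would avoid it: wait until the interface is administratively up before attempting to send anything. This is not neutron or keepalived code. The function names, the timeout values, and the choice to read the interface flags from sysfs are my own assumptions, and the check would have to run inside the router's namespace to see the qg-* device.

```python
import os
import time

IFF_UP = 0x1  # administrative "up" bit, as defined in <linux/if.h>


def is_admin_up(dev, sysfs_root="/sys/class/net"):
    """Return True if the IFF_UP bit is set in <sysfs_root>/<dev>/flags."""
    path = os.path.join(sysfs_root, dev, "flags")
    try:
        with open(path) as f:
            # sysfs exposes the flags as a hex string such as "0x1003"
            return bool(int(f.read().strip(), 16) & IFF_UP)
    except (OSError, ValueError):
        # Missing device or unreadable content: treat as "not up"
        return False


def wait_until_up(dev, timeout=5.0, interval=0.1, sysfs_root="/sys/class/net"):
    """Poll until the device is up; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if is_admin_up(dev, sysfs_root):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With something like wait_until_up("qg-4a2f0239-5c") in front of the GARP step, a send attempted about 1 second too early would be delayed rather than fail with "Network is down". Whether the real fix belongs in neutron or in keepalived is a separate question.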


* Pre-conditions:

Operating System: Ubuntu 20.04
Keepalived version: 2.0.19
Affected neutron releases:
  - my AIO env: Xena (master/106fa3e6d3f0b1c32ef28fe9dd6b125b9317e9cf # HEAD as of 29.09.2021)
  - my prod env: Victoria
  - (most likely all versions after this change https://review.opendev.org/c/openstack/neutron/+/707406)


* Step-by-step reproduction:

Simply perform a failover on an HA router.
The same result can also be achieved by removing all L3 agents from the router and then adding one back:

# openstack router create neutron-bug
# openstack router set --external-gateway public neutron-bug
# neutron l3-agent-list-hosting-router neutron-bug
# (for all l3 agents): neutron l3-agent-router-remove L3_AGENT_ID neutron-bug
# (for a single l3 agent): neutron l3-agent-router-add L3_AGENT_ID neutron-bug
(GARPs are not sent)
# openstack router add port neutron-bug test-port
(GARPs are sent properly)

* Expected output:

Gratuitous ARPs should be sent from the router's namespace during the
MASTER transition.


* Actual output:

Gratuitous ARPs are not sent.
Keepalived complains: Error 100 (Network is down) sending gratuitous ARP on qg-4a2f0239-5c for 172.29.249.194
The qg-* interface comes up about 1 second after keepalived tries to send the GARPs.
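
For anyone who wants to check their own logs for the same symptom, here is a small sketch that picks these failures out of keepalived output. The regular expression is derived only from the single message quoted above, so treat the pattern as an assumption; other keepalived versions may word the error differently. The function name is hypothetical.

```python
import re

# Pattern modeled on the one error message quoted in this report.
GARP_ERROR = re.compile(
    r"Error (?P<errno>\d+) \((?P<reason>[^)]*)\) "
    r"sending gratuitous ARP on (?P<dev>\S+) for (?P<ip>\S+)"
)


def failed_garps(lines):
    """Return one dict per log line that reports a failed gratuitous ARP."""
    failures = []
    for line in lines:
        match = GARP_ERROR.search(line)
        if match:
            entry = match.groupdict()
            entry["errno"] = int(entry["errno"])
            failures.append(entry)
    return failures
```

Feeding it the lines of the keepalived log and filtering on errno == 100 lists the interface and address of every GARP that was attempted while the device was still down.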


* Attachments:

Keepalived logs: https://paste.openstack.org/raw/811372/
Interfaces inside router's netns + tcpdump from master transition: https://paste.openstack.org/raw/811373/

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1952907

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1952907/+subscriptions


