yahoo-eng-team team mailing list archive

[Bug 2025129] [NEW] DvrLocalRouter init references namespace before it is created

 

Public bug reported:

Description
-----------

When a DvrLocalRouter object is instantiated, it calls the
_load_used_fip_information() function. In some cases this function
tries to add ip rules in a specific network namespace that may not
exist yet, which results in
neutron.privileged.agent.linux.ip_lib.NetworkNamespaceNotFound being
raised.
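
The ordering problem described above can be illustrated with a small
toy model. This is only a sketch and not Neutron code: ToyHost,
NamespaceNotFound, restore_rules_unguarded() and restore_rules_guarded()
are hypothetical names made up for the illustration, and creating the
namespace first is just one possible mitigation, not necessarily the
fix chosen upstream.

```python
class NamespaceNotFound(Exception):
    """Toy stand-in for NetworkNamespaceNotFound; not the Neutron class."""


class ToyHost:
    """Hypothetical model of a host: namespace name -> list of ip rules."""

    def __init__(self):
        self.namespaces = {}

    def create_namespace(self, name):
        # Idempotent: an existing namespace keeps its rules.
        self.namespaces.setdefault(name, [])

    def add_ip_rule(self, namespace, ip, priority):
        # Adding a rule to a missing namespace fails, as in the report.
        if namespace not in self.namespaces:
            raise NamespaceNotFound(namespace)
        self.namespaces[namespace].append((ip, priority))


def restore_rules_unguarded(host, namespace, rules):
    """Mirrors the reported failure: assumes the namespace already exists."""
    for ip, priority in rules:
        host.add_ip_rule(namespace, ip, priority)


def restore_rules_guarded(host, namespace, rules):
    """One possible mitigation: make sure the namespace exists first."""
    host.create_namespace(namespace)
    for ip, priority in rules:
        host.add_ip_rule(namespace, ip, priority)


if __name__ == "__main__":
    host = ToyHost()
    rules = [("10.0.0.5", 100)]  # made-up address and priority
    try:
        restore_rules_unguarded(host, "qrouter-example", rules)
    except NamespaceNotFound as exc:
        print("unguarded restore failed for namespace:", exc)
    restore_rules_guarded(host, "qrouter-example", rules)
    print(host.namespaces["qrouter-example"])
```

In the toy model the unguarded variant fails as soon as the first rule
is added, while the guarded variant succeeds because the namespace is
created before any rule is touched.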


Pre-conditions
--------------

- DVR is in use and the created router is distributed and HA
- The state file 'fip-priorities' is missing some entries, which results in https://opendev.org/openstack/neutron/src/commit/0c5d4b872899497437d1399c845be756103a46d3/neutron/agent/l3/dvr_local_router.py#L76 being skipped
- The qrouter network namespace does not exist (possibly due to a reboot of the host or something similar)


Step-by-step reproduction steps
-------------------------------

- Set up OpenStack with DVR enabled
- Create an HA router with an external subnet attached so we can use the IPs as FIPs
- Create a VM with a FIP attached from the aforementioned router
- SSH to the host running the aforementioned VM and:
  - Delete the qrouter namespace associated with this router
  - Remove the entry for the FIP from the fip-priorities state file in the Neutron state directory
  - Restart the Neutron L3 agent
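
For the state file step above, a small helper along these lines could
be used instead of editing the file by hand. It is a sketch under one
assumption: that the file holds one "key,value" entry per line with
the FIP address as the key. Check the file on your host before relying
on it. remove_fip_entry() is a hypothetical helper name, and the
addresses in the demo are documentation addresses, not real data.

```python
import os
import tempfile
from pathlib import Path


def remove_fip_entry(state_file, fip):
    """Drop every line whose key equals `fip`; return how many were removed.

    Assumes one "key,value" entry per line (an assumption about the
    file format, not something stated in the report).
    """
    path = Path(state_file)
    lines = path.read_text().splitlines(keepends=True)
    kept = [line for line in lines if line.strip().split(",")[0] != fip]
    path.write_text("".join(kept))
    return len(lines) - len(kept)


if __name__ == "__main__":
    # Demo on a throwaway copy, never on the real state directory.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "fip-priorities")
        Path(demo).write_text("203.0.113.10,100\n203.0.113.11,101\n")
        print(remove_fip_entry(demo, "203.0.113.10"))
        print(Path(demo).read_text(), end="")
```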


Expected output
---------------

Neutron L3 agent should restart without any errors.


Actual output
-------------

Neutron L3 agent throws a NetworkNamespaceNotFound exception for each
FIP missing from the fip-priorities state file, fails to set up the
router and then retries. Note that if more than 5 FIP entries are
missing from the fip-priorities file, the router setup fails completely
because it hits the retry limit specified in
https://opendev.org/openstack/neutron/src/commit/0c5d4b872899497437d1399c845be756103a46d3/neutron/agent/l3/agent.py#L730-L733.
This leaves the router broken and not set up on the node, which breaks
networking for all VMs using that router on that particular host.
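
The interaction between missing entries and the retry limit can be
sketched with a toy simulation. It rests on two assumptions that are
not spelled out in the report: each failed attempt still records a
priority for the one FIP it tripped over, and the agent makes one
initial attempt plus up to 5 retries. attempt_setup() and
setup_with_retries() are hypothetical names, not Neutron functions.

```python
RETRY_LIMIT = 5  # value from the report; how it is counted is an assumption


class NamespaceNotFound(Exception):
    """Toy stand-in for NetworkNamespaceNotFound; not the Neutron class."""


def attempt_setup(fips, saved):
    """One setup attempt while the qrouter namespace is still missing.

    The first FIP without a saved priority gets one recorded, then the
    rule add fails because the namespace does not exist yet.
    """
    for fip in fips:
        if fip not in saved:
            saved[fip] = 100 + len(saved)  # made-up priority value
            raise NamespaceNotFound(fip)
    # Every FIP already had a saved priority, so nothing touches the
    # namespace at this point and the attempt gets through.


def setup_with_retries(fips, saved, retries=RETRY_LIMIT):
    """Return the number of attempts used, or None if the limit is hit."""
    for attempt in range(1, retries + 2):  # initial attempt + retries
        try:
            attempt_setup(fips, saved)
            return attempt
        except NamespaceNotFound:
            continue
    return None


if __name__ == "__main__":
    for missing in (0, 5, 6):
        fips = ["fip-%d" % i for i in range(missing)]
        print(missing, "missing ->", setup_with_retries(fips, {}))
```

Under these assumptions each retry repairs exactly one entry, so up to
5 missing entries are absorbed by the retries and a sixth exhausts
them, which is consistent with the behaviour described above.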


Version
-------
- OpenStack version - master/zed
- Linux distro - AlmaLinux9
- Deployed via Kolla Ansible

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2025129

Title:
  DvrLocalRouter init references namespace before it is created

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2025129/+subscriptions


