yahoo-eng-team team mailing list archive

[Bug 1530070] [NEW] Neutron Netns Cleanup script fails to delete namespaces after reboot

 

Public bug reported:

After rebooting a node that hosted an active VRRP router, a DHCP agent, and a metadata agent, the neutron-netns-cleanup utility failed to delete stale namespaces.
The utility fails with:

seting the network namespace "qrouter-3d4e5634-59f0-401e-9f28-6c8daaec311c" failed: Invalid argument

The cause is a bug in iproute, which fails to perform any operation on stale namespaces. They appear in /var/run/netns like this:
root@stratonode66 ~# ls -l /var/run/netns/ 
total 0
-r--r--r-- 1 root root 0 Dec 24 13:38 qdhcp-0a348422-97e2-4ab6-bb22-55994a125823
-r--r--r-- 1 root root 0 Dec 24 11:54 qdhcp-2258aa3f-d256-4c9f-9e48-16811fc57981
-r--r--r-- 1 root root 0 Dec 24 13:38 qdhcp-3ceb1f27-e3fc-413a-a184-567041f073e2
-r--r--r-- 1 root root 0 Dec 24 11:54 qdhcp-62a51b66-d0e2-42fc-bdf2-2d622a889e75
-r--r--r-- 1 root root 0 Dec 24 11:54 qdhcp-81b550a2-c483-4280-a83a-b560ecdc416b
---------- 1 root root 0 Dec 23 13:54 qrouter-3d4e5634-59f0-401e-9f28-6c8daaec311c
---------- 1 root root 0 Dec 24 11:25 qrouter-69d20923-da78-4c6b-bb24-967dd67acb1d
---------- 1 root root 0 Dec 23 13:54 qrouter-cc649801-96ec-4d59-90de-1004fc026024
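
For illustration, stale entries can be told apart from live ones by checking whether anything is mounted on them: "ip netns add" bind-mounts the namespace onto its file under /var/run/netns, and a file left over from before the reboot has no such mount. The sketch below is a minimal Python helper written for this report, not part of Neutron. The function name find_stale_namespaces is hypothetical, and it assumes the mount table uses the /proc/mounts layout.

```python
import os


def find_stale_namespaces(netns_entries, mounts_text):
    """Return the netns entries that have nothing mounted on them.

    netns_entries: file names found in the netns directory.
    mounts_text: contents of a mount table in /proc/mounts format.
    """
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        mount_point = fields[1]
        # /var/run is often a symlink to /run, so match on the parent
        # directory name rather than on the full path.
        parent, name = os.path.split(mount_point)
        if os.path.basename(parent) == "netns":
            mounted.add(name)
    return [name for name in netns_entries if name not in mounted]
```

On a node it could be called with os.listdir("/var/run/netns") and the contents of /proc/mounts; entries it returns are candidates for direct deletion.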

This bug is related, but its fix does not solve the issue after reboot:
https://bugs.launchpad.net/neutron/+bug/1052535

I solved it by fixing the neutron-netns-cleanup --force code, with this
patch:

diff --git a/neutron/agent/netns_cleanup_util.py b/neutron/agent/netns_cleanup_util.py
index 771a77f..3c43480 100644
--- a/neutron/agent/netns_cleanup_util.py
+++ b/neutron/agent/netns_cleanup_util.py
@@ -132,8 +132,13 @@ def destroy_namespace(conf, namespace, force=False):
             # NOTE: The dhcp driver will remove the namespace if is it empty,
             # so a second check is required here.
             if ip.netns.exists(namespace):
-                for device in ip.get_devices(exclude_loopback=True):
-                    unplug_device(conf, device)
+                try:
+                    for device in ip.get_devices(exclude_loopback=True):
+                        unplug_device(conf, device)
+                except RuntimeError:
+                    LOG.info(_('Keep calm, and destroy namespace: %s'), namespace)
+                    ip.netns.delete(namespace)
+        return
 
         ip.garbage_collect_namespace()
     except Exception:

When I run the patched neutron-netns-cleanup --force after reboot, the
namespaces are cleaned up, and they are recreated when
neutron-openvswitch-agent.service, neutron-dhcp-agent.service, and
neutron-l3-agent.service are started.
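
The fallback in the patch can be sketched outside Neutron as follows. This is a minimal illustration, not the submitted fix: the function accepts any object exposing the methods the diff calls (netns.exists, netns.delete, get_devices, garbage_collect_namespace), the unplug callback and log parameter are hypothetical stand-ins, and placing the early return inside the except handler is an assumption about the intent.

```python
def destroy_namespace(ip, namespace, unplug, log=print):
    """Destroy a namespace, deleting it directly if it is stale.

    ip: object with the methods used in the diff (assumed interface).
    unplug: hypothetical callback that removes one device.
    Returns True if the stale-namespace fallback was taken.
    """
    if ip.netns.exists(namespace):
        try:
            for device in ip.get_devices(exclude_loopback=True):
                unplug(device)
        except RuntimeError:
            # Listing devices failed, so treat the namespace as stale
            # and remove its entry without touching any devices.
            log("Destroying stale namespace: %s" % namespace)
            ip.netns.delete(namespace)
            return True
    ip.garbage_collect_namespace()
    return False
```

Returning early only from the except branch keeps the normal garbage-collection step for namespaces whose devices can still be listed.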

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1530070
