
yahoo-eng-team team mailing list archive

[Bug 1654287] Re: functional test netns_cleanup failing in gate

 

Reviewed:  https://review.openstack.org/421325
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3f9f740d81b51be5e563069c720366fa90ade9ee
Submitter: Jenkins
Branch:    master

commit 3f9f740d81b51be5e563069c720366fa90ade9ee
Author: Daniel Alvarez <dalvarez@xxxxxxxxxx>
Date:   Thu Jan 12 01:06:01 2017 +0000

    Fix netns_cleanup interrupted on rwd I/O
    
    Functional tests for netns_cleanup have been failing a few times
    in the gate lately. After thorough testing, we found that the issue was
    related to using rootwrap-daemon inside a wait_until_true loop. When the
    timeout fired while utils.execute() was reading from rootwrap-daemon,
    the read was interrupted and the output of the last command was never
    consumed. As a result, subsequent calls to utils.execute() would read
    the output of the previous command rather than their own, leading to
    unexpected results.
    
    This fix polls the processes still running in the namespace without
    using the wait_until_true loop. Instead, it checks the elapsed time
    and raises an exception if the timeout is exceeded.
    
    Also, I'm removing the debug traces introduced in
    327f7fc4d54bbaaed3778b5eb3c51a037a9a178f, which helped find the root
    cause of this bug.
    
    Change-Id: Ie233261e4be36eecaf6ec6d0532f0f5e2e996cd2
    Closes-Bug: #1654287
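The polling approach described in the commit message can be sketched as follows. This is a minimal, hypothetical sketch, not the actual neutron change. The names wait_until_no_processes, list_pids and NamespaceProcessesTimeout are illustrative. list_pids stands in for whatever call lists the PIDs still running in the namespace. The key property is that the elapsed time is checked only between complete calls, so a read from the daemon is never cut short.

```python
import time


class NamespaceProcessesTimeout(Exception):
    """Hypothetical exception raised when processes outlive the timeout."""


def wait_until_no_processes(list_pids, timeout=5.0, poll_interval=0.5,
                            clock=time.monotonic, sleep=time.sleep):
    """Poll list_pids() until it returns no PIDs or `timeout` seconds pass.

    list_pids is a hypothetical callable that runs a command (e.g. through
    a rootwrap daemon) and returns the PIDs still alive in the namespace.
    The deadline is compared only after list_pids() has returned, so a
    call that is in the middle of reading its reply is never interrupted.
    """
    start = clock()
    while True:
        pids = list_pids()  # always runs to completion
        if not pids:
            return
        if clock() - start > timeout:
            raise NamespaceProcessesTimeout(
                "processes still running after %.1fs: %s" % (timeout, pids))
        sleep(poll_interval)
```

Because the deadline is only checked after each call finishes, a slow command can overrun the timeout by up to one call's duration. That is the trade-off for never abandoning a reply half-read, which is what an asynchronously firing timeout (as the commit message describes for wait_until_true) can do.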


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1654287

Title:
  functional test netns_cleanup failing in gate

Status in neutron:
  Fix Released
Status in oslo.rootwrap:
  New

Bug description:
  
  The functional test for netns_cleanup has failed in the gate today [0].

  Apparently, when trying to get the list of devices through
  rootwrap_daemon (ip_lib.get_devices(), which runs "find /sys/class/net
  -maxdepth 1 -type l -printf '%f '"), it gets the output of the previous
  command ('netstat -nlp') instead. As a result, the netns_cleanup module
  tries to unplug arbitrary "devices" whose names are actually words from
  the output of the 'netstat' command.

  This bug doesn't look related to the test itself but rather to
  rootwrap_daemon. Maybe it's due to the long output of the netstat command?
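  The commit message above traces the wrong output to a read that was interrupted by the timeout. The following simplified model shows why that shifts every later result. It is not oslo.rootwrap's actual implementation. FakeDaemonChannel and execute are hypothetical names. When replies are matched to requests only by their order, a single unread reply makes each subsequent call receive the output of the call before it.

```python
from collections import deque


class FakeDaemonChannel:
    """Hypothetical stand-in for a request/response pipe to a daemon.

    Replies are queued in the order requests were sent and read back in
    that same order; nothing ties a reply to the request that produced it.
    """

    def __init__(self, handler):
        self._handler = handler
        self._replies = deque()

    def send(self, cmd):
        # The "daemon" runs the command and queues its output.
        self._replies.append(self._handler(cmd))

    def recv(self):
        return self._replies.popleft()


def execute(channel, cmd, interrupted=False):
    """Send cmd and read one reply. `interrupted` simulates a timeout
    firing after the request was sent but before the reply was read."""
    channel.send(cmd)
    if interrupted:
        raise TimeoutError("timed out while reading reply")
    return channel.recv()


# Illustrative outputs, not taken from the failing run.
OUTPUTS = {
    "netstat -nlp": "Active Internet connections (only servers)",
    "find /sys/class/net": "lo tap0",
}
chan = FakeDaemonChannel(lambda cmd: OUTPUTS[cmd])

try:
    execute(chan, "netstat -nlp", interrupted=True)
except TimeoutError:
    pass

# The next call receives netstat's output instead of its own.
print(execute(chan, "find /sys/class/net"))
# prints: Active Internet connections (only servers)
```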

  
  Relevant part of the log:

  2017-01-05 12:17:04.609 27615 DEBUG neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-cf2030c6-c924-45bb-b13b-6774d275b394', 'netstat', '-nlp'] execute_rootwrap_daemon neutron/agent/linux/utils.py:108
  2017-01-05 12:17:04.613 27615 DEBUG neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Exit code: 0 execute neutron/agent/linux/utils.py:149
  2017-01-05 12:17:04.614 27615 DEBUG neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-cf2030c6-c924-45bb-b13b-6774d275b394', 'find', '/sys/class/net', '-maxdepth', '1', '-type', 'l', '-printf', '%f '] execute_rootwrap_daemon neutron/agent/linux/utils.py:108
  2017-01-05 12:17:04.645 27615 DEBUG neutron.agent.ovsdb.native.vlog [-] [POLLIN] on fd 14 __log_wakeup /opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/ovs/poller.py:202
  2017-01-05 12:17:04.686 27615 DEBUG neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Exit code: 0 execute neutron/agent/linux/utils.py:149
  2017-01-05 12:17:04.688 27615 DEBUG neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-cf2030c6-c924-45bb-b13b-6774d275b394', 'ip', 'link', 'delete', 'Active'] execute_rootwrap_daemon neutron/agent/linux/utils.py:108
  2017-01-05 12:17:04.746 27615 DEBUG neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Exit code: 0 execute neutron/agent/linux/utils.py:149
  2017-01-05 12:17:04.747 27615 DEBUG neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-cf2030c6-c924-45bb-b13b-6774d275b394', 'ip', 'link', 'delete', 'Internet'] execute_rootwrap_daemon neutron/agent/linux/utils.py:108
  2017-01-05 12:17:04.758 27615 DEBUG neutron.agent.ovsdb.native.vlog [-] [POLLIN] on fd 14 __log_wakeup /opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/ovs/poller.py:202
  2017-01-05 12:17:04.815 27615 DEBUG neutron.agent.ovsdb.native.vlog [-] [POLLIN] on fd 14 __log_wakeup /opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/ovs/poller.py:202
  2017-01-05 12:17:04.822 27615 DEBUG neutron.agent.ovsdb.native.vlog [-] [POLLIN] on fd 7 __log_wakeup /opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/ovs/poller.py:202
  2017-01-05 12:17:04.822 27615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): InterfaceToBridgeCommand(name=Internet) do_commit neutron/agent/ovsdb/impl_idl.py:100
  2017-01-05 12:17:04.823 27615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction aborted do_commit neutron/agent/ovsdb/impl_idl.py:124
  2017-01-05 12:17:04.824 27615 DEBUG neutron.cmd.netns_cleanup [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Unable to find bridge for device: Internet unplug_device neutron/cmd/netns_cleanup.py:138
  2017-01-05 12:17:04.824 27615 DEBUG neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-cf2030c6-c924-45bb-b13b-6774d275b394', 'ip', 'link', 'delete', 'connections'] execute_rootwrap_daemon neutron/agent/linux/utils.py:108
  ....
  2017-01-05 12:17:06.388 27615 DEBUG neutron.cmd.netns_cleanup [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Unable to find bridge for device: Path unplug_device neutron/cmd/netns_cleanup.py:138
  2017-01-05 12:17:06.389 27615 DEBUG neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Running command (rootwrap daemon): ['ip', '-o', 'netns', 'list'] execute_rootwrap_daemon neutron/agent/linux/utils.py:108
  2017-01-05 12:17:06.454 27615 ERROR neutron.agent.linux.utils [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot find device "Path"

  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup [req-68eceb29-052a-4c8c-8152-38bbe636cba5 - - - - -] Error unable to destroy namespace: qrouter-cf2030c6-c924-45bb-b13b-6774d275b394
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup Traceback (most recent call last):
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup   File "neutron/cmd/netns_cleanup.py", line 250, in destroy_namespace
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup     ip.garbage_collect_namespace()
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup   File "neutron/agent/linux/ip_lib.py", line 222, in garbage_collect_namespace
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup     if self.namespace and self.netns.exists(self.namespace):
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup   File "neutron/agent/linux/ip_lib.py", line 888, in exists
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup     run_as_root=cfg.CONF.AGENT.use_helper_for_ns_read)
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup   File "neutron/agent/linux/ip_lib.py", line 107, in _execute
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup     log_fail_as_error=log_fail_as_error)
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup   File "neutron/agent/linux/utils.py", line 147, in execute
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup     raise ProcessExecutionError(msg, returncode=returncode)
  2017-01-05 12:17:06.454 27615 ERROR neutron.cmd.netns_cleanup ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot find device "Path"
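  The device names in the log above ('Active', 'Internet', 'connections', ...) are consistent with netstat's header being split on whitespace as if it were the output of the find command. A minimal sketch, assuming the caller simply splits the reply on whitespace. parse_device_names is a hypothetical helper, and the sample strings are illustrative rather than taken from the failing run.

```python
def parse_device_names(find_output):
    """Split `find ... -printf '%f '` output into device names.

    Hypothetical helper: assumes names are separated by whitespace,
    which is what the trailing space in the -printf format produces.
    """
    return find_output.split()


# Expected case: the reply really is the find output.
print(parse_device_names("lo qr-1234 "))
# prints: ['lo', 'qr-1234']

# Stale case: the reply is netstat's header, so every word looks like a device.
stale = "Active Internet connections (only servers)"
print(parse_device_names(stale))
# prints: ['Active', 'Internet', 'connections', '(only', 'servers)']
```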


  [0] http://logs.openstack.org/51/396651/15/check/gate-neutron-dsvm-functional-ubuntu-xenial/5d268f0/logs/dsvm-functional-logs/neutron.tests.functional.cmd.test_netns_cleanup.NetnsCleanupTest.test_cleanup_network_namespaces_cleans_dhcp_and_l3_namespaces.txt.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1654287/+subscriptions

