← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1891190] [NEW] test_reload() functional test causes hang and jobs TIMED_OUT

 

Public bug reported:

The glance.tests.functional.test_reload.TestReload.test_reload() test
has been causing spurious deadlocks in functional test jobs, resulting
in TIMED_OUT job statuses due to the global timeout expiring. This can
be reproduced locally with lots of exposure, but Zuul runs things enough
to hit it fairly often.

I have tracked this down to the test_reload() test, which if I reproduce
this locally, I find it is in an infinite waitpid() on the API master
process that the FunctionalTest base class has started for it. The test
tracks child PIDs of the master as it initiates several SIGHUP
operations. Upon exit, the FunctionalTest.cleanup() routine runs and
ends up waitpid()ing on the master process forever. A process list shows
all the other stestr workers in Z state waiting for the final worker to
complete. The final worker being stuck on waitpid() has the master
process and both worker processes still running. Upon killing the
master, stestr frees up, reports status from the test and exits
normally.

Stack trace of the hung test process after signaling the master it is
waiting for manually is:

Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/run.py", line 93, in <module>
    main()
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/run.py", line 89, in main
    testRunner=partial(runner, stdout=sys.stdout))
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/program.py", line 185, in __init__
    self.runTests()
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/program.py", line 226, in runTests
    self.result = testRunner.run(self.test)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/run.py", line 52, in run
    test(result)
  File "/usr/lib64/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib64/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/usr/lib64/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib64/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/usr/lib64/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib64/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/unittest2/case.py", line 673, in __call__
    return self.run(*args, **kwds)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/testcase.py", line 675, in run
    return run_test.run(result)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/runtest.py", line 80, in run
    return self._run_one(actual_result)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/runtest.py", line 94, in _run_one
    return self._run_prepared_result(ExtendedToOriginalDecorator(result))
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/runtest.py", line 119, in _run_prepared_result
    raise e
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/runtest.py", line 191, in _run_user
    return fn(*args, **kwargs)
  File "/home/dan/glance/glance/tests/functional/__init__.py", line 881, in cleanup
    s.stop()
  File "/home/dan/glance/glance/tests/functional/__init__.py", line 293, in stop
    rc = test_utils.wait_for_fork(self.process_pid, raise_error=False)
  File "/home/dan/glance/glance/tests/utils.py", line 294, in wait_for_fork
    (pid, rc) = os.waitpid(pid, 0)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/green/os.py", line 96, in waitpid
    greenthread.sleep(0.01)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/greenthread.py", line 36, in sleep
    hub.switch()
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/hubs/hub.py", line 298, in switch
    return self.greenlet.switch()
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/hubs/hub.py", line 350, in run
    self.wait(sleep_time)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/hubs/poll.py", line 80, in wait
    presult = self.do_poll(seconds)
  File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/hubs/epolls.py", line 31, in do_poll
    return self.poll.poll(seconds)

** Affects: glance
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1891190

Title:
  test_reload() functional test causes hang and jobs TIMED_OUT

Status in Glance:
  New

Bug description:
  The glance.tests.functional.test_reload.TestReload.test_reload() test
  has been causing spurious deadlocks in functional test jobs, resulting
  in TIMED_OUT job statuses due to the global timeout expiring. This can
  be reproduced locally with lots of exposure, but Zuul runs things
  enough to hit it fairly often.

  I have tracked this down to the test_reload() test, which if I
  reproduce this locally, I find it is in an infinite waitpid() on the
  API master process that the FunctionalTest base class has started for
  it. The test tracks child PIDs of the master as it initiates several
  SIGHUP operations. Upon exit, the FunctionalTest.cleanup() routine
  runs and ends up waitpid()ing on the master process forever. A process
  list shows all the other stestr workers in Z state waiting for the
  final worker to complete. The final worker being stuck on waitpid()
  has the master process and both worker processes still running. Upon
  killing the master, stestr frees up, reports status from the test and
  exits normally.

  Stack trace of the hung test process after signaling the master it is
  waiting for manually is:

  Traceback (most recent call last):
    File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
      "__main__", mod_spec)
    File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
      exec(code, run_globals)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/run.py", line 93, in <module>
      main()
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/run.py", line 89, in main
      testRunner=partial(runner, stdout=sys.stdout))
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/program.py", line 185, in __init__
      self.runTests()
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/program.py", line 226, in runTests
      self.result = testRunner.run(self.test)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/stestr/subunit_runner/run.py", line 52, in run
      test(result)
    File "/usr/lib64/python3.7/unittest/suite.py", line 84, in __call__
      return self.run(*args, **kwds)
    File "/usr/lib64/python3.7/unittest/suite.py", line 122, in run
      test(result)
    File "/usr/lib64/python3.7/unittest/suite.py", line 84, in __call__
      return self.run(*args, **kwds)
    File "/usr/lib64/python3.7/unittest/suite.py", line 122, in run
      test(result)
    File "/usr/lib64/python3.7/unittest/suite.py", line 84, in __call__
      return self.run(*args, **kwds)
    File "/usr/lib64/python3.7/unittest/suite.py", line 122, in run
      test(result)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/unittest2/case.py", line 673, in __call__
      return self.run(*args, **kwds)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/testcase.py", line 675, in run
      return run_test.run(result)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/runtest.py", line 80, in run
      return self._run_one(actual_result)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/runtest.py", line 94, in _run_one
      return self._run_prepared_result(ExtendedToOriginalDecorator(result))
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/runtest.py", line 119, in _run_prepared_result
      raise e
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/testtools/runtest.py", line 191, in _run_user
      return fn(*args, **kwargs)
    File "/home/dan/glance/glance/tests/functional/__init__.py", line 881, in cleanup
      s.stop()
    File "/home/dan/glance/glance/tests/functional/__init__.py", line 293, in stop
      rc = test_utils.wait_for_fork(self.process_pid, raise_error=False)
    File "/home/dan/glance/glance/tests/utils.py", line 294, in wait_for_fork
      (pid, rc) = os.waitpid(pid, 0)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/green/os.py", line 96, in waitpid
      greenthread.sleep(0.01)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/greenthread.py", line 36, in sleep
      hub.switch()
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/hubs/hub.py", line 298, in switch
      return self.greenlet.switch()
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/hubs/hub.py", line 350, in run
      self.wait(sleep_time)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/hubs/poll.py", line 80, in wait
      presult = self.do_poll(seconds)
    File "/home/dan/glance/.tox/functional/lib/python3.7/site-packages/eventlet/hubs/epolls.py", line 31, in do_poll
      return self.poll.poll(seconds)

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1891190/+subscriptions


Follow ups