yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #27023
[Bug 1412742] [NEW] tests randomly hang in oslo.messaging fake driver
Public bug reported:
The following change has been getting random failures in the py27 test
job where it would time out after 50 minutes.
https://review.openstack.org/#/c/140725/
eg this test run:
http://logs.openstack.org/25/140725/11/gate/gate-nova-python27/0dabb50/
Strangely other stuff going into the gate was not affected, and there's
no obvious sign of what in this change would cause such a hang probem.
Running all the changed tests in isolation never showed a hang. Only
when the entire nova test suite was run would it hang, so it seems like
some kind of race condition is hiding there. Probably the fact that this
change deletes some tests in one file and adds tests in another file
makes the race more likely to occurr.
Reproducing the hang locally and then attaching to the procss with GDB
shows the following trace that ends the in oslo.messaging fake driver
(gdb) py-bt
#3 Frame 0x8b71dd0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_fake.py, line 149, in get_exchange (self=<FakeExchangeManager(_exchanges_lock=<Semaphore(counter=0, _waiters=set([])) at remote 0x954c1d0>, _exchanges={'nova': <FakeExchange(_server_queues={('fake', 'foo'): []}, _topic_queues={('fake', None): []}, _queues_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<Semaphore(counter=1, _waiters=set([])) at remote 0x7667f10>, _RLock__count=0) at remote 0x7667a10>, name='nova') at remote 0x7667910>}, _default_exchange='nova') at remote 0x954c850>, name='nova')
return self._exchanges.setdefault(name, FakeExchange(name))
#6 Frame 0xf2281d0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_fake.py, line 63, in poll (self=<FakeListener(_pool=None, _targets=[<Target(version=None, exchange='nova', namespace=None, server='foo', topic='fake', fanout=None) at remote 0x7667cd0>, <Target(version=None, exchange='nova', namespace=None, server=None, topic='fake', fanout=None) at remote 0x7667790>], driver=<FakeDriver(_allowed_remote_exmods=['nova.exception', 'nova.test'], _exchange_manager=<FakeExchangeManager(_exchanges_lock=<Semaphore(counter=0, _waiters=set([])) at remote 0x954c1d0>, _exchanges={'nova': <FakeExchange(_server_queues={('fake', 'foo'): []}, _topic_queues={('fake', None): []}, _queues_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<Semaphore(counter=1, _waiters=set([])) at remote 0x7667f10>, _RLock__count=0) at remote 0x7667a10>, name='nova') at remote 0x7667910>}, _default_exchange='nova') at remote 0x954c850>, _url=<TransportURL(hosts=[], _tr...(truncated)
exchange = self._exchange_manager.get_exchange(target.exchange)
#10 Frame 0x710e9d0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/oslo/messaging/_executors/impl_eventlet.py, line 88, in _executor_thread (incoming=None)
incoming = self.listener.poll(timeout=base.POLL_TIMEOUT)
#15 Frame 0x8204da0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/oslo_utils/excutils.py, line 92, in inner_func (args=(), kwargs={}, last_log_time=1421749084, last_exc_message=u'', exc_count=0, exc=<TimeoutException at remote 0x10824a50>, this_exc_message=u'', cur_time=1421749084)
return infunc(*args, **kwargs)
#20 Frame 0x102669a0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/eventlet/greenthread.py, line 214, in main (self=<GreenThread(_exit_funcs=<collections.deque at remote 0x10608050>, _exit_event=<Event(_result=<NOT_USED() at remote 0x23f6c20>, _waiters=set([]), _exc=None) at remote 0x7667150>, _resolving_links=False) at remote 0xe1d6eb0>, function=<function at remote 0xe47f8c0>, args=(), kwargs={})
result = function(*args, **kwargs)
#31 Frame 0x84deca0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/eventlet/hubs/timer.py, line 58, in __call__ (self=<Timer(seconds=<float at remote 0xbf93860>, tpl=(<built-in method switch of GreenThread object at remote 0xe1d6eb0>, (), {}), called=True) at remote 0xcedeb90>, args=(...), cb=<built-in method switch of GreenThread object at remote 0xe1d6eb0>, kw={...})
cb(*args, **kw)
#41 Frame 0xf1e5ff0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/eventlet/hubs/hub.py, line 457, in fire_timers (self=<Hub(next_timers=[], clock=<built-in function time>, debug_exceptions=True, debug_blocking_resolution=1, modify=<built-in method modify of select.epoll object at remote 0xe80eca8>, running=True, debug_blocking=False, listeners={'read': {11: <FdListener(fileno=11, cb=<built-in method switch of GreenThread object at remote 0xe1d6c30>, spent=False, greenlet=<GreenThread(_exit_funcs=<collections.deque at remote 0xe1a5210>, _exit_event=<Event(_result=<NOT_USED() at remote 0x23f6c20>, _waiters=set([]), _exc=None) at remote 0x10938950>, _resolving_links=False) at remote 0xe1d6c30>, evtype='read', mark_as_closed=<instancemethod at remote 0xdde30a0>, tb=<built-in method throw of GreenThread object at remote 0xe1d6c30>) at remote 0x10938cd0>, 13: <FdListener(fileno=13, cb=<built-in method switch of GreenThread object at remote 0xef42c30>, spent=False, greenlet=<GreenThread(_...(truncated)
timer()
#44 Frame 0xd7d4970, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/eventlet/hubs/hub.py, line 336, in run (self=<Hub(next_timers=[], clock=<built-in function time>, debug_exceptions=True, debug_blocking_resolution=1, modify=<built-in method modify of select.epoll object at remote 0xe80eca8>, running=True, debug_blocking=False, listeners={'read': {11: <FdListener(fileno=11, cb=<built-in method switch of GreenThread object at remote 0xe1d6c30>, spent=False, greenlet=<GreenThread(_exit_funcs=<collections.deque at remote 0xe1a5210>, _exit_event=<Event(_result=<NOT_USED() at remote 0x23f6c20>, _waiters=set([]), _exc=None) at remote 0x10938950>, _resolving_links=False) at remote 0xe1d6c30>, evtype='read', mark_as_closed=<instancemethod at remote 0xdde30a0>, tb=<built-in method throw of GreenThread object at remote 0xe1d6c30>) at remote 0x10938cd0>, 13: <FdListener(fileno=13, cb=<built-in method switch of GreenThread object at remote 0xef42c30>, spent=False, greenlet=<GreenThread(_exit_fun...(truncated)
self.fire_timers(self.clock())
Upon Ctrl-C'ing the hung test suite it shows it was trying to run this test case at the time:
^C{1} nova.tests.unit.virt.libvirt.test_volume.LibvirtVolumeTestCase.test_libvirt_kvm_iser_volume_with_multipath_getmpdev [] ... inprogress
Traceback (most recent call last):
File "/home/berrange/src/cloud/nova/.tox/py27/bin/subunit-trace", line 11, in <module>
sys.exit(main())
File "/home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/tempest_lib/cmd/subunit_trace.py", line 257, in main
stream.run(result)
File "/home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/subunit/v2.py", line 270, in run
content = self.source.read(1)
KeyboardInterrupt
ERROR: KEYBOARDINTERRUPT
interrupted
ERROR: keyboardinterrupt
Again, running that test case in isolation never fails.
Seeing this recent commit
commit 2809fab96c4b48f2a979965b3f4788dfb1379977
Author: Sean Dague <sean@xxxxxxxxx>
Date: Fri Jan 16 14:41:24 2015 -0500
increase fake rpc POLL_TIMEOUT to 0.1s
The 0.001s POLL_TIMEOUT appeared related to an intermitent race
issue. The polling loop in the fake driver in oslo.messaging is 0.05s,
so increase to 0.1 total time out to make sure we get at least one
full polling cycle on waits.
Change-Id: I24a83638df0c8f3f84bb528333a91491a98016a3
suggests to me that the problem could be related to the fact that the
nova.tests.unit.virt.libvirt.test_volume.LibvirtVolumeTestCase.test_libvirt_kvm_iser_volume_with_multipath_getmpdev
test stubs out time.sleep to be a no-op
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1412742
Title:
tests randomly hang in oslo.messaging fake driver
Status in OpenStack Compute (Nova):
New
Bug description:
The following change has been getting random failures in the py27 test
job where it would time out after 50 minutes.
https://review.openstack.org/#/c/140725/
eg this test run:
http://logs.openstack.org/25/140725/11/gate/gate-nova-
python27/0dabb50/
Strangely other stuff going into the gate was not affected, and
there's no obvious sign of what in this change would cause such a hang
probem.
Running all the changed tests in isolation never showed a hang. Only
when the entire nova test suite was run would it hang, so it seems
like some kind of race condition is hiding there. Probably the fact
that this change deletes some tests in one file and adds tests in
another file makes the race more likely to occurr.
Reproducing the hang locally and then attaching to the procss with GDB
shows the following trace that ends the in oslo.messaging fake driver
(gdb) py-bt
#3 Frame 0x8b71dd0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_fake.py, line 149, in get_exchange (self=<FakeExchangeManager(_exchanges_lock=<Semaphore(counter=0, _waiters=set([])) at remote 0x954c1d0>, _exchanges={'nova': <FakeExchange(_server_queues={('fake', 'foo'): []}, _topic_queues={('fake', None): []}, _queues_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<Semaphore(counter=1, _waiters=set([])) at remote 0x7667f10>, _RLock__count=0) at remote 0x7667a10>, name='nova') at remote 0x7667910>}, _default_exchange='nova') at remote 0x954c850>, name='nova')
return self._exchanges.setdefault(name, FakeExchange(name))
#6 Frame 0xf2281d0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_fake.py, line 63, in poll (self=<FakeListener(_pool=None, _targets=[<Target(version=None, exchange='nova', namespace=None, server='foo', topic='fake', fanout=None) at remote 0x7667cd0>, <Target(version=None, exchange='nova', namespace=None, server=None, topic='fake', fanout=None) at remote 0x7667790>], driver=<FakeDriver(_allowed_remote_exmods=['nova.exception', 'nova.test'], _exchange_manager=<FakeExchangeManager(_exchanges_lock=<Semaphore(counter=0, _waiters=set([])) at remote 0x954c1d0>, _exchanges={'nova': <FakeExchange(_server_queues={('fake', 'foo'): []}, _topic_queues={('fake', None): []}, _queues_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<Semaphore(counter=1, _waiters=set([])) at remote 0x7667f10>, _RLock__count=0) at remote 0x7667a10>, name='nova') at remote 0x7667910>}, _default_exchange='nova') at remote 0x954c850>, _url=<TransportURL(hosts=[], _tr...(truncated)
exchange = self._exchange_manager.get_exchange(target.exchange)
#10 Frame 0x710e9d0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/oslo/messaging/_executors/impl_eventlet.py, line 88, in _executor_thread (incoming=None)
incoming = self.listener.poll(timeout=base.POLL_TIMEOUT)
#15 Frame 0x8204da0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/oslo_utils/excutils.py, line 92, in inner_func (args=(), kwargs={}, last_log_time=1421749084, last_exc_message=u'', exc_count=0, exc=<TimeoutException at remote 0x10824a50>, this_exc_message=u'', cur_time=1421749084)
return infunc(*args, **kwargs)
#20 Frame 0x102669a0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/eventlet/greenthread.py, line 214, in main (self=<GreenThread(_exit_funcs=<collections.deque at remote 0x10608050>, _exit_event=<Event(_result=<NOT_USED() at remote 0x23f6c20>, _waiters=set([]), _exc=None) at remote 0x7667150>, _resolving_links=False) at remote 0xe1d6eb0>, function=<function at remote 0xe47f8c0>, args=(), kwargs={})
result = function(*args, **kwargs)
#31 Frame 0x84deca0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/eventlet/hubs/timer.py, line 58, in __call__ (self=<Timer(seconds=<float at remote 0xbf93860>, tpl=(<built-in method switch of GreenThread object at remote 0xe1d6eb0>, (), {}), called=True) at remote 0xcedeb90>, args=(...), cb=<built-in method switch of GreenThread object at remote 0xe1d6eb0>, kw={...})
cb(*args, **kw)
#41 Frame 0xf1e5ff0, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/eventlet/hubs/hub.py, line 457, in fire_timers (self=<Hub(next_timers=[], clock=<built-in function time>, debug_exceptions=True, debug_blocking_resolution=1, modify=<built-in method modify of select.epoll object at remote 0xe80eca8>, running=True, debug_blocking=False, listeners={'read': {11: <FdListener(fileno=11, cb=<built-in method switch of GreenThread object at remote 0xe1d6c30>, spent=False, greenlet=<GreenThread(_exit_funcs=<collections.deque at remote 0xe1a5210>, _exit_event=<Event(_result=<NOT_USED() at remote 0x23f6c20>, _waiters=set([]), _exc=None) at remote 0x10938950>, _resolving_links=False) at remote 0xe1d6c30>, evtype='read', mark_as_closed=<instancemethod at remote 0xdde30a0>, tb=<built-in method throw of GreenThread object at remote 0xe1d6c30>) at remote 0x10938cd0>, 13: <FdListener(fileno=13, cb=<built-in method switch of GreenThread object at remote 0xef42c30>, spent=False, greenlet=<GreenThread(_...(truncated)
timer()
#44 Frame 0xd7d4970, for file /home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/eventlet/hubs/hub.py, line 336, in run (self=<Hub(next_timers=[], clock=<built-in function time>, debug_exceptions=True, debug_blocking_resolution=1, modify=<built-in method modify of select.epoll object at remote 0xe80eca8>, running=True, debug_blocking=False, listeners={'read': {11: <FdListener(fileno=11, cb=<built-in method switch of GreenThread object at remote 0xe1d6c30>, spent=False, greenlet=<GreenThread(_exit_funcs=<collections.deque at remote 0xe1a5210>, _exit_event=<Event(_result=<NOT_USED() at remote 0x23f6c20>, _waiters=set([]), _exc=None) at remote 0x10938950>, _resolving_links=False) at remote 0xe1d6c30>, evtype='read', mark_as_closed=<instancemethod at remote 0xdde30a0>, tb=<built-in method throw of GreenThread object at remote 0xe1d6c30>) at remote 0x10938cd0>, 13: <FdListener(fileno=13, cb=<built-in method switch of GreenThread object at remote 0xef42c30>, spent=False, greenlet=<GreenThread(_exit_fun...(truncated)
self.fire_timers(self.clock())
Upon Ctrl-C'ing the hung test suite it shows it was trying to run this test case at the time:
^C{1} nova.tests.unit.virt.libvirt.test_volume.LibvirtVolumeTestCase.test_libvirt_kvm_iser_volume_with_multipath_getmpdev [] ... inprogress
Traceback (most recent call last):
File "/home/berrange/src/cloud/nova/.tox/py27/bin/subunit-trace", line 11, in <module>
sys.exit(main())
File "/home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/tempest_lib/cmd/subunit_trace.py", line 257, in main
stream.run(result)
File "/home/berrange/src/cloud/nova/.tox/py27/lib/python2.7/site-packages/subunit/v2.py", line 270, in run
content = self.source.read(1)
KeyboardInterrupt
ERROR: KEYBOARDINTERRUPT
interrupted
ERROR: keyboardinterrupt
Again, running that test case in isolation never fails.
Seeing this recent commit
commit 2809fab96c4b48f2a979965b3f4788dfb1379977
Author: Sean Dague <sean@xxxxxxxxx>
Date: Fri Jan 16 14:41:24 2015 -0500
increase fake rpc POLL_TIMEOUT to 0.1s
The 0.001s POLL_TIMEOUT appeared related to an intermitent race
issue. The polling loop in the fake driver in oslo.messaging is 0.05s,
so increase to 0.1 total time out to make sure we get at least one
full polling cycle on waits.
Change-Id: I24a83638df0c8f3f84bb528333a91491a98016a3
suggests to me that the problem could be related to the fact that the
nova.tests.unit.virt.libvirt.test_volume.LibvirtVolumeTestCase.test_libvirt_kvm_iser_volume_with_multipath_getmpdev
test stubs out time.sleep to be a no-op
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1412742/+subscriptions
Follow ups
References