yahoo-eng-team team mailing list archive
Message #51617
[Bug 1567668] Re: Functional job sometimes hits global 2 hour limit and fails
Reviewed: https://review.openstack.org/317369
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c13d722f3913137945c27fcc74371d3316129f30
Submitter: Jenkins
Branch: master
commit c13d722f3913137945c27fcc74371d3316129f30
Author: Jakub Libosvar <libosvar@xxxxxxxxxx>
Date: Mon May 16 18:07:32 2016 +0000
ovsdb: Don't let block() wait indefinitely
Poller.block() calls select() on the registered file descriptors. This patch
adds a timeout to select() so it doesn't get stuck when no fd becomes ready.
A timeout was also added when reading transaction results from the queue.
Closes-Bug: 1567668
Change-Id: I7dbddd01409430ce93d8c23f04f02c46fb2a68c4
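To illustrate the pattern described in the commit message above, here is a minimal sketch: arm a timer on the poller before blocking, and bound the queue read as well. It assumes the python-ovs Poller API (timer_wait()/block()) and a standard queue.Queue; the timeout value and the helper name are illustrative, not taken from the actual patch.

# Minimal sketch (not the actual neutron patch): bound both the poller
# wait and the transaction-result queue read with a timeout.
import queue

from ovs import poller as ovs_poller  # python-ovs library

TIMEOUT_SEC = 10  # illustrative value


def wait_for_result(txn_results):
    p = ovs_poller.Poller()
    # timer_wait() takes milliseconds; it makes block() return even when
    # no registered file descriptor becomes ready.
    p.timer_wait(TIMEOUT_SEC * 1000)
    p.block()
    try:
        # Bound the queue read too, instead of blocking forever.
        return txn_results.get(timeout=TIMEOUT_SEC)
    except queue.Empty:
        return None  # caller decides how to handle the timeout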
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1567668
Title:
Functional job sometimes hits global 2 hour limit and fails
Status in neutron:
Fix Released
Bug description:
Here's an example:
http://logs.openstack.org/13/302913/1/check/gate-neutron-dsvm-functional/91dd537/console.html
Logstash query:
build_name:"gate-neutron-dsvm-functional" AND build_status:"FAILURE" AND message:"Killed timeout -s 9"
45 hits in the last 7 days.
Ihar and I checked the timing, and it started happening as we merged:
https://review.openstack.org/#/c/298056/ (EDIT: After some
investigation, this doesn't look like the root cause).
There are a few problems here:
1) It appears that a test is freezing up. We do have a per-test timeout: it is defined by OS_TEST_TIMEOUT in tox.ini and enforced via a fixtures.Timeout fixture set up in the oslotest base class (see the sketch after this list). It looks like that timeout doesn't always work.
2) When the global 2-hour job timeout is hit, the job doesn't perform post-test tasks such as copying over log files, which makes these problems a lot harder to troubleshoot.
3) And of course, there is likely some sort of issue with https://review.openstack.org/#/c/298056/.
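For reference, here is a minimal sketch of how such a per-test timeout is typically wired up with the fixtures library; the exact oslotest base-class code may differ, and the class name below is hypothetical.

import os

import fixtures
import testtools


class FunctionalTestCase(testtools.TestCase):  # hypothetical base class
    def setUp(self):
        super().setUp()
        timeout = int(os.environ.get('OS_TEST_TIMEOUT', 0))
        if timeout > 0:
            # gentle=True raises TimeoutException inside the test via an
            # alarm signal handler; a test blocked in native code may not
            # be interrupted promptly, which could explain a timeout that
            # "doesn't always work".
            self.useFixture(fixtures.Timeout(timeout, gentle=True))

If that alarm never gets delivered, the only remaining backstop is the global 2-hour job timeout, which ties problems 1 and 2 together.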
We could fix this via a revert, but that would increase the failure rate
of fullstack. Since I've been unable to reproduce this issue locally, I'd
like to hold off on a revert and instead try to get more information by
tackling some combination of problems 1 and 2, and then adding more
logging to figure it out.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1567668/+subscriptions