
yahoo-eng-team team mailing list archive

[Bug 1567668] Re: Functional job sometimes hits global 2 hour limit and fails

 

Reviewed:  https://review.openstack.org/317369
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c13d722f3913137945c27fcc74371d3316129f30
Submitter: Jenkins
Branch:    master

commit c13d722f3913137945c27fcc74371d3316129f30
Author: Jakub Libosvar <libosvar@xxxxxxxxxx>
Date:   Mon May 16 18:07:32 2016 +0000

    ovsdb: Don't let block() wait indefinitely
    
    Poller.block() calls select() on defined file descriptors. This patch
    adds a timeout to select() so it doesn't get stuck in case no fd is
    ready.
    
    A timeout was also added when reading transaction results from the
    queue.
    
    Closes-Bug: 1567668
    Change-Id: I7dbddd01409430ce93d8c23f04f02c46fb2a68c4

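For reference, the change described above boils down to bounding two calls that
could block forever: the select() inside Poller.block() and the queue read for
transaction results. A minimal sketch of that pattern (illustrative names and
timeout value, not the actual ovsdb/neutron code):

    # Minimal sketch only; function names and the 5-second timeout are
    # assumptions for illustration, not taken from the actual patch.
    import queue
    import select

    POLL_TIMEOUT = 5  # seconds

    def block(read_fds):
        """Wait until some fd is ready, but never indefinitely."""
        # With a timeout, select() returns (possibly with nothing ready)
        # instead of hanging forever when no fd ever becomes readable.
        readable, _, _ = select.select(read_fds, [], [], POLL_TIMEOUT)
        return readable

    def get_transaction_result(result_queue):
        """Read a transaction result without blocking forever."""
        try:
            # Bounding the queue read mirrors the second part of the fix.
            return result_queue.get(timeout=POLL_TIMEOUT)
        except queue.Empty:
            raise RuntimeError("timed out waiting for transaction result")
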

** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1567668

Title:
  Functional job sometimes hits global 2 hour limit and fails

Status in neutron:
  Fix Released

Bug description:
  Here's an example:
  http://logs.openstack.org/13/302913/1/check/gate-neutron-dsvm-functional/91dd537/console.html

  Logstash query:
  build_name:"gate-neutron-dsvm-functional" AND build_status:"FAILURE" AND message:"Killed                  timeout -s 9"

  45 hits in the last 7 days.

  Ihar and I checked the timing, and it started happening as we merged:
  https://review.openstack.org/#/c/298056/ (EDIT: After some
  investigating, this doesn't look like the root cause).

  There are a few problems here:
  1) It appears that a test is freezing up. We have a per-test timeout defined: it is set by OS_TEST_TIMEOUT in tox.ini and enforced via a fixtures.Timeout fixture set up in the oslotest base class (see the sketch after this list). It looks like that timeout doesn't always work.
  2) When the global 2-hour job timeout is hit, the job doesn't perform post-test tasks such as copying over log files, which makes these problems a lot harder to troubleshoot.
  3) And of course, there is likely some sort of issue with https://review.openstack.org/#/c/298056/.
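  For problem 1, this is roughly how the per-test timeout is wired up; a sketch
  assuming the usual oslotest/fixtures pattern, so details may differ from the
  actual base class:

    import os

    import fixtures
    import testtools

    class BaseTestCase(testtools.TestCase):
        def setUp(self):
            super(BaseTestCase, self).setUp()
            # OS_TEST_TIMEOUT comes from tox.ini; 0 or unset disables it.
            timeout = int(os.environ.get('OS_TEST_TIMEOUT', 0))
            if timeout > 0:
                # gentle=True raises TimeoutException inside the test
                # rather than letting SIGALRM kill the process; the bug
                # suggests this signal-based mechanism doesn't always fire.
                self.useFixture(fixtures.Timeout(timeout, gentle=True))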

  We could fix this with a revert, but that would increase the failure
  rate of fullstack. Since I've been unable to reproduce this issue
  locally, I'd like to hold off on a revert and instead try to get more
  information by tackling some combination of problems 1 and 2 and then
  adding more logging to figure it out.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1567668/+subscriptions

