yellow team mailing list archive

Re: Python 2.7 and parallel testing

 

On 06/27/2012 12:10 PM, Graham Binns wrote:
> Okay, so keeping the thread for reference:
> 
> Today, it all Just Worked. The 32-core instance yesterday just hung
> around and then quit, but m1.small that I accidentally used today
> worked fine. Re-running things on a 32-core instance worked fine too.

Yay. Mostly.

> 
> Except for one small feature: we're still seeing an unknown worker.
> 
> For these tests I limited the number of workers to 8 using
> --concurrency=8 in master.cfg. There are, however, 9 workers listed -
> 8 normal and one unknown. The unknown worker log appears to contain
> the output from bin/test. Not bin/test --subunit, mind; things like
> this:
> 
> lp.codehosting.codeimport.tests.test_worker.ForeignBranchPluginLayer:tearDown
> lp.codehosting.codeimport.tests.test_worker.TestBzrSvnImport.test_forbidden
> lp.codehosting.codeimport.tests.test_worker.TestImportDataStore.test_fetch_with_dest_transport
> lp.codehosting.codeimport.tests.test_worker.TestGitImport.test_partial
> lp.codehosting.codeimport.tests.test_worker.RedirectTests.test_redirect_to_forbidden_url
> lp.codehosting.codeimport.tests.test_worker.ForeignBranchPluginLayer:tearDown
> Running in a subprocess.
> lp.services.messaging.tests.test_rabbit.TestRabbitUnreliableSession.test_connect_with_incomplete_configuration
> lp.services.messaging.tests.test_rabbit.TestRabbitUnreliableSession.test_connect
> lp.services.messaging.tests.test_rabbit.TestRabbitUnreliableSession.test_getConsumer
> lp.services.messaging.tests.test_rabbit.TestRabbitMessageBase.test_channel_session_closed
> lp.services.messaging.tests.test_rabbit.TestRabbitSession.test_disconnect
> lp.testing.layers.BaseLayer:tearDown

These all look like expected test names from that worker log.  In
particular, the worker log is *generated* from the subunit, so it is at
least partially working; and the ":tearDown" and ":setUp" suffixes are
part of the subunit/zope.testing layer dance.
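
To make the distinction between layer lines and real test names concrete,
here is a minimal Python 3 sketch that sorts worker log lines into the three
kinds that appear in the excerpt above. The function names are hypothetical,
and the rules are inferred only from the quoted log (the ":setUp" and
":tearDown" suffixes, and the literal "Running in a subprocess." notice); it
is not how subunit or zope.testing classify anything internally.

```python
def classify_log_line(line):
    """Return 'blank', 'notice', 'layer', or 'test' for one worker log line.

    Assumption: the rules below come only from the quoted log excerpt.
    """
    text = line.strip()
    if not text:
        return "blank"
    if text == "Running in a subprocess.":
        return "notice"
    # Layer setup and teardown show up as "<layer dotted name>:setUp" or
    # "<layer dotted name>:tearDown".
    if text.endswith(":setUp") or text.endswith(":tearDown"):
        return "layer"
    return "test"


def candidate_tests(lines):
    """Keep only the lines that look like individual test names."""
    return [line.strip() for line in lines if classify_log_line(line) == "test"]
```

Feeding the unknown worker log through candidate_tests() should leave only
names worth chasing further.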

It smells very much like a subunit/testtools bug in the code that
aggregates subunit streams.  It is that code, not any code in
zope.testing or in the Launchpad tree, that generates the worker tags.

One way this might happen is if the tags get messed up.  We regularly see
global tags getting messed up by testr (we think, though Robert thinks
it is in zope.testing); we don't see worker or per-test tags
getting messed up.

> I've uploaded the entire contents of the lp_devel directory from this
> slave, gzipped, to U1 for your joy and edification:
> http://ubuntuone.com/4UL3L98uBZCcDHnYqYQy7c. If anyone can shed some
> light on what's going on, that'd be a great help to us here. I'm
> hoping it's not another "hey, let's muck with stdout" problem...

I don't see evidence of stdout issues so far, though we may get there yet.
We are currently eating stdout, stderr, and __stderr__; that still
leaves fun for __stdout__!
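
For anyone wondering why __stdout__ is a separate hazard: replacing
sys.stdout does not touch sys.__stdout__, which keeps pointing at the
interpreter's original stream. The sketch below is a generic Python 3
illustration of that fact. The helper names are hypothetical, and it is not
the code our test runner uses to eat output.

```python
import io
import sys


def swallow_stdout(func):
    """Run func with sys.stdout swapped for a buffer; return the captured text."""
    buffer = io.StringIO()
    saved = sys.stdout
    sys.stdout = buffer
    try:
        func()
    finally:
        sys.stdout = saved
    return buffer.getvalue()


def polite():
    # Looks up sys.stdout at call time, so the swap catches it.
    sys.stdout.write("caught\n")


def sneaky():
    # sys.__stdout__ is the original stream saved at interpreter start, so
    # this write bypasses the swapped-in buffer entirely.
    original = sys.__stdout__
    if original is not None:  # it can be None when no console is attached
        original.write("escaped\n")
```

swallow_stdout(polite) returns the written text, while
swallow_stdout(sneaky) returns an empty string and the text lands on the
real stdout instead.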

I suspect that what is happening is that we are getting a test failure
that is messing up the subunit stream in such a way that the
testr/testtools/subunit tower falls over.  We can find out what that
test failure is, and hopefully fix both it and the fragility.  Of
course, fixing the fragility may, yes, involve __stdout__ or file
descriptors.  Whee!

My first choice for investigating this would be to, *while tests are
running*, get the contents of
/var/lib/buildbot/slaves/slave/lucid-devel/build/temp/ .  That directory
is cleaned out after the tests are finished.  Then, once you have a
non-setUp, non-tearDown and non-Running-in-a-subprocess test name from
the unknown worker log, go and find the file from that directory that
contains that test name.  Run those tests with xvfb-run ./bin/test -vvv
--subunit --load-list FILENAME on a precise LXC container somewhere
(could be the one on the slave, or could be elsewhere) and see if you
can identify something in the stream that looks "wrong" right before
the unknown worker test.
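
The "find the file that contains that test name" step can be sketched in
Python 3 as below. It assumes the files copied out of the temp directory are
plain-text lists with one test id per line (which is what --load-list
consumes); the function name is hypothetical. It matches whole lines rather
than substrings, so that test_connect does not also match
test_connect_with_incomplete_configuration.

```python
import os


def find_load_lists(directory, test_name):
    """Return sorted paths of files under directory listing test_name.

    Assumption: each file is a plain-text list with one test id per line.
    """
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    # Compare whole lines so one test id that is a prefix of
                    # another does not produce a false match.
                    if any(line.strip() == test_name for line in handle):
                        matches.append(path)
            except OSError:
                # Files may vanish while tests are still running; skip them.
                continue
    return sorted(matches)
```

Run it against a copy of the temp directory rather than the live one, since
that directory is cleaned out once the tests finish.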

> 
> I'll try again on a --concurrency=20 32-core box tomorrow.
> 

Cool


