yellow team mailing list archive

Re: parallel testing LEP questions

To: Robert Collins <robert.collins@xxxxxxxxxxxxx>
From: Benji York <benji.york@xxxxxxxxxxxxx>
Date: Thu, 24 Nov 2011 20:31:59 -0500
Cc: Launchpad Yellow Squad <yellow@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAJ3HoZ1cP51CfD0Nvuv2kaFLms_74VJhuoGHwdF75km0vzxJ0A@mail.gmail.com>

On Wed, Nov 23, 2011 at 5:05 PM, Robert Collins
<robert.collins@xxxxxxxxxxxxx> wrote:
> On Thu, Nov 24, 2011 at 10:59 AM, Benji York <benji.york@xxxxxxxxxxxxx> wrote:
>>> Our test distribution per layer is not very even - I highly doubt that
>>> we'd be able to meet a reduction to 15% of the current time splitting
>>> per layer.
>>
>> Let's look at the test distribution: The last buildbot run took 360
>> minutes.  There were 4 layers that took longer than 11 minutes to run:
>> 55, 56, 65, and 99 minutes.  All the other layers add up to about
>> 60 minutes.
>
> So the shortest run -j could give is 99 minutes, or 27% runtime. I
> don't see how you can bisect a layer, unless you mean 'create a fake
> layer extending it and manually allocate 50% of the tests to it'. That
> seems like a non-starter to me - way too much maintenance overhead.
>
>> If we bisect the four largest layers (to make it so the test runner's
>> blind layer scheduling can't bite us too hard) and assume that running 4
>> layers simultaneously imposes no more than a 50% overhead, then we would
>> be right at 40% of the current running time.
>>
>> Reasoning sidebar: the longest layer takes 99 minutes.  Even after it
>> is bisected, its other half is still the longest remaining layer, so
>> for pessimism's sake we assume the two halves get run one after
>> another.  All the other layers would be finished by then, so that gives
>> us 99*1.50/360 = .41.
>>
>> Even if we assume no parallelization overhead, per-test scheduling (as
>> opposed to per-layer as above) and four-way parallelization, we'll still
>> be at 25% of the original time, so I'm interested in ideas as to how we
>> might achieve a reduction to 15% of the original time.
>
> If local parallelisation will work, testr run --parallel will load
> balance all the tests optimally based on previous performance - a
> single run from e.g. ec2 can tell us which tests are slow and let it
> decide from there.

Cool.  I wasn't aware it had that functionality.
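
For anyone who wants to check the figures quoted above, here is a rough
Python sketch.  The timings come from the buildbot run mentioned earlier in
the thread; the names (fraction, greedy_partition, and so on) are made up
for illustration.  The greedy longest-first split at the end is only one
simple way to balance work from previous timings; I am not claiming it is
what testr actually does.

```python
# Timings (in minutes) quoted earlier in this thread.
TOTAL_MINUTES = 360
BIG_LAYERS = [55, 56, 65, 99]


def fraction(minutes, total=TOTAL_MINUTES):
    """Return a run time as a fraction of the current total run time."""
    return minutes / total


# Per-layer scheduling cannot beat the longest single layer.
per_layer_floor = fraction(max(BIG_LAYERS))

# Pessimistic bisect estimate: both halves of the longest layer run back
# to back, with an assumed 50% overhead for running 4 layers at once.
bisect_estimate = fraction(max(BIG_LAYERS) * 1.5)

# Ideal four-way split: per-test scheduling with no overhead at all.
ideal_four_way = 1 / 4


def greedy_partition(durations, workers):
    """Hypothetical balancer: hand the longest remaining item to the
    least-loaded worker and return the total load per worker."""
    loads = [0.0] * workers
    for duration in sorted(durations, reverse=True):
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += duration
    return loads


if __name__ == "__main__":
    print("per-layer floor: %.3f" % per_layer_floor)
    print("bisect estimate: %.3f" % bisect_estimate)
    print("ideal four-way:  %.3f" % ideal_four_way)
```

The wall-clock time of a partitioned run is max() of the loads that
greedy_partition returns, so feeding it per-test timings from a previous
run would give an estimate for per-test scheduling.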

>>> The other issue that will bite us, shared global state, will also be
>>> a significant issue with -j, unless a remoting facility is brought in
>>> (and at that point it seems to be reinventing subunit.... :P).
>>
>> This is the real catch.  If the tests haven't been written to be
>> parallelizable (which LP's certainly have not), then global state
>> collisions accumulated over years of assuming non-parallel tests could
>> be hard to fix.  On the other hand, if fixing them turns out to be easy,
>> then using the test runner's built-in parallelization (-j) would be the
>> most bang for the buck.
>
> bin/test --parallel already exists and does better splitting than -j,
> so I disagree that -j would be the best approach, *if* the collisions
> etc are easy to fix :).

Indeed.  I had forgotten about --parallel.  If it hasn't already been
added to zope.testrunner, it sounds like a good candidate to replace -j.

-- 
Benji York

Follow ups

Re: parallel testing LEP questions
From: Robert Collins, 2011-11-25

References

parallel testing LEP questions
From: Gary Poster, 2011-11-23
Re: parallel testing LEP questions
From: Robert Collins, 2011-11-23
Re: parallel testing LEP questions
From: Benji York, 2011-11-23
Re: parallel testing LEP questions
From: Robert Collins, 2011-11-23