launchpad-dev team mailing list archive

Re: Parallel testing is live


On 12-09-21 03:22 PM, Francis J. Lacoste wrote:
> I agree that going back to pre-commit merge is one thing we could
> try. There is a caveat in your data though. You are counting the
> number of test runs processed by buildbot in a day. The number that
> is truly important is the number of daily ec2test submissions,
> because only once a successful ec2 test run has happened does it
> get sent to buildbot.

Right.  The number of landings was much easier to get, since it didn't
require hacking into the whole Launchpad team's machines and
grovelling through their shell histories :-).

However, we're only in trouble if the number of attempts exceeds 41,
which would be ~ a 4:1 ratio of attempts to landings.  Also, the
number of attempts is probably correlated to the speed of landings--
i.e. the faster results come back, the more incentive people have to
re-attempt landing before checking to ensure they've fixed every bug.
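
As a rough check of that arithmetic: the sketch below assumes the 41
figure is the number of 35-minute runs (the parallel test time
mentioned later in this message) that a single serial runner can
finish in 24 hours.  The figure of 10 landings per day is not stated
anywhere in this thread; it is only what the 4:1 ratio implies, and
the function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def daily_capacity(run_minutes):
    """Whole test runs one serial runner can complete in 24 hours."""
    return MINUTES_PER_DAY // run_minutes


def max_attempt_ratio(run_minutes, landings_per_day):
    """Attempts per landing at which the runner becomes saturated."""
    return daily_capacity(run_minutes) / landings_per_day


# 35 minutes per run is the figure quoted in this message.
capacity = daily_capacity(35)  # 41 runs per day

# ASSUMPTION: ~10 landings per day, inferred from the 4:1 ratio.
ratio = max_attempt_ratio(35, 10)
```

Under those assumptions the queue only starts to grow without bound
once people average more than about four attempts per landing.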

> What you are proposing (and what was happening before we switched
> to buildbot) is that developers simply use the landing architecture
> as a convenient test runner. One problem in the old days was that a
> lot of queued landings would fail simply because the tests hadn't
> been run, not because of an integration error or intermittent
> failure. (Although we also had some of those.)

I'm not sure that's really a problem.  People still have an incentive
to be conscientious about running the obvious tests, because 35
minutes is still a long time.  But if non-obvious tests fail, it's
better for them to fail in 35 minutes via our parallel tester than in
4 hours via ec2.  I think reckless landings would be self-limiting,
because they would tend to generate queues, reducing the advantage of
reckless landing.

> I don't know how much effort is required to set up a tarmac
> instance that can run parallelized tests. (Unfortunately, it
> probably also requires scarce webops resources.) But I'd be willing
> to try an experiment around that if it's cheap.


> To achieve a flow similar to the one you want, we can also make
> ec2test run tests in parallel in EC2.

Or Canonistack, since this probably involves a lot of re-work anyhow.

> On the big instances with 32 cores, Yellow was seeing a ~50-minute
> test run in EC2. That would put us well within range of writing and
> deploying code in the same day.  (And deploying this requires 0
> webops involvement).

It also lets people run the full suite quickly in those cases where
they know they've probably broken something, but don't know where.

Maybe this is also a good time to plug my "fault-line" plugin
<https://launchpad.net/fault-line>, which uses revision history to
find correlations between changed files and test files.  It's good for
doing a broader test run without running the full suite:

bin/test -vm $(bzr fault-line --module-regex -r :submit)
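
The correlation idea can be sketched roughly as follows.  This is not
fault-line's actual code: it is a minimal illustration that assumes
the revision history is already available as a list of sets of paths
changed together, and that test files are named test_*.py.  The names
is_test_file and correlated_tests are hypothetical.

```python
from collections import Counter


def is_test_file(path):
    """ASSUMPTION: test files are named test_*.py."""
    name = path.rsplit("/", 1)[-1]
    return name.startswith("test_") and name.endswith(".py")


def correlated_tests(history, changed, min_count=1):
    """Rank test files that historically changed with `changed`.

    history: iterable of sets of paths, one set per past revision.
    changed: set of paths modified in the current branch.
    """
    counts = Counter()
    for revision_files in history:
        # Skip revisions that touched none of the files changed now.
        if not (set(revision_files) & set(changed)):
            continue
        for path in revision_files:
            if is_test_file(path):
                counts[path] += 1
    # Most frequently co-changed first; ties broken by path name.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, n in ranked if n >= min_count]


history = [
    {"lib/foo.py", "lib/tests/test_foo.py"},
    {"lib/foo.py", "lib/tests/test_foo.py", "lib/tests/test_bar.py"},
    {"lib/baz.py", "lib/tests/test_baz.py"},
]
suggested = correlated_tests(history, {"lib/foo.py"})
```

Here test_foo.py ranks first because it changed alongside foo.py in
two revisions, test_bar.py follows with one, and test_baz.py is left
out because it never changed together with foo.py.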