launchpad-dev team mailing list archive

Re: Parallel testing is live

To: launchpad-dev@xxxxxxxxxxxxxxxxxxx
From: Aaron Bentley <aaron@xxxxxxxxxxxxx>
Date: Fri, 21 Sep 2012 17:24:10 -0400
In-reply-to: <505CBE92.1060200@contre.com>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12-09-21 03:22 PM, Francis J. Lacoste wrote:
> I agree that going back to pre-commit merge is one thing we could
> try. There is a caveat in your data though. You are counting the
> number of test runs processed by buildbot in a day. The number that
> is truly important is the number of daily ec2test submissions,
> because only once a successful ec2 test run has happened does it
> get sent to buildbot.

Right.  The number of landings was much easier to get, since it didn't
require hacking all the launchpad team and grovelling their shell
histories :-).

However, we're only in trouble if the number of attempts per day exceeds
41, which would be roughly a 4:1 ratio of attempts to landings.  Also,
the number of attempts is probably correlated with the speed of
landings: the faster results come back, the more incentive people have
to re-attempt landing before checking to ensure they've fixed every bug.
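
For what it's worth, the 41 figure appears consistent with 35-minute
runs going back to back for a full day.  A rough sketch of that
arithmetic, assuming strictly serial runs; the landings-per-day value
below is a hypothetical number for illustration, not taken from the
buildbot data:

```python
MINUTES_PER_DAY = 24 * 60
RUN_MINUTES = 35  # parallel test run time mentioned in this thread


def daily_capacity(run_minutes=RUN_MINUTES):
    """Number of full runs that fit in one day if run back to back."""
    return MINUTES_PER_DAY // run_minutes


def attempts_per_landing(capacity, landings_per_day):
    """Attempts-to-landings ratio at which the runner is saturated."""
    return capacity / landings_per_day


capacity = daily_capacity()
# ASSUMPTION: 10 landings/day is a made-up figure for illustration only.
ratio = attempts_per_landing(capacity, 10)
print(capacity, round(ratio, 1))
```

Anything below that ratio leaves the runner with idle time; anything
above it means the queue grows over the day.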

> What you are proposing (and what was happening before we switched
> to buildbot) is that developers simply use the landing architecture
> as a convenient test runner. One problem in the old days was that a
> lot of queued landings would fail simply because the tests hadn't
> been run, not because of an integration error or intermittent
> failure. (Although we also had some of those.)

I'm not sure that's really a problem.  People still have an incentive
to be conscientious about running the obvious tests, because 35
minutes is still a long time.  But if non-obvious tests fail, it's
better for them to fail in 35 minutes via our parallel tester than in
4 hours via ec2.  I think reckless landings would be self-limiting,
because they would tend to generate queues, reducing the advantage of
reckless landing.

> I don't know how much effort is required to set up a tarmac instance
> that can run parallelized tests. (Unfortunately, it probably also
> requires scarce webops resources.) But I'd be willing to try an
> experiment around that if it's cheap.

Cool.

> To achieve the similar flow you want, we can also make ec2test run
> tests in parallel in EC2.

Or Canonistack, since this probably involves a lot of re-work anyhow.

> On the big instances with 32 cores, Yellow was seeing a ~50 mins
> test run in EC2. That would put us well into range of writing and
> deploying code in the same day.  (And deploying this requires 0
> webops involvement).

It also lets people run the full suite quickly for those cases where
you know you've probably broken something, but you don't know where.

Maybe this is also a good time to plug my "fault-line" plugin
<https://launchpad.net/fault-line>, which uses revision history to
find correlations between changed files and test files.  It's good for
doing a broader test run without running the full suite:

bin/test -vm $(bzr fault-line --module-regex -r :submit)
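
To illustrate the general idea of correlating changed files with test
files through history, here is a minimal sketch.  It is not the
plugin's actual code: the function names, the "/tests/" path
convention, and the shape of the history data are all assumptions made
for the example.

```python
from collections import Counter


def is_test_file(path):
    # ASSUMPTION: test files live under a "tests" directory.
    return "/tests/" in path


def correlated_tests(history, changed_files, min_count=1):
    """Return test files that historically changed alongside changed_files.

    history is an iterable of sets of paths, one set per past revision.
    Results are ordered from most to least frequently co-changed.
    """
    changed = set(changed_files)
    counts = Counter()
    for revision_files in history:
        # Only revisions touching one of our changed files are relevant.
        if changed & revision_files:
            for path in revision_files:
                if is_test_file(path):
                    counts[path] += 1
    return [path for path, n in counts.most_common() if n >= min_count]
```

Raising min_count trades coverage for a shorter test run, since it
drops test files that only rarely changed together with the files in
question.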

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlBc2vkACgkQ0F+nu1YWqI2mbACghSoFQ9W53SRLTpDHS6zoZXn9
k+sAn2EE6Bd2LfVhcGEyl2vLnzTyBn1f
=cvq1
-----END PGP SIGNATURE-----

References

Parallel testing is live
From: William Grant, 2012-09-21
Re: Parallel testing is live
From: Aaron Bentley, 2012-09-21
Re: Parallel testing is live
From: Francis J. Lacoste, 2012-09-21