
parallel testing LEP questions

 

Hey Robert.  Francis mentioned that you had updated the parallel testing LEP so I took a moment to look at it today.

I cc'd the yellow squad to keep us all in the loop.  Hi everybody!  The LEP is https://dev.launchpad.net/LEP/ParallelTesting if you want to take a look.

Could you clarify these points, ideally on the LEP?

- You write that we must "[o]rganise and upgrade our CI test running to take advantage of this new environment."  You also clarify that "[c]hanging the landing technology is out of scope."  To make sure I understand, then: you want us to keep buildbot and everything else as-is as much as possible, but work with the LOSAs to get us machines/VMs that can quickly and robustly run these tests.  Is this right?  If so, I think no additional LEP clarification is needed; otherwise, please give us more information there.

- You write in comments that "The prototype LXC + testr based parallelisation seems to have the best effort-reward tradeoff today."  [Yellow folks, I found https://dev.launchpad.net/ParallelTests, which describes the prototype.]  Have you done enough research here that you are able to recommend, or even prescribe, this approach?  If so, that would probably save time; and though prescribing an implementation runs against my understanding of what a LEP is for, I think that can reasonably be relaxed for documents written by the TA.

- If we use LXC, do you expect this effort to dig into the fragility that you note in your prototype notes, and try to improve it?  If not, do you have requirements or thoughts on how to help developers work with the issues--perhaps scripts that developers are encouraged to use for the workflow, that handle problems like the ones you identify ("you may need to manually shutdown postgresql before stopping lxc, to get it to shutdown cleanly")?
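
For concreteness, the kind of wrapper I am imagining looks roughly like the sketch below; the container name and the exact lxc/postgresql commands are guesses on my part, not anything taken from your prototype notes.

    # Rough sketch only: "lpdev" and the exact commands are invented.
    import subprocess
    import sys

    def stop_container(name):
        # Stop postgresql inside the container first, so that lxc-stop
        # can then shut the container down cleanly.
        subprocess.check_call(
            ["sudo", "lxc-attach", "-n", name, "--",
             "service", "postgresql", "stop"])
        subprocess.check_call(["sudo", "lxc-stop", "-n", name])

    if __name__ == "__main__":
        stop_container(sys.argv[1] if len(sys.argv) > 1 else "lpdev")

Something along those lines, kept in the tree, might be enough to stop that fragility from landing on each developer individually, but I'd like to know whether you see writing and maintaining such scripts as in scope.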

- If we use LXC, you describe a number of steps to set up a working environment.  Do you envision a rocketfuel-XXX style script to help produce this environment?  If so, do you have any requirements for it?  If not, do you have something else in mind, and can we extract requirements from that?
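
To make that question concrete, this is the sort of skeleton I am picturing; it is only a sketch, the container name is invented, and the real steps would have to come from the prototype's recipe rather than from me.

    # Sketch only: the name and the lxc template choice are placeholders.
    import subprocess

    def run(*cmd):
        # Echo each command so developers can see and repeat what was done.
        print(" ".join(cmd))
        subprocess.check_call(cmd)

    def create_test_container(name="lptest"):
        # Create a basic Ubuntu container as a starting point.
        run("sudo", "lxc-create", "-t", "ubuntu", "-n", name)
        # The remaining steps (sharing the branch, installing test
        # dependencies, setting up the databases) are exactly the
        # requirements I am asking about above.

    if __name__ == "__main__":
        create_test_container()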

If you don't intend to recommend/prescribe LXC + testr, these next two questions are pertinent.

- You write that the solution "[m]ust parallelise more effectively than bin/test -j (which does per-layer splits)."  Is that really a "must"? If we met your success metric ("down to less than 50% of the current time, preferrable 15%-20%"), would it really matter which method got there?  If it does matter, can you identify what the underlying "must" is for rejecting the -j approach, so that, for instance, other solutions can be cleanly rejected?

- Francis had said earlier, when talking with me about the project, that running the tests on multiple machines might be an acceptable way to achieve the goal.  You specifically disallow that, starting with the LEP title ("Single machine parallel testing of single branches"), even though doing this with multiple machines would match the letter of the law (the biggest stretch I see is that "[p]ermit[ting] developers to reliably run parallelised as well" would mean that developers would need to run ec2 to meet that requirement).  As with the previous question, is there a deeper "must" hidden in here somewhere?  Perhaps it is cost-related?

That's all I've got so far. :-)

Thanks

Gary
