
launchpad-dev team mailing list archive

Re: OOM problems with devel


On Mon, Aug 16, 2010 at 2:33 PM, Steve Kowalik
<steve.kowalik@xxxxxxxxxxxxx> wrote:
...
> Okay, so, I was discussing this with Robert at the time, so most of the
> ideas are his. Take the current output from ec2, and see which test is last.
> Use subunit-ls on a known-good ec2 run, and grab the last test, plus the
> next 20 or 30. Run only them using bin/test -vv --load-list <filename>.
>
> That gave me "lib/lp/soyuz/doc/buildd-slavescanner.txtKilled" so that has to
> be the culprit. Then, since it's a doctest, I had no other way of debugging,
> so I went in and binary searched the file by deleting half the file, then
> running the test again and seeing if that fixed it. That led me to the block
> that caused the problem.
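A minimal sketch of the "last test plus the next 20 or 30" step in the quoted message. It assumes the known-good run's test ids have already been saved one per line (for example, from subunit-ls output). It also assumes `bin/test --load-list` reads one test id per line. The function names and file names are hypothetical.

```python
from typing import List, Sequence


def read_test_ids(path: str) -> List[str]:
    """Read test ids, one per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def select_tests(known_good: Sequence[str], last_seen: str, extra: int = 25) -> List[str]:
    """Return `last_seen` plus up to `extra` tests that follow it in the
    known-good run's ordering (the suspects for the killed run)."""
    ids = list(known_good)
    try:
        start = ids.index(last_seen)
    except ValueError:
        raise ValueError(f"{last_seen!r} does not appear in the known-good run") from None
    return ids[start:start + 1 + extra]


def write_load_list(test_ids: Sequence[str], path: str) -> None:
    """Write one test id per line (assumed --load-list format)."""
    with open(path, "w") as f:
        f.writelines(tid + "\n" for tid in test_ids)
```

Usage, with hypothetical file names: `write_load_list(select_tests(read_test_ids("good-run.txt"), last_test), "suspects.txt")`, then `bin/test -vv --load-list suspects.txt`. The known-good run supplies the ordering because the failing run was killed before it could report which tests came next.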

Thanks for the explanation. I find stuff like this helpful. Continuing
in the same spirit...

Once you gave me the failing test, I tried to reproduce the bug
locally with stable (I couldn't) and then with devel, which did
show the problem. That narrowed the cause down to the short list of
revisions in devel but not yet in stable, and it was fairly easy to
home in on the right one.
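Narrowing a list of revisions between a known-good and a known-bad point is a binary search. The same halving applies to the quoted approach of deleting half a doctest and rerunning it. Below is a minimal sketch. It assumes the revisions are ordered oldest to newest, that the newest one is bad, and that the failure is monotonic: once a revision is bad, every later one is too. `is_bad` is a hypothetical callback, for example one that runs the failing test against that revision.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest revision for which is_bad() is True.

    Assumes revisions[-1] is bad and badness is monotonic in the ordering.
    Calls is_bad about log2(len(revisions)) times.
    """
    if not revisions:
        raise ValueError("no revisions to search")
    lo, hi = 0, len(revisions) - 1  # invariant: first bad index is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # mid is bad, so the first bad one is at or before it
        else:
            lo = mid + 1  # mid is good, so the first bad one is after it
    return revisions[lo]
```

With ten candidate revisions this needs at most four test runs. That is why a known-good branch such as stable, which bounds the search, makes this quick.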

While I was running the tests to be extra sure, wgrant figured out the
problem (presumably by looking at the diff).

Random thoughts:
  * Known good branches ftw.
  * Real-time IRC communication ftw.
  * Test suite failures often really do mean production failures.
  * If we had run tests before cowboying, the related production issue
wouldn't have happened.

jml


