launchpad-dev team mailing list archive
Message #04292
Re: OOM problems with devel
On Mon, Aug 16, 2010 at 2:33 PM, Steve Kowalik
<steve.kowalik@xxxxxxxxxxxxx> wrote:
...
> Okay, so, I was discussing this with Robert at the time, so most of the
> ideas are his. Take the current output from ec2, and see which test is last.
> Use subunit-ls on a known-good ec2 run, and grab the last test, plus the
> next 20 or 30. Run only them using bin/test -vv --load-list <filename>.
>
> That gave me "lib/lp/soyuz/doc/buildd-slavescanner.txtKilled" so that has to
> be the culprit. Then, since it's a doctest, I had no other way of debugging,
> so I went in and binary searched the file by deleting half the file, then
> running the test again and seeing if that fixed it. That led me to the block
> that caused the problem.
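For anyone wanting to script the first step, here is a rough sketch of building the load list. It assumes you already have the test ids from the known-good run as a Python list in run order (for example, the output of subunit-ls split into lines). The function names and the default of 30 extra tests are made up for illustration, not anything in the Launchpad tree.

```python
def build_load_list(known_good_order, last_test_seen, extra=30):
    """Return the last test seen plus the next `extra` tests.

    known_good_order: test ids from a known-good run, in run order.
    last_test_seen: the final test id in the output of the failing run.
    """
    try:
        start = known_good_order.index(last_test_seen)
    except ValueError:
        raise ValueError(
            "%r does not appear in the known-good run" % (last_test_seen,))
    # Slicing past the end of the list is safe; it just returns fewer tests.
    return known_good_order[start:start + extra + 1]


def write_load_list(tests, path):
    """Write one test id per line, the shape a load-list file is assumed to take."""
    with open(path, "w") as handle:
        for test_id in tests:
            handle.write(test_id + "\n")
```

The resulting file is what would be passed to bin/test -vv --load-list <filename>.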
Thanks for the explanation. I find stuff like this helpful. Continuing
in the same spirit...
Once you gave me the failing test, I tried to reproduce the bug
locally with stable (I couldn't), and then tried locally with devel,
which had the problem. That gave a short list of revisions that
introduced the bug, and it was fairly easy to home in on the right
one.
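With a known-good revision at one end and a known-bad one at the other, one way to narrow down the culprit is plain bisection. The sketch below is only an illustration of that idea, not a record of what was actually run here. is_bad is a hypothetical callback that would check out a revision, run the failing test, and return True if it fails; the code assumes that once the bug appears it stays in every later revision.

```python
def first_bad(revisions, is_bad):
    """Return the earliest revision for which is_bad() is true.

    revisions: ordered oldest to newest, where revisions[0] is known
        good (e.g. the tip of stable) and revisions[-1] is known bad
        (e.g. the tip of devel).
    is_bad: hypothetical callback that tests a single revision.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the bug was introduced at mid or earlier
        else:
            lo = mid  # the bug was introduced after mid
    return revisions[hi]
```

The same halving works on the blocks of a doctest file, which is in effect what Steve did by hand.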
While I was running the tests to be extra sure, wgrant figured out the
problem (presumably by looking at the diff).
Random thoughts:
* Known good branches ftw.
* Real-time IRC communication ftw.
* Test suite failures often really do mean production failures.
* If we had run the tests before cowboying, the related production
  issue wouldn't have happened.
jml