
launchpad-dev team mailing list archive

Yellow Squad Weekly Retrospective Minutes: June 22



Project plan
Action Items
New tricks
 * gary_poster: when you think of something to share for the weekly
retrospective call, why not write it on the wiki page
 * gary_poster: When starting a new project, look at our checklist and jml's
 * gary_poster: When apt-get fails...
 * benji: bzr negative ignore
 * frankban: Python's inspect module
 * gmb: Beware: global state hates you: doctest die in a fire part XXXII
 * gary_poster: lpsetup came from our first trashed prototype.  Yay!
 * gary_poster: can we learn anything from frankban's successful
analysis of bug 1015318?
 * gary_poster: do we want lpsetup integration tests?
 * gmb: what can we learn from fixing the failing zope.testing fork
tests: it never should have happened

The minutes are also here:

The only real advantage to my mind is that the html version has links.


Attending: the gang's all here (bac benji frankban gary_poster gmb)
(These are freenode.net nicks)

Project plan

 * For a brief moment we achieved our goal of a 95% parallel test
success rate for the first time, measured with our statistical approach
of looking at the test runs in a rolling window of the last three days.
We've been moving between an 80% and 95% success rate this week.
 * With Serge Hallyn's help we have a workaround for bugs 994752 and
1014916.  If you are starting seven or more LXC Lucid containers at
once and care about start-up time, you probably want this workaround
too.  It gives us increased reliability and shaves about three minutes
off our test run time.
 * We have chosen to reduce concurrency on the 32 hyperthread machines
to 20 simultaneous LXC containers.  This seems to give slightly yet
noticeably better timing than our other experiments of 32, 24, and 16.
 * However, we still see timeouts as described in bugs 974617, 1011847,
and 1002820 (as discussed last week), and these are the sources of our
only recurring failures now.  We adjusted our approach last week by
reducing the timeouts.  We're going to increase the timeouts one more
time, and then go back to the drawing board.
 * As mentioned last week, we found and addressed one issue with
testrepository parallel LXC workers completing at very different times,
but it is still a problem.  The first worker to finish is now typically
about seven minutes ahead of the last worker to finish, within a given
test run.  We figure contention of some sort might be throwing a random
spanner in the works, or the test timing and scheduling is too far off
from ideal, or the division of layer setup across the workers introduces
too much variability for the scheduling to do a good job.  We have a
low-priority task to investigate.
 * The two new 24 core machines in the data center to actually run the
virtualized tests in production are supposed to arrive within the next
couple of business days.
 * We are landing kanban cards toward our lpsetup stretch goal.  We'll
be talking with matsubara to hopefully set up tarmac for the project,
and maybe Jenkins later for some integration tests.

Action Items


New tricks

 * gary_poster: when you think of something to share for the weekly
retrospective call, why not write it on the wiki page

As a follow-on to benji's suggestion to have daily calendar reminders
to think about things to share, why not write down any topics you think
of on the wiki page?  It might let us have something like an agenda.
benji: how would we write them--just cryptic notes to remind ourselves,
or a full write-up?  gary_poster: I think just notes are fine.

 * gary_poster: When starting a new project, look at our checklist and jml's

We're officially starting a new stretch project now with lpsetup.  We
have our own tiny baby checklist for a project.  It also links to a
getting-fabulously-better-and-yet-ever-depressingly-larger checklist
that jml has been working on.

The only real message in our checklist is "hey, prototypes are cool, and
competing prototypes especially.  And follow those other rules too,
whydoncha."  jml's is a lot more comprehensive (and he's looking for
help with automation, if you are interested!).

 * gary_poster: When apt-get fails...

Our juju charm sometimes fails on ec2 with errors like this:

    subprocess.CalledProcessError: Command '['apt-get', 'install', '-y',
    '--force-yes', u'your-package-name']' returned non-zero exit status 100

We've seen this before, but I forgot, so I'm sharing it now.  This is
caused by an apt cache whose hashes don't match the packages.  You can
resolve this with apt-get clean on the cloud machine (then locally use
juju resolved --retry your_service_name/0 and wait for install-error to
go away).

It would be really nice to add this automation to the Python charm
helpers once they are packaged and usable (waiting on bug 1016588).  If
the install fails with exit status 100, we would automatically try an
apt-get clean.
benji: what about just always clearing the cache first? gary_poster: if
the charm is actually relatively fast to start, and the cache is fine,
that would be a loss that might be noticeable. <shrug>
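That proposed automation might look something like this minimal sketch
(the apt_install helper and its injectable runner are hypothetical, not
the actual charm-helpers API):

```python
import subprocess

def apt_install(packages, run=subprocess.check_call):
    """Install packages with apt-get; if apt exits with status 100
    (typically a corrupt package cache), clean the cache and retry once.

    `run` is injectable so the retry logic can be tested without apt.
    """
    cmd = ["apt-get", "install", "-y", "--force-yes"] + list(packages)
    try:
        run(cmd)
    except subprocess.CalledProcessError as err:
        if err.returncode != 100:
            raise
        run(["apt-get", "clean"])  # drop the mismatched cache
        run(cmd)                   # one retry after cleaning
```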

 * benji: bzr negative ignore

A neat trick that came in handy this week: bzr supports negative ignore
patterns, marked with a bang ("!"), like "!pattern".  benji used this
when he wanted to assert that everything inside a log directory should
be ignored except a README.
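As a sketch, the .bzrignore for that case might contain something like
the following (pattern order matters; check "bzr help ignore" for the
exact globbing rules):

```
log/*
!log/README
```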

 * frankban: Python's inspect module

Python's inspect module was helpful this week in investigating what was
going on in a tough analysis.  You can look back in the frames of a
given call.  It can help when pdb is not an option because code is
split across threads or processes, or because stdin or stdout are in
use, or because there are too many call sites and you need to gather
data to analyze rather than stepping through something.

benji: it is nice for profiling too.  You can log one level back, and
then two levels back, and so on, when the standard profiling tools don't
work for one reason or another.  gary_poster: traceback module is less
fine-grained but can be more quickly convenient for some tasks.
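As a minimal sketch of that frame-walking idea (the function names here
are illustrative, not from the actual analysis):

```python
import inspect

def describe_caller(depth=1):
    """Return "function:lineno" for the frame `depth` levels above our
    own caller -- handy for logging who called what when pdb is not an
    option."""
    frame = inspect.currentframe()
    try:
        for _ in range(depth + 1):
            frame = frame.f_back
        info = inspect.getframeinfo(frame)
        return "%s:%s" % (info.function, info.lineno)
    finally:
        del frame  # avoid keeping a frame reference cycle alive

def helper():
    # depth=1 skips helper() itself and reports helper's caller.
    return describe_caller(depth=1)

def business_logic():
    return helper()
```

benji's profiling trick is the same idea with larger depth values.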

 * gmb: Beware: global state hates you: doctest die in a fire part XXXII

The doctest module mucks with stdout, stderr, and __stdout__ and
__stderr__ by its very nature.  This can make debugging particularly
unpleasant when you yourself are doing things with stdout and stderr.
Our solution was to convert doctests to unittests.  bac: The testtools
doctest matcher makes converting from doctest to unit test a lot easier.
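testtools' DocTestMatches is essentially a matcher wrapped around the
standard library's doctest output checker; here is a minimal stdlib
sketch of the underlying idea (the strings are made up for
illustration):

```python
import doctest

# The ELLIPSIS flag lets "..." in the expected output match the varying
# parts of the actual output, just as it does inside a doctest.
checker = doctest.OutputChecker()
want = "Hello from pid ..., status OK\n"
got = "Hello from pid 12345, status OK\n"
matches = checker.check_output(want, got, doctest.ELLIPSIS)
```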


 * gary_poster: lpsetup came from our first trashed prototype.  Yay!

We had setuplxc, a script that we used to initialize our ec2 instances
for parallel testing.  It ended up being a prototype that we trashed and
rewrote into lpsetup, our current stretch project.  This is the first
time we followed our resolution to prototype, trash, and rewrite, and it
has worked well.  frankban: releasing early means you can refactor early.

 * gary_poster: can we learn anything from frankban's successful
analysis of bug 1015318?

We already discussed the inspect module in relation to this bug.  It let
him gain necessary knowledge of what's going on in distant parts of the
code (transaction code and database wrappers).


 * gary_poster: do we want lpsetup integration tests?

Our lpsetup code has a nice set of unit tests of its infrastructure and
helpers, but no integration tests--and therefore, effectively, no tests
of the commands themselves.

We know from experience that a full run of the code to create a working
lxc launchpad environment takes about an hour on ec2.  Full integration
tests could take multiples of that.

Do we want integration tests of lpsetup?  If so, what are their goals?
Can we use mocks or stubs to keep from actually running the real
commands, and spending hours running tests, or is that pointless?  How
valuable would these tests be?  Will reports from users tell us of
problems at about the same speed as the integration tests?

benji/frankban: we could run commands in an LXC ephemeral container to
get a full end-to-end test that is cleanly thrown away at the end.  LXC
containers can nest now so it could work.

benji: we can write some tests that verify that the commands still
basically worked.  For instance, yesterday I made sure that the help
command still worked and showed the expected information.  We could
automate that and have quick tests.  gary_poster: that sounds like smoke
tests, and it would be nice to have those in the test suite as a first
step.  frankban: our infrastructure has us write commands as steps,
where each step is a function.  Another smoke test might be to make sure
that we have the steps we expect in the order we expect.

ACTION: gary_poster will make a kanban card to create a first cut at
smoke tests for all of our subcommands.

ACTION: gary_poster will make a kanban card to make the tarmac
gatekeeper enforce our test coverage expectations (the code review of
the enforcement will also include the discussion as to what we are
enforcing as a first cut).

gary_poster: sometimes when I write tests with mocks/stubs I feel like
I'm just copying the same information from the source to the test, with
different spelling.  It feels like using stubs for this code would be
like that.  In that situation, what's the value?  benji: when I feel
like that, I usually find it is because I'm writing tests of the
"happy path" and not of the exceptions.

Consensus: we would like to have a guarantee that code actually works to
build a Launchpad working environment before it is released.  We want
true integration tests, rather than merely mocks/stubs. How could we do
that? benji: we could have the integration tests in Jenkins, with tarmac
using only the unit tests to gate commits to lpsetup.

How do we use the Jenkins tests to keep bad code from being released?
gary_poster: lpsetup is packaged, so we could use the results to
determine whether we manually make the lpsetup package.  This maybe
could be automated using a variety of approaches (Could Jenkins trigger
a PPA build of a specific revision? Could we have a second branch that
accepts Jenkins/integration-blessed revisions, with PPAs built daily
from it?).

benji: if we do these integration tests, we should probably first have a
card for an integration test prototype, so we can figure out how to do it.

gary_poster: do we allow tests of one subcommand to build off the state
generated from another subcommand?  Don't we have bad experience with
that?  bac: yes, we do, but how else would we do it in this case?  [We
have no answer, so that is how we'd do it.]

gary_poster: the integration test ideas sound great, and I want them,
but they sound expensive.  We do not have an unlimited runway for this
and won't necessarily be working on it until it's done; in fact we
could be pulled off of this stretch goal project in two or three weeks.
rather have something released that is better than what we had, instead
of something discarded that was supposed to have been even better.
Given unit tests and smoke tests, are integration tests something we
should discard or postpone?  Should we timebox it?  Or is it essential?

bac: an additional cost is that getting a box from IS for automating
integration tests on Jenkins may add so much time and effort as to make
this entirely impractical.  That's a process issue that we have yet to
resolve.
benji, bac, gmb: we vote for postpone integration tests.  frankban: I
vote for timeboxed integration tests.

ACTION: gary_poster will create a slack card for investigating
integration test approaches.  If someone works on this in slack time and
shows us a way forward, we'll open this conversation again.  Until that
point, or until we successfully release lpsetup for developer usage,
they are postponed and effectively discarded.

 * gmb: what can we learn from fixing the failing zope.testing fork
tests: it never should have happened

[Editors note: The kanban card for this task took more than a day to
move out of coding, so it automatically became a topic for the weekly
call, per our checklist.]

The Launchpad project has a longstanding fork of zope.testing.  Some of
the tests started failing a year or more ago.  Since the yellow squad
started working with it, we fixed many of the broken tests and
documented the remaining three that we felt were too much of a bother
for their value.  More recently, in the work to clean up the subunit
stream, we made a mistake and suddenly broke many of the tests and
committed this to our "trunk".  How did this happen?

It simply shouldn't have happened.  We know better.  We shouldn't have a
fork, we shouldn't have commits with broken tests, and we shouldn't have
a project without a gatekeeper like pqm or tarmac.

gary_poster: following jml's new project checklist would at least have
made us have a gatekeeper, fixing two of those three.  Is getting tarmac
for a project cheap enough now that it is reasonable to deploy even for
small projects?

ACTION: bac will research how to get and integrate tarmac resources (a
testing machine) for a project.  He will first consult with matsubara
about this.  The results will be new/improved documentation on how to
get tarmac deployed for a project, and/or information on what it would
take to make this easier.

Follow ups