
launchpad-dev team mailing list archive

Yellow retrospective meeting notes: June 8



New tricks:
   frankban: the "fixtures" test fixture package.
   benji: debugging output streams
   gary_poster: killing lots of lxc ephemerals is annoying, but I have a
   band-aid
   gmb & gary_poster: bisect and conquer for test isolation problems
   benji: escaping from the lxc login
   No innovations to learn from this week
   gmb: -t and --load-list don't work together in Launchpad's bin/test
   gary_poster: danger zone kanban cards
   gary_poster: slowdowns caused by working across teams
   gary_poster: checklists or flowcharts?

Available below (sorry, HTML was easier because of blog) or on blog:





Attending:  benji frankban gary_poster gmb

Apologies: bac

(These are freenode.net nicks)

    Project plan

  * We are fixing bugs, and as hoped, we're now back to getting a lot of
    green runs--even more than before.  Since Tuesday, we have a 74%
    pass rate. Only 2 of 35 of those recent runs had new failures, and
    they were pretty easy.  Only bug 996729 remains from legacy bugs.
  * We are still waiting on the two new 24 core machines in the data
    center to actually run the virtualized tests in production.  We
    continue to use a 32 core EC2 machine and Juju
    <https://juju.ubuntu.com/> for our tests now.
  * gary_poster has a crazy plan for converting our "ec2" command into a
    combination of smaller parts: lpsetup
    <https://dev.launchpad.net/LEP/LaunchpadSetupScripts>; a Juju
    <https://juju.ubuntu.com/> Launchpad dev charm that uses lpsetup; a
    subordinate charm to let the Launchpad charm reuse previous
    Launchpad builds on EC2, saved on EBS snapshots, so the Launchpad
    charm can start faster; and a much smaller "ec2" command that's only
    responsible for merging, starting the test runner, and sending
    emails.  Watch for a proposal coming to a wiki near you!

    Action Items

No action items from last week.

    New tricks

      frankban: the "fixtures" test fixture package.

If you have not investigated Robert Collins' Python test fixtures
(Launchpad <https://launchpad.net/python-fixtures>, PyPI
<http://pypi.python.org/pypi/fixtures>), frankban recommends it.  This
week frankban worked with a number of fixtures.  In particular, the
FakeLogger fixture is very useful for ensuring that global logs are not
printed to standard output (and frankban recently worked with Robert to
improve it, to be released soon). The environment variables fixture was
very useful for fixing a test failure too--it sets specified
environmental variables at the start of a test and automatically and
correctly resets them at the end of the test.  Another nice feature of
fixtures is that they can be combined easily.
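
To illustrate the behavior described above (set something up at the
start of a test, restore it correctly at the end, and combine several
such helpers), here is a minimal standard-library sketch.  It does not
use the fixtures package or reproduce its API: environment_variable and
captured_log are hypothetical stand-ins written only for this example,
and the package's own fixtures should be preferred in real tests.

```python
import contextlib
import io
import logging
import os


@contextlib.contextmanager
def environment_variable(name, value):
    """Set os.environ[name] for the duration of a block, then restore it."""
    missing = object()
    previous = os.environ.get(name, missing)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the earlier value, or remove the variable if it was unset.
        if previous is missing:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


@contextlib.contextmanager
def captured_log():
    """Send root-logger output to a buffer instead of the usual handlers."""
    logger = logging.getLogger()
    buffer = io.StringIO()
    saved_handlers, saved_level = logger.handlers[:], logger.level
    logger.handlers = [logging.StreamHandler(buffer)]
    logger.setLevel(logging.INFO)
    try:
        yield buffer
    finally:
        logger.handlers = saved_handlers
        logger.setLevel(saved_level)


def run_combined():
    """Combine both helpers; each one is undone when the block exits."""
    with contextlib.ExitStack() as stack:
        stack.enter_context(environment_variable("DEMO_VAR", "1"))
        log = stack.enter_context(captured_log())
        logging.getLogger().info("DEMO_VAR=%s", os.environ["DEMO_VAR"])
        return log.getvalue()
```

ExitStack undoes the helpers in reverse order of entry, which is the
property that makes this kind of combination safe.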

      benji: debugging output streams

If you are trying to figure out what is going to stdout, stderr,
__stdout__, and __stderr__, then benji recommends wrapping each of these
file objects in an object that tees the output both to the normal
destination and to a file.  The file can include debugging information.
In his case, benji was able to solve his problem simply by noting where
the divisions fell (e.g., when something wrote to stdout, his debugging
file marked it off from other parts of the output so that it was clearly
a distinct unit); you could also include tracebacks to show what code is
writing what messages.
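
A minimal sketch of such a tee object follows.  This is not benji's
code: TeeStream, install_tees, and remove_tees are hypothetical names,
and the log format is just one way to make each write show up as a
distinct unit.

```python
import io
import sys
import traceback


class TeeStream:
    """File-like wrapper that forwards writes and records them in a log."""

    def __init__(self, name, target, log, with_traceback=False):
        self.name = name              # label such as "stdout" or "__stderr__"
        self.target = target          # the normal destination
        self.log = log                # debugging file shared by all tees
        self.with_traceback = with_traceback

    def write(self, data):
        # Record every write separately so the divisions are visible.
        self.log.write("--- %s write: %r\n" % (self.name, data))
        if self.with_traceback:
            # Show which code produced this write.
            self.log.write("".join(traceback.format_stack(limit=5)))
        self.log.flush()
        return self.target.write(data)

    def flush(self):
        self.target.flush()

    def __getattr__(self, attr):
        # Delegate everything else (fileno, encoding, ...) to the target.
        return getattr(self.target, attr)


def install_tees(log, with_traceback=False):
    """Wrap the four streams named above; return the originals."""
    originals = {}
    for name in ("stdout", "stderr", "__stdout__", "__stderr__"):
        stream = getattr(sys, name)
        if stream is None:            # some environments have no console
            continue
        originals[name] = stream
        setattr(sys, name, TeeStream(name, stream, log, with_traceback))
    return originals


def remove_tees(originals):
    """Put the original stream objects back."""
    for name, stream in originals.items():
        setattr(sys, name, stream)
```

Call install_tees with an open file before the code under suspicion
runs, and remove_tees afterwards; the log then shows which of the four
streams received each piece of output.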

      gary_poster: killing lots of lxc ephemerals is annoying, but I
      have a band-aid

lxc-start-ephemeral is a script (in the Ubuntu Debian lxc packaging) to
start a temporary lxc instance.  It can use another lxc instance as a
base, but it writes all of the filesystem changes to memory (via
overlayfs).  It can be very fast and effective for doing parallel work
like we are doing now, because filesystem I/O is not a blocker.

Right now there are some circumstances in which lxc-start-ephemeral will
not shut down properly; it has signal handling but for some reason
sometimes it does not clean up, and none of the standard lxc tools
(lxc-stop, lxc-destroy) work.  When you have lots of these at once--we
have 32 at once right now--it can be quite annoying.

For us, then, this is the kind of thing we need to do (thanks to benji
for refinements; mistakes are gary_poster's).

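# Stop every ephemeral container (their names contain "-temp-").
find /var/lib/lxc/ -mindepth 1 -maxdepth 1 -name '*-temp-*' -printf '%f\n' |
    xargs -n 1 sudo lxc-stop -n
# Unmount what the ephemerals left behind, then delete their directories.
umount /var/lib/lxc/*-temp-*/ephemeralbind /var/lib/lxc/*-temp-* /tmp/lxc-ip-*
rm -rf /var/lib/lxc/*-temp-* /tmp/lxc-ip-*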

/frankban: what about fixing lxc-start-ephemeral to handle signals
better? gary: yes; not sure what is causing this and haven't gotten
around to diagnosing.  But look, I have a band-aid!/

      gmb & gary_poster: bisect and conquer for test isolation problems

One of the bigger sources of our problems in parallel testing is test
isolation.  Launchpad has run its tests all together and in the same
order for years. "Tests pass" if the suite passes, run collectively in
the usual order.

The parallel testing project divides up the tests across processes, and
the grouping is therefore variable.  To deliver a more robust testing
system and discover test bugs faster, we went a step farther to run
tests with --shuffle: a random ordering within the random grouping.

When changing ordering and grouping causes test failures, it's usually a
sign of test isolation problems.  We've had to identify and fix a lot of
those, so we've come up with a process to diagnose them.

The first step is to be able to identify what tests were run.  We worked
with Robert Collins and Jono Lange to let our test parallelization
tool, testr <https://launchpad.net/testrepository>, include subunit
<https://launchpad.net/subunit> tags that identify what tests are run
together.  Our buildbot configuration includes the lists of what tests
are run together, and when a test failure happens, the report includes
the name of the list that ran the failed test.  You might notice that
bugs we file generally include the associated test list (see bug 1010251
<https://bugs.launchpad.net/launchpad/+bug/1010251>, for example,
which begins with a link to "worker-17"'s tests).

We also need to be able to run tests in the order that the testrunner
had them.  Our testrunner can do that with ./bin/test --load-list,
thanks to some changes bac made.

Now we are ready to bisect and conquer.  Here's our process.

 1. Does the test fail by itself?  If so, your test probably relies on
    another being run before it, and that's not the kind of isolation
    error we're talking about now.  If not, great, let's bisect.
 2. Actually, before we bisect, let's optimize and shorten the test run
    time.  Delete all the tests after the failed test from the test
    list.  Unless the future can affect the past, you won't need them.
 3. Now for one more optimization that's really specific to Launchpad:
    we will delete all the tests that are not in the same "layer" as the
    failing test.  Each layer gives you collected, reused setup (such as
    memcache setup or database setup), and generally each layer is run
    in its own process.  Therefore, generally a failed test will only be
    affected by other tests in the same layer, though more on that
    later.  But for now, go with it: we're going to only run tests in
    the same layer.  Run ./bin/test --load-list=YOUR_TEST_LIST
    --list-tests, where YOUR_TEST_LIST is the list of tests that you
    modified in step 2.  When you don't run with --subunit it will
    include layer names.  Find the first test in the last layer of the
    result--the first test that is in the same layer as the failing
    test.  In your test list, delete all tests prior to that test.
 4. Now start running ./bin/test --load-list YOUR_TEST_LIST with the
    first half of the list, plus the last test, and then again with the
    second half of the list.  If one of them fails, keep on doing this
    step, making the list smaller and smaller until you've identified
    the test that triggers the failure.  Go look at that other test and
    clean up the isolation problem.
 5. On the other hand, if you come to a point where dividing up a test
    list does not result in either half triggering the failure, assuming
    that the problem is not intermittent, you may have an N-way
    interaction: you must run three or more tests in order to trigger
    the problem state.  I think our record is four non-isolated tests
    together triggering a failure in the final step.  You'll need to
    divide up the list into groups and bisect each group.

This process works well for us.  It's also scriptable. gary_poster might
or might not be close to a rough Go version of a script that does this.
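
As an illustration of how steps 2, 4, and 5 might be scripted, here is a
minimal Python sketch.  It is not the team's script: trim_to_failure,
find_trigger, and bin_test_fails are hypothetical names, and
bin_test_fails assumes that a load list is a text file with one test id
per line and that ./bin/test exits non-zero when any test in the list
fails.  The layer trimming from step 3 is left out.

```python
import subprocess
import tempfile


def trim_to_failure(tests, failing):
    """Step 2: drop every test that ran after the failing test."""
    return tests[: tests.index(failing) + 1]


def find_trigger(tests, failing, fails):
    """Step 4: bisect the tests that ran before `failing`.

    `fails(test_list)` runs the list in order and returns True when the
    run fails.  Returns the single triggering test, or None when neither
    half triggers the failure (the possible N-way case of step 5).
    Step 1 (does the test fail by itself?) is left to the caller.
    """
    candidates = trim_to_failure(tests, failing)[:-1]
    if not candidates or not fails(candidates + [failing]):
        return None
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first, second = candidates[:middle], candidates[middle:]
        if fails(first + [failing]):
            candidates = first
        elif fails(second + [failing]):
            candidates = second
        else:
            return None  # neither half alone triggers the failure
    return candidates[0]


def bin_test_fails(test_list):
    """Run a list with ./bin/test --load-list (assumptions noted above)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt") as handle:
        handle.write("\n".join(test_list) + "\n")
        handle.flush()
        result = subprocess.run(["./bin/test", "--load-list", handle.name])
    # Assumption: any non-zero exit status means the run failed.
    return result.returncode != 0
```

From a Launchpad tree this would be called as find_trigger(tests,
failing_test, bin_test_fails), where tests is the worker's list in its
original order.  A None result means the groups need to be divided and
bisected by hand, as in step 5.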

About the optimization in step three: it's not really safe, but it
usually (almost always) works for us.  Possible reasons for this not
working include file system changes, layers with real teardowns, and
layers that don't have to change processes to start up.

      benji: escaping from the lxc login

OK, this didn't actually happen at the meeting, but later I found out
from benji how to escape from the lxc login.  For instance, if you use
lxc-start to start an lxc container, and then lxc-console to use the
container, and then you want to exit, logging out won't cut it: you'll
be challenged to log in again.  The trick is to use /ctrl-a q/ (or with
screen, /ctrl-a a q/) when you are being asked to log in.


      No innovations to learn from this week


      gmb: -t and --load-list don't work together in Launchpad's bin/test

This is a bug.  Beware.  (Maybe we should file it!)

      gary_poster: danger zone kanban cards

We regard any active kanban card that doesn't move for a day as a
problem to be solved.  We had two of them this week (bug 996729
<https://bugs.launchpad.net/launchpad/+bug/996729> and bug 682772
<https://bugs.launchpad.net/launchpad/+bug/682772>), and our efforts to
move them failed.  Why?  /benji and gmb: Our testing environment was
hard to use for bug 996729: we actually need buildbot to see some
failures and we don't know why. We also made some mistakes that slowed
us down./

Pair programming usually helps us move cards, but it didn't for these
two.  Does the observer programmer need to keep some things in mind?  We
agree that the observer should be actively skeptical and watch the other
person's back. When something doesn't make sense, this is a trigger for
both parties to check assumptions and step back. *ACTION: gary_poster
will try making these thoughts into a simple checklist.*

/benji: Timezone differences introduced some slowdowns and reduced the
effectiveness of pairing.  I had no one to pair with after gmb's end of
day, because other people were busy on other tasks/. We agree that we
need to refine our process for "danger zone" cards that are not moving.
As before, if an active card does not move for 24 hours, we should
apply problem solving at the morning meeting and encourage pairing.
However, if the card is still blocked after another 24 hours, and
pairing has been a problem, we need to pause at least one of the active
tasks in order to enable pairing/swarming on the problem.  /benji: a
checklist for the morning meeting would help us follow this process, and
could include our "convene a panel" pattern discussed last
week./ *ACTION: gary_poster will do this.* /frankban: after we've
successfully completed a card that went into the "danger zone", we
should share knowledge with a mini-postmortem./  This can go into the
morning meeting checklist also.

      gary_poster: slowdowns caused by working across teams

We've had two kanban cards waiting for months on SpamapS to have time to
finish them.  One needs him to sponsor python-shelltoolbox into Ubuntu,
and the other needs him to package the Python charm helpers we provided
(and which depend on python-shelltoolbox).  He is very busy.  We've been
bothering him about it every couple of weeks.  What could we have done
better to get this to happen?

/gmb: we could have taken this to canonical-tech, and could still.
gary_poster: yes, but these were specifically about the charm helpers,
and SpamapS owns them and has certain requirements for them (in
particular, the one charmhelpers project makes several packages, one for
each language). gmb: maybe we should have gone to the Juju team rather
than SpamapS explicitly? gary_poster: he is the charmhelpers guy. gmb:
yes, but then juju team could schedule/prioritize it within their own
goals, and maybe also work together. gary_poster: so is the lesson that
we should never ask any single person to do something? gmb: no... but we
need something new. Timezone differences also make pings very difficult. /

Perhaps when working across teams we should request a delivery date
guess, and request that we schedule a call on that date.   If the
delivery doesn't happen on that date, on the scheduled call ask for
three things: a revised delivery date, another associated call, and a
plan to try something else if the second delivery date fails. *ACTION:
gary_poster will convert this into an experimental checklist for how to
deal with inter-team requests.*

      gary_poster: checklists or flowcharts?

The checklists that we discuss seem like flowcharts, not
checklists. /benji: keep them as checklists to keep them loose./
