
launchpad-dev team mailing list archive

Re: Launchpad Projects Merge Preview


Francis has already covered most of this, but I felt there is a small
point that should be expanded on.

On Tue, Dec 14, 2010 at 3:28 PM, Max Bowsher <maxb@xxxxxxx> wrote:
> Including a "The new way to search bugs just within the Soyuz component
> of Launchpad is ..." (etc.) in the blog post would likely make it a lot
> less likely to make people think negatively about the change.

Max, I'm really very excited by the change we're making here, and
I'd like to try and bring all the bits together: I think it's entirely
positive, but if there are downsides or issues we're going to cause
folk, we should address them clearly and openly.

The problem with including a 'new way to ...soyuz' statement in the
blog post is that it incorrectly presumes that there is a correct 'old
way'. There isn't, and here is why: the existing policy for where a
bug should be filed is not *where the problem is visible* but instead
*what part of the code base the change needs to happen in*. Note that
this requires prescience: if we don't know what needs to change,
there is no clear place to put any given bug at the moment. The triage
job that CHR does is complicated by trying to guess *at the fix* when
a bug is filed.

Because of this it is extremely likely that any bug searches users
have been doing in Launchpad's bugs have been in the wrong place. There
is, in fact, only one right place to search for Launchpad bugs and be
confident you will find existing ones -

Bugs that affect the 'Soyuz component' are currently found in:
 - soyuz
 - registry
 - foundations
 - code
 - translations
 - web
 - buildd

As an example, a bug where bugtask changes time out sending email is
currently in 'foundations'. Why? Because the issue might be due to
mailserver performance or the serial nature of our mail handler.

The subcomponent approach for bug tracking makes some sense when you
think of Launchpad as N parallel applications with one team
maintaining each application: developers need to know that the bug is
in their section of the application in order to pick it up and fix it.
But tackling bug triage of Launchpad that way implies a very static
partitioning of Launchpad (which puts up barriers), and also means
that we have to resource each 'application' in advance by having a
dedicated sub-team. The very nature of having dedicated teams means
that each thing gets its own work queue, which adds latency to fixing
problems (LEAN argues for having as few queues as possible). This
structure also means that having more folk work on areas that are in
trouble becomes an exception rather than the rule (because folk are
pulling from a per-team queue). And that means that it's not uncommon
for a bug that is of project-wide high importance to stall when it
moves from one team's region of maintenance to a smaller or busier
team's.
The new bug tracking structure is only the surface exposure of a more
fundamental change: rather than having strictly defined regions of the
code base, we're moving to a whole-project ownership model with squads
responsible for getting things done rather than regions of the code
base. Each squad will be a small team, of a size that can work well
together on a single project, timezone compatible, and ideally have a
good spread of the skills that go into making Launchpad changes:
JavaScript, UI, Zope, PostgreSQL.

The squads are then jointly responsible for the entire Launchpad
project. If we split existing code into two (refactoring for
maintenance), we don't need to add a squad to cater for that. And vice
versa when combining components makes sense.

One of the existing things we have trouble with is handling interrupts
*and* doing project work. Teams that are both component maintainers
and doing projects tend to let interrupts (bug reports, timeouts,
OOPSes) fall by the wayside until their big project is done. This is
natural because doing big projects is hard and needs concentration,
and because the team is the sole owner of its part of the codebase,
no one else is doing the interrupt work while it is focused on the
project.

A very nice thing about the squads approach is that at any point in
time a given squad will be just doing interrupts, or just doing a big
project. Squads will get furlough from the heavy lifting involved in
project work. Something like project, interrupts, project, interrupts.

This is much more flexible too - if we are drowning in bugs, Francis
can simply not assign a big project to the next squad when an existing
project is finished. Conversely if we need to do more project work in
parallel, he can ask a squad to come out of maintenance mode early.

Now, there is a bit of a tradeoff here: we're changing from very
focused teams with deep domain knowledge to project-wide teams with
deep stack knowledge. (Rather than a team that knows all about
(picking an arbitrary one) bazaar, but isn't expected to know about
all the layers in our environment, we have a squad that's expected to
know all the layers and may not know all about bazaar.) This means
that we'll pay a context switch for the members of a squad when they
go from working on an issue in bazaar.launchpad.net to an issue in
answers.launchpad.net. OTOH we are eliminating massive cross-team
queues, and getting many more eyeballs on the code - in extreme cases
10 times the number of folk will be responsible for code that
previously was all-but-orphaned. Past a ramping up phase, we're hoping
to balance interrupts and projects /much/ better, which should let
project work advance more quickly.

And on the bug triage side, we will have removed the tension between
*where the fix goes*, *where the symptoms are* and *how important the
bug is*. Which is a huge win.

