launchpad-dev team mailing list archive

bug triage guidelines strawman

 

So at the moment we have one wiki page,
https://dev.launchpad.net/BugTriage, describing our bug triage
guidelines. If you haven't read it, you should - it's a solid analysis
of what bug triage is all about.

I think we need to refresh this now that we have our bug tracker
shared by all squads, and now that we have dedicated feature &
interrupt resources rather than a hard-to-achieve balance being struck
daily by engineers.

Francis thinks we won't get consensus on any changes via the list, and
we should get some time blocked out at the Epic to discuss this.
However I think we can make a pretty good start before then.

So, here's my take on an updated bug triage page for Launchpad...
consider this a strawman which I hope will get critically shredded and
improved on! I've been significantly influenced by the existing page,
but am taking a slightly different focus in analysing the problem.
I've tried to analyse the problem from the ground up - why we triage,
what it achieves, and the tradeoffs involved in doing it.

Some of the prose needs tightening up, but I'm confident the sense
will get across, and once we have consensus we can make it snappy.

Thanks to Martin for being a teddy bear on this.

-----------

= Launchpad Bug Triage =

== What is bug triage? ==

Triage is the act of sorting bugs into different priority groups.
There are many conflicting sorts - everyone has their pet bug that
should be 'first'. The sort order we choose is from the project's
perspective: we try to balance the needs of our users.

So, bug triage is: '''sorting bugs by importance-to-the-project''',
and these are the influences we try to strike a balance between in
assessing that importance:
 * Things affecting Launchpad project health.
 * Things affecting stakeholders.
 * Things affecting other users.

== Why triage? ==

This may be obvious, but having just a big bucket of open bugs isn't
very efficient: there are more genuinely important issues to fix than
there are engineers, so engineers will lose track of which things are
urgent and which aren't.

Secondly, each of the groups of users whose needs we're trying to
balance is interested in when things will get done. By sorting the
bugs we provide a proxy metric for when tasks will be worked on.

== How much triage is needed? ==

The world is dynamic and constantly changing; as such any sort we come
up with for our bugs will be outdated pretty quickly. We could make
the sort complete (so all bugs are ranked) and constantly refresh it.
However this is inefficient: the only times the sort actually matters
are:
 * when a new bug is being selected to work on (by project importance).
 * when a user is making a decision based on how long it is likely to
be until the bug is worked on. For instance, they might decide whether
to work up a patch, or whether to use Launchpad at all.

So how much sorting is enough? Two interesting metrics are freshness
and completeness.

If the sort is too old, bugs will be flagged as 'should be next to
work on' when that is no longer true. Our priorities may change month
to month but they rarely change faster than that, so we can tolerate
the sort being months (or more) stale.

The sort is complete enough if the answers to 'what is an important
bug to work on now' and questions that users may ask (like 'how long
till this will be worked on') get answers accurate enough... and how
accurate do we need?

Well, that's a tradeoff, but we think the answers are accurate enough if:
 * users can see that we care about performance, regressions,
usability and polish
 * engineers selecting 'next bug to work on' based on the triage sort
usually pick the things that are most useful to the
project/stakeholders/users; that is, inconsequential stuff is tackled
after consequential stuff

== Bug Importance ==

Bug importance in Launchpad is where we record the result of the
triage process; we have 5 buckets we can use in Launchpad:
critical/high/medium/low/wishlist.

We don't actually ever block a release based on having a bug of a
particular importance - we block releases based on having regressions,
which any commit can have - and we mark that on the bug that maps to
the commit.

The buckets combine to give a partial sort: bugs in the critical
bucket are sorted before bugs in the high bucket.
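
As a rough illustration of that partial sort, here is a minimal Python
sketch. The `Bug` class, its fields and the rank numbers are
hypothetical stand-ins, not Launchpad's real data model; the point is
only that buckets order bugs relative to each other and say nothing
about order within a bucket.

```python
from dataclasses import dataclass

# Bucket names come from the text above; the numeric ranks are illustrative.
IMPORTANCE_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "wishlist": 4}


@dataclass
class Bug:
    id: int          # hypothetical bug number
    importance: str  # one of the keys of IMPORTANCE_RANK


def partial_sort(bugs):
    """Order bugs by bucket only.

    sorted() is stable, so bugs that share a bucket keep whatever
    relative order they arrived in: the buckets give a partial sort,
    not a complete ranking.
    """
    return sorted(bugs, key=lambda bug: IMPORTANCE_RANK[bug.importance])
```

Given bugs filed as low, critical, high, low, this puts the critical
bug first and the high bug second, and leaves the two low bugs in their
original relative order.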

We can choose to use some or all of these 5 buckets.

How many do we need? A good way to answer that is to consider our
hypothetical complete, fresh sort, and consider how many slices we'd
need to make in it to answer questions well; we also need to consider
what would happen to those slices when things change (such as new
bugs arriving that sort to the front).

Buckets also have a cost: we need a ruleset for triage that will let
us assign bugs to buckets, and every bucket makes the heuristics more
complex.

Given that we have a freshness tolerance of some months for most bugs,
that we don't want to update many bugs when a single bug shuffles in
front, and that we have more bugs coming in than we fix, we need three
or perhaps four buckets:

 * A topmost bucket that is generally empty, for crisis bugs.
 * A default bucket for bugs we haven't picked out as being important
enough to sort above any other specific bug.
 * [optional] A bucket for bugs that are reasonably important but not
extremely so.
 * A bucket containing bugs which are within the first 6 months of work.
We map these buckets into:
 * critical: generally empty; bugs that need to jump the queue go here.
 * high: bugs that are likely to get attention within 6 months.
 * low [or perhaps wishlist]: all other bugs.

This has a clear tension: time-till-we-start-work is a good metric for
which bucket a bug belongs in, but given a bug with some symptoms, how
do we decide which bucket it should go into?

To address this tension we use two things:
 * A quarterly review of the bugs in the high bucket, to stop it overflowing.
 * Some heuristics for sorting bugs

== Quarterly review ==

This is pretty simple - we re-triage bugs with high importance to see
if things have changed and they should be downgraded. We don't review
for upgrades: we assume that users will prompt us when a bug deserves
to be upgraded.

== Triage guidelines ==

By default bugs are low - we sort them below all the bugs which have
had a specific priority assigned to them.

If a bug is a regression (the thing *was* working and now isn't), we
sort it higher.

If the bug is one that has been escalated via the Launchpad
stakeholder process, it sorts at the front.

OOPS and timeout bugs also sort to the front: performance is important
to our stakeholders, and OOPSes dramatically affect our ability to
operate and maintain Launchpad as well as being a very negative
experience when encountered.

For things like browser support, when a new browser is released and
the vendor is in our supported-browser-set, we should treat issues with
it as regressions.

Beyond these rules a bug is more important than another bug if fixing
it will improve Launchpad more than fixing the other bug would.

Engineers have discretion to decide any particular bug should be
sorted higher (or lower) than it has been; some change requests are
very important to many of our users and not big enough to need a
dedicated feature-squad working on them. When two engineers disagree,
or if someone in the management chain disagrees, common sense and
courtesy should be used in resolving the disagreement.
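
To make the heuristics concrete, here is a minimal Python sketch of
them as a starting suggestion. The `BugReport` class and its flags are
hypothetical, and mapping 'sorts at the front' to critical and 'sort it
higher' to high is one reading of the guidelines, not a settled rule.
Engineer discretion, as described above, still overrides whatever this
returns.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    # All three flags are hypothetical names for facts a triager establishes.
    escalated: bool = False        # raised via the stakeholder process
    oops_or_timeout: bool = False  # an OOPS or a timeout
    regression: bool = False       # was working, now isn't (incl. new browsers)


def suggest_importance(bug):
    """Return a suggested bucket; a human may still move the bug."""
    # Assumption: "sorts at the front" is read as the critical bucket.
    if bug.escalated or bug.oops_or_timeout:
        return "critical"
    # Assumption: "sort it higher" is read as the high bucket.
    if bug.regression:
        return "high"
    # By default bugs are low.
    return "low"
```

For example, `suggest_importance(BugReport(regression=True))` suggests
high, while a bug with no flags set stays in the default low bucket.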

== How to triage ==

Visit https://bugs.launchpad.net/launchpad-project/+bugs?field.importance=Unknown

For each bug:
 * Check for duplicates: have a bit of a look around, search your
memory, etc.
 * If the bug is unrelated to Launchpad, move it somewhere appropriate.
 * If the bug is something we won't do at all, mark it as Won't Fix.
 * If it's an operational request, convert it to a question.
 * Apply the guidelines in 'Triage guidelines' to get an approximate sort.
 * If the bug would sort at the very front, mark it critical.
 * If it would sort before all low bugs, mark it high.
 * Otherwise mark it low.
 * If the bug status is 'Incomplete', check that the filer was asked
to clarify something; if they were and haven't replied in a month,
close the bug. Otherwise either ask them to clarify something, or set
the bug to Triaged.
 * If the bug status is New, set it to Triaged.
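
The 'Incomplete' rule in the list above can be sketched in Python as
follows. The function name and parameters are hypothetical, and reading
'a month' as 30 days is an assumption; the sketch follows the wording
of the rule literally and is not a description of anything Launchpad
does automatically.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=30)  # assumption: "a month" read as 30 days


def incomplete_bug_action(asked_on, filer_replied, today):
    """Suggest what to do with a bug whose status is Incomplete.

    asked_on: date the filer was asked to clarify, or None if never asked.
    filer_replied: True if the filer has answered since being asked.
    """
    waited_a_month = asked_on is not None and today - asked_on >= STALE_AFTER
    if waited_a_month and not filer_replied:
        return "close"
    # Everything else falls under the rule's "otherwise" clause.
    return "ask for clarification or set to Triaged"
```

A bug whose filer was asked on 1 January and has not replied by
15 February would be closed; one that was never asked anything would
get a question or be set to Triaged.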

== Assignment ==

Bug triage does not involve assigning an engineer. Engineers should
only be assigned to bugs that are ''in progress''. Even critical bugs
do not need an engineer assigned: operational incidents are not
tracked in the bug database, though critical bugs may be generated as
follow-up work to be done; those bugs are then in the front section of
the queue, but that's all that is needed.

== Selecting bugs to work on ==

The bug database holds the ''project'' importance of bugs. However
individual or squad work-queues may be quite different. For instance,
we have 3 squads working on features at any one time, and 2 on
maintenance. Generally speaking squads on feature-rotation will ignore
'importance' in selecting what to work on - they will be working on a
feature and creating bugs as appropriate to create discussion points
and todo items for that feature.

The Launchpad maintenance squads however will usually be working from
the bug database - picking bugs up to work on based on their ''triaged
importance''. Maintenance squads should simply look in each bucket in
order - critical, high, low - and from within that bucket take one of
the oldest bugs - one that seems interesting to them at the time.
Crucially though, all bugs in the critical bucket should have someone
or some squad working on them before any bugs in the high bucket are
picked up, and likewise all high bugs before any low bugs.
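
The selection rule for maintenance squads can be sketched in Python as
below. The `Bug` class, the `in_progress` flag and the shortlist size
are hypothetical; the sketch only shows 'drain each bucket in order,
oldest first' and leaves the final choice among the oldest few to the
engineer.

```python
from dataclasses import dataclass
from datetime import date

BUCKET_ORDER = ("critical", "high", "low")


@dataclass
class Bug:
    id: int                    # hypothetical bug number
    importance: str            # one of BUCKET_ORDER
    reported: date             # when the bug was filed
    in_progress: bool = False  # someone is already working on it


def shortlist(bugs, size=3):
    """Return the oldest unclaimed bugs from the first non-drained bucket."""
    for bucket in BUCKET_ORDER:
        waiting = [b for b in bugs if b.importance == bucket and not b.in_progress]
        if waiting:
            # A lower bucket is only reached once every bug in the
            # buckets above it has someone working on it.
            return sorted(waiting, key=lambda b: b.reported)[:size]
    return []
```

An engineer would then pick whichever bug on the shortlist seems most
interesting to them at the time.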

Community work will often ignore our bug triage and focus on itch
scratching - and this also applies to patches done by Launchpad
engineers in their personal and slack time: the selection logic for
picking a bug only applies to effort being put in as part of their
primary duties. That is, it's always totally ok to fix that low
priority bug that's really annoying you, whether you're a user of
Launchpad or a developer. A bug fix is a bug fix!

-----------------------------------------------

let the straw burning begin!

-Rob


