launchpad-dev team mailing list archive

Thread
Date
performance tuesday - bugsummary!

To: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
From: Robert Collins <robertc@xxxxxxxxxxxxxxxxx>
Date: Wed, 8 Jun 2011 16:28:55 +1200
The bug summary table I we started work on ~ 6 weeks back is now in
devel - it will be in the deploy happening this week.

This is a pretty significant bit of work (though not complete), and
I'd like to pick its design apart a bit - you may find other areas of
the systems your are building that need similar solutions.

One thing to note is that if we *had* a reliable-queue w/2pc we could
have implemented this as an external service accepting deltas on
changes to bugs, but we don't [yet]. We can still decide to move it
out of the main database using a few different alternatives, but for
now it is a trigger based design.

I'm also going to avoid duplicating the docs -
http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/view/head:/lib/lp/bugs/doc/bugsummary.txt

Bug summary is a fact table - http://en.wikipedia.org/wiki/Fact_table
- which records various discrete /sets/ of data about bug tasks/bugs.

It does not identify individual bugs (but we could build a join table
if we wanted to).

Every bug in Launchpad is aggregated into one-or-more rows in the fact
table. The aggregation is maintained by a number of triggers which
fire either before, after or both before-and-after rows are changed in
Postgresql.

For each dimension value
(http://en.wikipedia.org/wiki/Dimension_table) that we want to report
on, we add one to the 'count' field in the fact table. This means that
when a bug/bugtask is multivalued in some dimension, we record it many
times.

bug *tasks* are one form of multi-value, so we have an immediate
multiple-recording case there. Tags are multi-valued, so we also need
to record against each tag (including the NULL tag, so that we get
very cheap 'count everything' cases). Similarly sourcepackagename
targeted bugtasks are counted twice: once against the distro[series] +
sourcepackagename, and once against the distro[series] with
sourcepackagename forced to NULL.

Finally we do the same thing for subscribers of *private* bugs so that
we can give (reasonably) accurate figures for private bugs too.

The rows we add 1 to the count field are gained by taking the cross
product of all the dimension values that apply to a bug. So a bug
with:
 2 tasks ('bzr' and 'ubuntu/bzr')
 1 tag ('foo')
will write to 6 rows: ('bzr', tag=NULL), ('bzr', tag='foo'),
('ubuntu/bzr', tag=NULL), ('ubuntu/bzr', tag='foo'), ('ubuntu',
tag=NULL), ('ubuntu', tag='foo')

Now, the schema is subject to evolution - the basics are in place but
we may/will find things we cannot query yet (for instance, site-wide
tag clouds) : feel free to improve on the schema, or chat to me/Stuart
about the design if you're not sure.

The magic though, is all in the querying.

Revision 13171 of devel uses bugsummary to generate the tag portlets
(like https://bugs.launchpad.net/ubuntu/+bugtarget-portlet-tags-content).
These are currently a high volume timeout:
hard/soft
20 /  146  Distribution:+bugtarget-portlet-tags-content
The PPR tells me that this page has a 99% percentile render time of
13.2 seconds, and a mean of 6.6.

The bugsummary page can deliver the main query for that page in 350ms
*cold*. 170ms *hot*.

So, I'm pretty sure that we'll have this timeout fixed next week :)

Why is it so fast?

This is an example of the new query:
select * from (with teams as (select team from teamparticipation where
person=319329)
(SELECT tag, sum(count)
   FROM Bugsummary
WHERE status IN (10, 15, 20, 21, 22, 25)
     AND distribution = 1 AND sourcepackagename IS NULL
    AND tag IS NOT NULL
     AND (viewed_by IS NULL OR viewed_by in (select team from teams))
GROUP BY tag
   ORDER BY sum(count) DESC, tag LIMIT 10)
UNION
(SELECT tag, sum(count)
   FROM Bugsummary
WHERE status IN (10, 15, 20, 21, 22, 25)
     AND distribution = 1 AND sourcepackagename IS NULL
    AND tag IS NOT NULL
     AND (viewed_by IS NULL OR viewed_by in (select team from teams))
     AND tag IN ('a11y', 'armel', 'bitesize', 'cherry-pick',
'compat-wireless', 'gnome3', 'hibernate', 'hw-specific', 'kernel-bug',
'lo33', 'manpage', 'metabug', 'multiarch', 'needs-design',
'needs-packaging', 'needs-reassignment', 'packaging', 'patch',
'patch-accepted-debian', 'patch-accepted-upstream',
'patch-forwarded-debian', 'patch-forwarded-upstream',
'patch-needswork', 'patch-rejected', 'patch-rejected-debian',
'patch-rejected-upstream', 'python27',
'regression-proposed','regression-release', 'regression-update',
'resume', 'string-fix', 'suspend', 'touch', 'ubuntu-unr',
'upgrade-software-version', 'verification-done',
'verification-failed', 'verification-needed')
   GROUP BY tag
   ORDER BY sum(count) DESC, tag)) as _tmp order by sum;

The union handles both high-use tags and official tags. The official
tag list is obtained in a separate query (but could easily be folded
into this one query in future).

If we look at what this does, it selects rows from bugsummary for the
target we want to summarise (the distribution - to do a source package
we would change the sourcepackagename clause to match a
sourcepackagename id).

In the first half we pin the tag field to non NULL because tag=NULL
includes every bug  - we would double count and besides, the NULL tag
is pretty boring for this portlet :)
We select the statuses that we want (note that we cannot discriminate
on incomplete-with-response yet, because thats not a discrete status
[yet]).

The first half of this query considers 8500 rows in the bugsummary table.
The second half considers 270 rows.

So all in all its examing about 1% of the data that our currently-live
portlet code examines - and this is why its substantially faster.

Cheers,
Rob
Follow ups

Re: performance tuesday - bugsummary!
From: Stuart Bishop, 2011-06-08