launchpad-dev team mailing list archive
-
launchpad-dev team
-
Mailing list archive
-
Message #06914
Is it ok to report slightly inflated bug counts in portlets; counting bugs; accuracy vs speed
https://bugs.launchpad.net/launchpad/+bug/758587 describes a challenge
we have; counting bugs is expensive.
portlets like those on
https://bugs.staging.launchpad.net/ubuntu/+bugs?search=Search&field.status=In+Progress
and https://bugs.staging.launchpad.net/ubuntu show various aggregates
of bugs. They've been inaccurate (counting multiple tasks towards the
count) in the past; I recently made them more accurate as part of
making them faster; wgrant has just this week optimised the tags
portlet.
We're now about as fast as we can get while we do table scans of this
data in our current architecture (*). Scanning the 700K bug and 900K
bugtask tables together accounts for ~4 seconds of DB time once you
factor in privacy [though there is some promise of improving on that,
it won't be radically better].
A fact table (en.wikipedia.org/wiki/Fact_table) can answer a number of
aggregrates much more quickly, but at a cost. The cost is twofold: we
have to maintain the table, and its losing some precision, so
somethings become more vague and best answered from the source data.
I have put together a prototype at the SQL level; it performs
adequately: we can satisfy queries for ubuntu scale tags (queries are
at the end of this mail for those interested - or read the gory
details in the bug):
top 10 tags (logged in) completes in < 0.5 seconds and considers 33K rows
open bugs (logged in) completes in < 0.3 seconds and considers 14K rows
top 10 tags (anonymous) completes in < 0.2 seconds and considers 10K rows
open bugs (with privacy) completes in < 0.2 seconds and considers 110 rows
But what about the costs? And thats where we come to why I'm writing this mail.
Maintaining the table is easy - we can start with a background job
every few minutes, it takes 2 minutes to update globally on staging,
I'm sure on prod it would be even snappier. We can also update it with
triggers or from our appserver code.
The *results* of queries on this table will be inaccurate on private
bugs though, if a user has 2 or more paths to visibility.
Consider bug X, which I am directly subscribed to, and which
~launchpad is also assigned to. The schema I've come up with for this
table will have the bug X counted twice:
- once with me as a viewer
- once with ~launchpad as a viewer
but because we've discarded the bug ids that contributed to those rows
in the summary table, we cannot tell that we shouldn't add the counts
together, so we see 1 too many bugs when querying the table.
We have a choice:
- make the data utterly precise
- be a bit simpler and accept some inaccuracy
I'd like to go with simpler an inaccurate for now because:
- we went for years with inaccurate counts unnoticed by users [ who
can tell that 700 should be 580, really?]
- the error rate is pretty low - my tests so far suggest ~5% [of
private bugs] - so the numbers won't be *far* off.
- being more precise will be more complex - we'll need to query
privacy separately
So, specifically to jml & Deryck, but also to the whole team and any
users following this discussion: is it ok to be slightly inflated but
fast?
-Rob
(*) A horizontally sharded system could in principle calculate from
raw data concurrently on 10 or 20 nodes and get the same numbers in
1/10th or 1/20th the time. Such system usually recommend having
summary indices though which make that unnecessary.
sample queries:
tags (with privacy)
explain analyze select tag, sum(count) from bugsummary left join
teamparticipation on viewedby=teamparticipation.team where
distribution=1 and (viewedby is NULL or teamparticipation.person=2)
and sourcepackagename is NULL and tag is not null group by tag
order by sum(count) desc
limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=68724.35..68724.38 rows=10 width=14) (actual
time=430.363..430.367 rows=10 loops=1)
-> Sort (cost=68724.35..68728.58 rows=1692 width=14) (actual
time=430.363..430.364 rows=10 loops=1)
Sort Key: (sum(bugsummary.count))
Sort Method: top-N heapsort Memory: 25kB
-> HashAggregate (cost=68666.64..68687.79 rows=1692
width=14) (actual time=427.296..429.017 rows=4460 loops=1)
-> Nested Loop Left Join (cost=0.00..66672.08
rows=398912 width=14) (actual time=106.304..414.371 rows=10341
loops=1)
Filter: ((bugsummary.viewedby IS NULL) OR
(teamparticipation.person = 2))
-> Seq Scan on bugsummary (cost=0.00..7440.35
rows=55700 width=18) (actual time=106.230..135.308 rows=33483 loops=1)
Filter: ((sourcepackagename IS NULL) AND
(tag IS NOT NULL) AND (distribution = 1))
-> Index Scan using teamparticipation_team_key
on teamparticipation (cost=0.00..1.05 rows=1 width=8) (actual
time=0.003..0.006 rows=5 loops=33483)
Index Cond: (bugsummary.viewedby =
teamparticipation.team)
Total runtime: 430.495 ms
(12 rows)
tags (anonymous user)
explain analyze select tag, sum(count) from bugsummary where
distribution=1 and (viewedby is NULL) and sourcepackagename is NULL
and tag is not null group by tag
order by sum(count) desc
limit 10;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=7626.05..7626.07 rows=10 width=14) (actual
time=191.697..191.704 rows=10 loops=1)
-> Sort (cost=7626.05..7628.39 rows=935 width=14) (actual
time=191.695..191.699 rows=10 loops=1)
Sort Key: (sum(count))
Sort Method: top-N heapsort Memory: 25kB
-> HashAggregate (cost=7594.16..7605.84 rows=935 width=14)
(actual time=185.884..189.067 rows=4446 loops=1)
-> Seq Scan on bugsummary (cost=0.00..7440.35
rows=30761 width=14) (actual time=142.195..168.853 rows=10006 loops=1)
Filter: ((viewedby IS NULL) AND
(sourcepackagename IS NULL) AND (tag IS NOT NULL) AND (distribution =
1))
Total runtime: 191.804 ms
(8 rows)
open bugs (with privacy)
explain analyze select sum(count), status from bugsummary
left join teamparticipation on
bugsummary.viewedby=teamparticipation.team where
distribution=1 and (viewedby is NULL or teamparticipation.person=2)
and tag is NULL and status in (10, 15, 20, 21, 22, 25)
and sourcepackagename is NULL
group by status
order by status;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=50096.87..50096.88 rows=1 width=8) (actual
time=210.216..210.217 rows=6 loops=1)
Sort Key: bugsummary.status
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=50096.85..50096.86 rows=1 width=8) (actual
time=210.195..210.198 rows=6 loops=1)
-> Nested Loop Left Join (cost=2778.16..49589.40
rows=101490 width=8) (actual time=95.659..209.967 rows=133 loops=1)
Filter: ((bugsummary.viewedby IS NULL) OR
(teamparticipation.person = 2))
-> Bitmap Heap Scan on bugsummary
(cost=2778.16..8831.38 rows=14171 width=12) (actual
time=95.608..118.717 rows=6803 loops=1)
Recheck Cond: (tag IS NULL)
Filter: ((sourcepackagename IS NULL) AND
(distribution = 1) AND (status = ANY
('{10,15,20,21,22,25}'::integer[])))
-> Bitmap Index Scan on bugsummary_tag
(cost=0.00..2774.62 rows=162711 width=0) (actual time=35.177..35.177
rows=162711 loops=1)
Index Cond: (tag IS NULL)
-> Index Scan using teamparticipation_team_key on
teamparticipation (cost=0.00..2.86 rows=1 width=8) (actual
time=0.009..0.012 rows=3 loops=6803)
Index Cond: (bugsummary.viewedby = teamparticipation.team)
Total runtime: 210.336 ms
(14 rows)
Time: 281.852 ms
open bugs(anonymous user)
explain analyze select sum(count), status from bugsummary where
distribution=1 and (viewedby is NULL )
and tag is NULL and status in (10, 15, 20, 21, 22, 25)
and sourcepackagename is NULL
group by status
order by status;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=8868.95..8868.95 rows=1 width=8) (actual
time=101.659..101.662 rows=6 loops=1)
Sort Key: status
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=8868.92..8868.94 rows=1 width=8) (actual
time=101.634..101.639 rows=6 loops=1)
-> Bitmap Heap Scan on bugsummary (cost=2776.57..8829.79
rows=7826 width=8) (actual time=91.224..101.527 rows=110 loops=1)
Recheck Cond: (tag IS NULL)
Filter: ((viewedby IS NULL) AND (sourcepackagename IS
NULL) AND (distribution = 1) AND (status = ANY
('{10,15,20,21,22,25}'::integer[])))
-> Bitmap Index Scan on bugsummary_tag
(cost=0.00..2774.62 rows=162711 width=0) (actual time=28.688..28.688
rows=162711 loops=1)
Index Cond: (tag IS NULL)
Total runtime: 101.770 ms
(10 rows)
Time: 103.844 ms
Follow ups