← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~cjwatson/launchpad/optimise-commented-bugs-search-more into lp:launchpad

 

Colin Watson has proposed merging lp:~cjwatson/launchpad/optimise-commented-bugs-search-more into lp:launchpad.

Commit message:
Optimise commented bugs search further by lifting bug task ID search into a CTE as well.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/optimise-commented-bugs-search-more/+merge/371750

The fix in https://code.launchpad.net/~cjwatson/launchpad/optimise-commented-bugs-search/+merge/369656 made a reasonable stab at improving the performance of +commentedbugs in many situations by forcing it to search for commented-on bug IDs first, since that's always far more selective than looking through the whole of BugTaskFlat even though the planner may not be able to figure that out for itself.

However, in some cases (for example for somebody with no commented bugs at all), the planner only looks at this CTE after it's already done an index scan of BugTaskFlat with some very non-selective filter conditions.  The fix for this is to add another CTE to search for bugtask IDs as well, which convinces the planner to use the index on BugTaskFlat(bugtask) even in these cases.

Two before/after pairs of query plans follow, the first pair for my user and the second pair for a user I ran across in an OOPS whose +commentedbugs page reliably times out.  In my unusual case of a user who's commented on thousands of still-open bugs, the database does have to do a bit more work to find all the bugtask IDs without being able to apply a limit, but it's still acceptable (/~seb128/+commentedbugs is a more extreme case, but still workable).  At the other extreme, though, there's a vast improvement.

EXPLAIN (ANALYZE, BUFFERS) WITH
  commented_bug_ids AS (
    (SELECT BugMessage.bug FROM BugMessage WHERE BugMessage.index > 0 AND BugMessage.owner = 91)
    UNION
    (SELECT BugActivity.bug FROM BugActivity WHERE BugActivity.person = 91 AND (BugActivity.whatchanged != E'bug' OR BugActivity.message != E'added bug')))
SELECT BugTaskFlat.bugtask
FROM BugTaskFlat
WHERE BugTaskFlat.status IN (25, 10, 20, 21, 22, 13, 14)
  AND BugTaskFlat.duplicateof IS NULL
  AND BugTaskFlat.bug IN (SELECT commented_bug_ids.bug FROM commented_bug_ids)
  AND BugTaskFlat.information_type IN (1, 2)
ORDER BY BugTaskFlat.importance DESC, BugTaskFlat.bugtask
LIMIT 76 OFFSET 0;

                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=198897.95..198898.14 rows=76 width=8) (actual time=258.906..258.921 rows=76 loops=1)
   Buffers: shared hit=145722
   CTE commented_bug_ids
     ->  HashAggregate  (cost=182437.97..183126.62 rows=68865 width=4) (actual time=123.381..128.402 rows=22976 loops=1)
           Group Key: bugmessage.bug
           Buffers: shared hit=43140
           ->  Append  (cost=468.89..182265.81 rows=68865 width=4) (actual time=8.678..100.772 rows=76700 loops=1)
                 Buffers: shared hit=43140
                 ->  Bitmap Heap Scan on bugmessage  (cost=468.89..43912.95 rows=21898 width=4) (actual time=8.677..35.161 rows=26709 loops=1)
                       Recheck Cond: ((owner = 91) AND (index > 0))
                       Heap Blocks: exact=15964
                       Buffers: shared hit=16040
                       ->  Bitmap Index Scan on bugmessage__owner__index__idx  (cost=0.00..463.42 rows=21898 width=0) (actual time=4.788..4.788 rows=26709 loops=1)
                             Index Cond: ((owner = 91) AND (index > 0))
                             Buffers: shared hit=76
                 ->  Bitmap Heap Scan on bugactivity  (cost=1323.92..137664.21 rows=46967 width=4) (actual time=12.427..56.982 rows=49991 loops=1)
                       Recheck Cond: (person = 91)
                       Filter: ((whatchanged <> 'bug'::text) OR (message <> 'added bug'::text))
                       Rows Removed by Filter: 1051
                       Heap Blocks: exact=26900
                       Buffers: shared hit=27100
                       ->  Bitmap Index Scan on bugactivity_person_datechanged_idx  (cost=0.00..1312.18 rows=57016 width=0) (actual time=6.876..6.876 rows=51042 loops=1)
                             Index Cond: (person = 91)
                             Buffers: shared hit=200
   ->  Sort  (cost=15771.33..16507.69 rows=294546 width=8) (actual time=258.904..258.910 rows=76 loops=1)
         Sort Key: bugtaskflat.importance DESC, bugtaskflat.bugtask
         Sort Method: top-N heapsort  Memory: 31kB
         Buffers: shared hit=145722
         ->  Nested Loop  (cost=1549.89..5097.09 rows=294546 width=8) (actual time=141.157..257.884 rows=2755 loops=1)
               Buffers: shared hit=145722
               ->  HashAggregate  (cost=1549.46..1551.46 rows=200 width=4) (actual time=141.099..146.947 rows=22976 loops=1)
                     Group Key: commented_bug_ids.bug
                     Buffers: shared hit=43140
                     ->  CTE Scan on commented_bug_ids  (cost=0.00..1377.30 rows=68865 width=4) (actual time=123.384..134.431 rows=22976 loops=1)
                           Buffers: shared hit=43140
               ->  Index Scan using bugtaskflat__bug__idx on bugtaskflat  (cost=0.43..17.72 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=22976)
                     Index Cond: (bug = commented_bug_ids.bug)
                     Filter: ((duplicateof IS NULL) AND (information_type = ANY ('{1,2}'::integer[])) AND (status = ANY ('{25,10,20,21,22,13,14}'::integer[])))
                     Rows Removed by Filter: 1
                     Buffers: shared hit=102582
 Planning time: 0.928 ms
 Execution time: 259.665 ms
(42 rows)

EXPLAIN (ANALYZE, BUFFERS) WITH
  commented_bug_ids AS (
    (SELECT BugMessage.bug FROM BugMessage WHERE BugMessage.index > 0 AND BugMessage.owner = 91)
    UNION
    (SELECT BugActivity.bug FROM BugActivity WHERE BugActivity.person = 91 AND (BugActivity.whatchanged != E'bug' OR BugActivity.message != E'added bug'))),
  commented_bugtask_ids AS (SELECT BugTaskFlat.bugtask FROM BugTaskFlat WHERE BugTaskFlat.bug IN (SELECT commented_bug_ids.bug FROM commented_bug_ids))
SELECT BugTaskFlat.bugtask
FROM BugTaskFlat
WHERE BugTaskFlat.status IN (25, 10, 20, 21, 22, 13, 14)
  AND BugTaskFlat.duplicateof IS NULL
  AND BugTaskFlat.bugtask IN (SELECT commented_bugtask_ids.bugtask FROM commented_bugtask_ids)
  AND BugTaskFlat.information_type IN (1, 2)
ORDER BY BugTaskFlat.importance DESC, BugTaskFlat.bugtask
LIMIT 76 OFFSET 0;

                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=227050.48..227050.67 rows=76 width=8) (actual time=450.897..450.913 rows=76 loops=1)
   Buffers: shared hit=290883
   CTE commented_bug_ids
     ->  HashAggregate  (cost=182437.97..183126.62 rows=68865 width=4) (actual time=117.673..123.025 rows=22976 loops=1)
           Group Key: bugmessage.bug
           Buffers: shared hit=43140
           ->  Append  (cost=468.89..182265.81 rows=68865 width=4) (actual time=6.942..96.265 rows=76700 loops=1)
                 Buffers: shared hit=43140
                 ->  Bitmap Heap Scan on bugmessage  (cost=468.89..43912.95 rows=21898 width=4) (actual time=6.941..31.117 rows=26709 loops=1)
                       Recheck Cond: ((owner = 91) AND (index > 0))
                       Heap Blocks: exact=15964
                       Buffers: shared hit=16040
                       ->  Bitmap Index Scan on bugmessage__owner__index__idx  (cost=0.00..463.42 rows=21898 width=0) (actual time=4.005..4.005 rows=26709 loops=1)
                             Index Cond: ((owner = 91) AND (index > 0))
                             Buffers: shared hit=76
                 ->  Bitmap Heap Scan on bugactivity  (cost=1323.92..137664.21 rows=46967 width=4) (actual time=12.305..57.315 rows=49991 loops=1)
                       Recheck Cond: (person = 91)
                       Filter: ((whatchanged <> 'bug'::text) OR (message <> 'added bug'::text))
                       Rows Removed by Filter: 1051
                       Heap Blocks: exact=26900
                       Buffers: shared hit=27100
                       ->  Bitmap Index Scan on bugactivity_person_datechanged_idx  (cost=0.00..1312.18 rows=57016 width=0) (actual time=6.959..6.959 rows=51042 loops=1)
                             Index Cond: (person = 91)
                             Buffers: shared hit=200
   CTE commented_bugtask_ids
     ->  Nested Loop  (cost=1549.89..5093.59 rows=1176590 width=4) (actual time=135.414..254.560 rows=36247 loops=1)
           Buffers: shared hit=145722
           ->  HashAggregate  (cost=1549.46..1551.46 rows=200 width=4) (actual time=135.388..141.299 rows=22976 loops=1)
                 Group Key: commented_bug_ids.bug
                 Buffers: shared hit=43140
                 ->  CTE Scan on commented_bug_ids  (cost=0.00..1377.30 rows=68865 width=4) (actual time=117.676..128.942 rows=22976 loops=1)
                       Buffers: shared hit=43140
           ->  Index Scan using bugtaskflat__bug__idx on bugtaskflat bugtaskflat_1  (cost=0.43..17.65 rows=6 width=8) (actual time=0.004..0.004 rows=2 loops=22976)
                 Index Cond: (bug = commented_bug_ids.bug)
                 Buffers: shared hit=102582
   ->  Sort  (cost=38830.27..39566.63 rows=294546 width=8) (actual time=450.895..450.901 rows=76 loops=1)
         Sort Key: bugtaskflat.importance DESC, bugtaskflat.bugtask
         Sort Method: top-N heapsort  Memory: 30kB
         Buffers: shared hit=290883
         ->  Nested Loop  (cost=26473.70..28156.02 rows=294546 width=8) (actual time=282.350..449.807 rows=2755 loops=1)
               Buffers: shared hit=290883
               ->  HashAggregate  (cost=26473.27..26475.27 rows=200 width=4) (actual time=282.218..292.967 rows=36247 loops=1)
                     Group Key: commented_bugtask_ids.bugtask
                     Buffers: shared hit=145722
                     ->  CTE Scan on commented_bugtask_ids  (cost=0.00..23531.80 rows=1176590 width=4) (actual time=135.417..266.236 rows=36247 loops=1)
                           Buffers: shared hit=145722
               ->  Index Scan using bugtaskflat_pkey on bugtaskflat  (cost=0.43..8.40 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=36247)
                     Index Cond: (bugtask = commented_bugtask_ids.bugtask)
                     Filter: ((duplicateof IS NULL) AND (information_type = ANY ('{1,2}'::integer[])) AND (status = ANY ('{25,10,20,21,22,13,14}'::integer[])))
                     Rows Removed by Filter: 1
                     Buffers: shared hit=145161
 Planning time: 1.114 ms
 Execution time: 452.191 ms
(53 rows)

EXPLAIN (ANALYZE, BUFFERS) WITH
  commented_bug_ids AS (
    (SELECT BugMessage.bug FROM BugMessage WHERE BugMessage.index > 0 AND BugMessage.owner = 3603952)
    UNION
    (SELECT BugActivity.bug FROM BugActivity WHERE BugActivity.person = 3603952 AND (BugActivity.whatchanged != E'bug' OR BugActivity.message != E'added bug')))
SELECT BugTaskFlat.bugtask
FROM BugTaskFlat
WHERE BugTaskFlat.status IN (25, 10, 20, 21, 22, 13, 14)
  AND BugTaskFlat.duplicateof IS NULL
  AND BugTaskFlat.bug IN (SELECT commented_bug_ids.bug FROM commented_bug_ids)
  AND BugTaskFlat.information_type IN (1, 2)
ORDER BY BugTaskFlat.importance DESC, BugTaskFlat.bugtask
LIMIT 76 OFFSET 0;

                                                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3717.68..7092.11 rows=76 width=8) (actual time=13058.730..13058.731 rows=0 loops=1)
   Buffers: shared hit=1472848 read=822014 written=7
   CTE commented_bug_ids
     ->  HashAggregate  (cost=3708.12..3717.25 rows=913 width=4) (actual time=2.284..2.309 rows=96 loops=1)
           Group Key: bugmessage.bug
           Buffers: shared hit=97 read=287
           ->  Append  (cost=0.43..3705.84 rows=913 width=4) (actual time=0.036..2.132 rows=338 loops=1)
                 Buffers: shared hit=97 read=287
                 ->  Index Scan using bugmessage__owner__index__idx on bugmessage  (cost=0.43..884.67 rows=254 width=4) (actual time=0.035..0.179 rows=34 loops=1)
                       Index Cond: ((owner = 3603952) AND (index > 0))
                       Buffers: shared hit=12 read=22
                 ->  Index Scan using bugactivity_person_datechanged_idx on bugactivity  (cost=0.56..2812.04 rows=659 width=4) (actual time=0.028..1.887 rows=304 loops=1)
                       Index Cond: (person = 3603952)
                       Filter: ((whatchanged <> 'bug'::text) OR (message <> 'added bug'::text))
                       Rows Removed by Filter: 65
                       Buffers: shared hit=85 read=265
   ->  Nested Loop Semi Join  (cost=0.43..13077978.00 rows=294546 width=8) (actual time=13058.729..13058.729 rows=0 loops=1)
         Join Filter: (bugtaskflat.bug = commented_bug_ids.bug)
         Rows Removed by Join Filter: 36165984
         Buffers: shared hit=1472848 read=822014 written=7
         ->  Index Scan Backward using bugtaskflat__importance__bugtask__idx on bugtaskflat  (cost=0.43..976546.46 rows=589092 width=12) (actual time=0.281..5969.130 rows=376729 loops=1)
               Filter: ((duplicateof IS NULL) AND (information_type = ANY ('{1,2}'::integer[])) AND (status = ANY ('{25,10,20,21,22,13,14}'::integer[])))
               Rows Removed by Filter: 1974114
               Buffers: shared hit=1472751 read=821727 written=7
         ->  CTE Scan on commented_bug_ids  (cost=0.00..18.26 rows=913 width=4) (actual time=0.000..0.009 rows=96 loops=376729)
               Buffers: shared hit=97 read=287
 Planning time: 1.255 ms
 Execution time: 13058.805 ms
(28 rows)

EXPLAIN (ANALYZE, BUFFERS) WITH
  commented_bug_ids AS (
    (SELECT BugMessage.bug FROM BugMessage WHERE BugMessage.index > 0 AND BugMessage.owner = 3603952)
    UNION
    (SELECT BugActivity.bug FROM BugActivity WHERE BugActivity.person = 3603952 AND (BugActivity.whatchanged != E'bug' OR BugActivity.message != E'added bug'))),
  commented_bugtask_ids AS (SELECT BugTaskFlat.bugtask FROM BugTaskFlat WHERE BugTaskFlat.bug IN (SELECT commented_bug_ids.bug FROM commented_bug_ids))
SELECT BugTaskFlat.bugtask
FROM BugTaskFlat
WHERE BugTaskFlat.status IN (25, 10, 20, 21, 22, 13, 14)
  AND BugTaskFlat.duplicateof IS NULL
  AND BugTaskFlat.bugtask IN (SELECT commented_bugtask_ids.bugtask FROM commented_bugtask_ids)
  AND BugTaskFlat.information_type IN (1, 2)
ORDER BY BugTaskFlat.importance DESC, BugTaskFlat.bugtask
LIMIT 76 OFFSET 0;

                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=46112.18..46112.37 rows=76 width=8) (actual time=4.965..4.965 rows=0 loops=1)
   Buffers: shared hit=591 read=561
   CTE commented_bug_ids
     ->  HashAggregate  (cost=3708.12..3717.25 rows=913 width=4) (actual time=2.370..2.393 rows=96 loops=1)
           Group Key: bugmessage.bug
           Buffers: shared hit=17 read=367
           ->  Append  (cost=0.43..3705.84 rows=913 width=4) (actual time=0.052..2.229 rows=338 loops=1)
                 Buffers: shared hit=17 read=367
                 ->  Index Scan using bugmessage__owner__index__idx on bugmessage  (cost=0.43..884.67 rows=254 width=4) (actual time=0.051..0.216 rows=34 loops=1)
                       Index Cond: ((owner = 3603952) AND (index > 0))
                       Buffers: shared hit=3 read=31
                 ->  Index Scan using bugactivity_person_datechanged_idx on bugactivity  (cost=0.56..2812.04 rows=659 width=4) (actual time=0.035..1.951 rows=304 loops=1)
                       Index Cond: (person = 3603952)
                       Filter: ((whatchanged <> 'bug'::text) OR (message <> 'added bug'::text))
                       Rows Removed by Filter: 65
                       Buffers: shared hit=14 read=336
   CTE commented_bugtask_ids
     ->  Nested Loop  (cost=20.97..3564.67 rows=1176590 width=4) (actual time=2.490..3.553 rows=96 loops=1)
           Buffers: shared hit=276 read=492
           ->  HashAggregate  (cost=20.54..22.54 rows=200 width=4) (actual time=2.458..2.486 rows=96 loops=1)
                 Group Key: commented_bug_ids.bug
                 Buffers: shared hit=17 read=367
                 ->  CTE Scan on commented_bug_ids  (cost=0.00..18.26 rows=913 width=4) (actual time=2.372..2.425 rows=96 loops=1)
                       Buffers: shared hit=17 read=367
           ->  Index Scan using bugtaskflat__bug__idx on bugtaskflat bugtaskflat_1  (cost=0.43..17.65 rows=6 width=8) (actual time=0.010..0.010 rows=1 loops=96)
                 Index Cond: (bug = commented_bug_ids.bug)
                 Buffers: shared hit=259 read=125
   ->  Sort  (cost=38830.27..39566.63 rows=294546 width=8) (actual time=4.964..4.964 rows=0 loops=1)
         Sort Key: bugtaskflat.importance DESC, bugtaskflat.bugtask
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=591 read=561
         ->  Nested Loop  (cost=26473.70..28156.02 rows=294546 width=8) (actual time=4.958..4.958 rows=0 loops=1)
               Buffers: shared hit=591 read=561
               ->  HashAggregate  (cost=26473.27..26475.27 rows=200 width=4) (actual time=3.660..3.696 rows=96 loops=1)
                     Group Key: commented_bugtask_ids.bugtask
                     Buffers: shared hit=276 read=492
                     ->  CTE Scan on commented_bugtask_ids  (cost=0.00..23531.80 rows=1176590 width=4) (actual time=2.492..3.607 rows=96 loops=1)
                           Buffers: shared hit=276 read=492
               ->  Index Scan using bugtaskflat_pkey on bugtaskflat  (cost=0.43..8.40 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=96)
                     Index Cond: (bugtask = commented_bugtask_ids.bugtask)
                     Filter: ((duplicateof IS NULL) AND (information_type = ANY ('{1,2}'::integer[])) AND (status = ANY ('{25,10,20,21,22,13,14}'::integer[])))
                     Rows Removed by Filter: 1
                     Buffers: shared hit=315 read=69
 Planning time: 1.201 ms
 Execution time: 5.080 ms
(45 rows)
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~cjwatson/launchpad/optimise-commented-bugs-search-more into lp:launchpad.
=== modified file 'lib/lp/bugs/model/bugtasksearch.py'
--- lib/lp/bugs/model/bugtasksearch.py	2019-07-03 14:18:25 +0000
+++ lib/lp/bugs/model/bugtasksearch.py	2019-08-23 15:28:06 +0000
@@ -617,6 +617,15 @@
         extra_clauses.append(BugTaskFlat.bug_owner == params.bug_reporter)
 
     if params.bug_commenter:
+        # Make sure that the planner makes effective use of BugTaskFlat
+        # indexes.  Searching for commented-on tasks first is almost always
+        # going to be much more selective than any other filter conditions.
+        # Note that just searching for bug IDs isn't sufficient in all
+        # cases, as the planner can still end up choosing an index scan over
+        # all of BugTaskFlat with quite non-selective conditions before it
+        # considers that CTE; but picking out the task IDs as well causes it
+        # to use a much quicker index scan of BugTaskFlat(bugtask) before
+        # doing any other filtering.
         with_clauses.append(convert_storm_clause_to_string(
             With('commented_bug_ids', Union(
                 Select(
@@ -634,9 +643,15 @@
                         Or(
                             BugActivity.whatchanged != u'bug',
                             BugActivity.message != u'added bug')))))))
+        with_clauses.append(convert_storm_clause_to_string(
+            With('commented_bugtask_ids',
+                Select(
+                    BugTaskFlat.bugtask_id, tables=[BugTaskFlat],
+                    where=BugTaskFlat.bug_id.is_in(
+                        Select(Column('bug', 'commented_bug_ids')))))))
         extra_clauses.append(
-            BugTaskFlat.bug_id.is_in(
-                Select(Column('bug', 'commented_bug_ids'))))
+            BugTaskFlat.bugtask_id.is_in(
+                Select(Column('bugtask', 'commented_bugtask_ids'))))
 
     if params.affects_me:
         params.affected_user = params.user


Follow ups