
launchpad-reviewers team mailing list archive

[Merge] lp:~wgrant/launchpad/bugsummary-v2-rebuild into lp:launchpad

 

William Grant has proposed merging lp:~wgrant/launchpad/bugsummary-v2-rebuild into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~wgrant/launchpad/bugsummary-v2-rebuild/+merge/111834

This branch is the main migration infrastructure for BugSummary's adoption of the new sharing model.

In the beginning, BugSummary was created during a 90-minute database outage, which let it be populated safely in a single 10-minute statement. Since then it has been maintained by triggers, with no facility for rebuilding it or correcting errors. This branch adds such a facility: a script that recalculates the entire table, with sensible concurrency requirements. The rebuild serves two purposes: first, it lets us migrate BugSummary to the new sharing model; second, it will correct the substantial drift in the table, which causes some bug filters to show negative counts.

After trialling many different approaches, I settled on a target-driven one. The script is implemented as a DBLoopTuner over all the distinct IBugTargets collected from BugSummary and BugTask. For each target, it recalculates the expected BugSummary rows from BugTaskFlat (joined with BugTag and BugSubscription), diffs the results against the existing BugSummary rows, and then applies the changes.
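
The diff-and-apply step for a single target can be sketched without a database. This is a minimal illustration using plain dicts, not the code in the branch: calculate_changes mirrors the shape of calculate_bugsummary_changes in the diff below, while apply_changes is a hypothetical in-memory stand-in for the DB writes.

```python
def calculate_changes(old, new):
    """Return (added, updated, removed) needed to turn `old` into `new`.

    Both arguments are {key: count} dicts; a missing key counts as 0.
    """
    added, updated, removed = {}, {}, []
    for key in set(old) | set(new):
        old_val = old.get(key, 0)
        new_val = new.get(key, 0)
        if old_val == new_val:
            continue
        if old_val and not new_val:
            removed.append(key)
        elif new_val and not old_val:
            added[key] = new_val
        else:
            updated[key] = new_val
    return added, updated, removed


def apply_changes(rows, added, updated, removed):
    """Hypothetical stand-in for the DB writes: mutate `rows` in place."""
    for key in removed:
        del rows[key]
    rows.update(updated)
    rows.update(added)


if __name__ == "__main__":
    # Keys stand in for (status, importance, tag) tuples.
    existing = {("NEW", "HIGH", None): 2, ("TRIAGED", "LOW", None): 4}
    expected = {("NEW", "HIGH", None): 3, ("NEW", "LOW", "foo"): 1}
    apply_changes(existing, *calculate_changes(existing, expected))
    print(existing == expected)
```

Only rows whose counts differ are touched, which is why later runs over an already-correct table have much less to write.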

Most targets complete in tens of milliseconds, but large ones can take a while (e.g. Ubuntu takes nearly 30s to calculate and a further 20s to apply the changes). The whole script took about 30 minutes to recalculate all 45000 targets during an initial run on dogfood. Subsequent runs will be substantially faster (~10 minutes), as fewer changes will be required. By running this semi-regularly we should be able to notice and track down any further bugs in the triggers that cause drift like we've seen in the past.
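
Because per-target cost varies so widely, the chunk size needs to adapt as the run proceeds. The following is a simplified sketch of that idea, assuming a loop object with the same isDone()/__call__(chunk_size) shape as the class in the diff. run_tuned, ListLoop, and the scaling rule are hypothetical and are not the real tuner's algorithm.

```python
import time


class ListLoop:
    """Process a list in chunks, tracking an offset like the loop in the diff."""

    def __init__(self, items, handler):
        self.items = list(items)
        self.handler = handler
        self.offset = 0

    def isDone(self):
        return self.offset >= len(self.items)

    def __call__(self, chunk_size):
        chunk = self.items[self.offset:self.offset + int(chunk_size)]
        for item in chunk:
            self.handler(item)
        self.offset += len(chunk)


def run_tuned(loop, goal_seconds=2.0, min_chunk=1, max_chunk=100,
              clock=time.monotonic):
    """Hypothetical tuner: resize each chunk towards `goal_seconds` of work.

    Returns the list of chunk sizes that were requested.
    """
    chunk_size = min_chunk
    sizes = []
    while not loop.isDone():
        started = clock()
        loop(chunk_size)
        elapsed = clock() - started
        sizes.append(chunk_size)
        if elapsed > 0:
            # Scale the last size by how far off the goal it was.
            chunk_size = chunk_size * goal_seconds / elapsed
        else:
            chunk_size = max_chunk
        chunk_size = int(max(min_chunk, min(max_chunk, chunk_size)))
    return sizes
```

A run of cheap targets lets the chunk grow to the maximum, while one expensive target (such as a large distribution) shrinks the next chunk so that each transaction stays short.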

bugsummary_rollup_journal(), called by garbo-frequently, is a little possessive of BugSummary. It expects to hold an exclusive write lock over the table, so it probably can't sensibly run at the same time as this rebuild script. We can either disable it for the duration, or hope that the two don't step on each other's toes. I had considered making the rebuild conflict-free by simply having the script journal the deltas, but that would result in hundreds of thousands of BugSummaryJournal rows, a case not well handled by the querying code.
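
For comparison, the rejected journal-the-deltas alternative can be sketched as follows. This is an illustration only, not something the branch does. It assumes the combined view simply sums base counts and journal deltas per key; journal_deltas and combined_counts are hypothetical names.

```python
def journal_deltas(old, new):
    """Return {key: new - old} for every key whose count differs."""
    deltas = {}
    for key in set(old) | set(new):
        delta = new.get(key, 0) - old.get(key, 0)
        if delta:
            deltas[key] = delta
    return deltas


def combined_counts(summary, journal_rows):
    """Sum base counts and (key, delta) journal rows, dropping zero totals.

    Assumption: this is how a view over both tables would combine them.
    """
    combined = dict(summary)
    for key, delta in journal_rows:
        combined[key] = combined.get(key, 0) + delta
    return {key: count for key, count in combined.items() if count}


if __name__ == "__main__":
    summary = {"a": 2, "b": 10, "c": 3}
    expected = {"a": 2, "c": 5, "d": 4}
    journal = list(journal_deltas(summary, expected).items())
    # One journal row per changed key: cheap to write, but every reader
    # must now aggregate these rows until something rolls them up.
    print(len(journal), combined_counts(summary, journal) == expected)
```

Writes under this scheme never touch the base table, so they would not fight over its lock, but the cost moves to every query that has to sum the extra journal rows.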
-- 
https://code.launchpad.net/~wgrant/launchpad/bugsummary-v2-rebuild/+merge/111834
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~wgrant/launchpad/bugsummary-v2-rebuild into lp:launchpad.
=== modified file 'database/schema/security.cfg'
--- database/schema/security.cfg	2012-06-22 05:36:22 +0000
+++ database/schema/security.cfg	2012-06-25 12:59:54 +0000
@@ -2390,3 +2390,18 @@
 public.teammembership                   = SELECT
 public.teamparticipation                = SELECT
 type=user
+
+[bugsummaryrebuild]
+type=user
+groups=script
+public.bugsummary                       = SELECT, INSERT, UPDATE, DELETE
+public.bugsummaryjournal                = SELECT, DELETE
+public.bugsubscription                  = SELECT
+public.bugtag                           = SELECT
+public.bugtask                          = SELECT
+public.bugtaskflat                      = SELECT
+public.distribution                     = SELECT
+public.distroseries                     = SELECT
+public.product                          = SELECT
+public.productseries                    = SELECT
+public.sourcepackagename                = SELECT

=== added file 'lib/lp/bugs/scripts/bugsummaryrebuild.py'
--- lib/lp/bugs/scripts/bugsummaryrebuild.py	1970-01-01 00:00:00 +0000
+++ lib/lp/bugs/scripts/bugsummaryrebuild.py	2012-06-25 12:59:54 +0000
@@ -0,0 +1,356 @@
+# Copyright 2012 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+__metaclass__ = type
+
+import transaction
+
+from storm.expr import (
+    Alias,
+    And,
+    Cast,
+    Count,
+    Join,
+    Or,
+    Select,
+    Union,
+    With,
+    )
+from storm.properties import Bool
+
+from lp.bugs.model.bug import BugTag
+from lp.bugs.model.bugsubscription import BugSubscription
+from lp.bugs.model.bugsummary import BugSummary
+from lp.bugs.model.bugtask import (
+    bug_target_to_key,
+    bug_target_from_key,
+    BugTask,
+    )
+from lp.bugs.model.bugtaskflat import BugTaskFlat
+from lp.registry.enums import (
+    PRIVATE_INFORMATION_TYPES,
+    PUBLIC_INFORMATION_TYPES,
+    )
+from lp.registry.interfaces.distribution import IDistribution
+from lp.registry.interfaces.distroseries import IDistroSeries
+from lp.registry.interfaces.series import ISeriesMixin
+from lp.registry.model.product import Product
+from lp.registry.model.productseries import ProductSeries
+from lp.registry.model.distribution import Distribution
+from lp.registry.model.distroseries import DistroSeries
+from lp.registry.model.sourcepackagename import SourcePackageName
+from lp.services.database.bulk import create
+from lp.services.database.lpstorm import IStore
+from lp.services.looptuner import TunableLoop
+
+
+class RawBugSummary(BugSummary):
+    """Like BugSummary, except based on the raw DB table.
+
+    BugSummary is actually based on the combinedbugsummary view, and it omits
+    fixed_upstream, a column that's being removed.
+    """
+    __storm_table__ = 'bugsummary'
+
+    fixed_upstream = Bool()
+
+
+class BugSummaryJournal(BugSummary):
+    """Just the necessary columns of BugSummaryJournal."""
+    # It's not really BugSummary, but the schema is the same.
+    __storm_table__ = 'bugsummaryjournal'
+
+
+def get_bugsummary_targets():
+    """Get the current set of targets represented in BugSummary."""
+    return set(IStore(RawBugSummary).find(
+        (RawBugSummary.product_id, RawBugSummary.productseries_id,
+         RawBugSummary.distribution_id, RawBugSummary.distroseries_id,
+         RawBugSummary.sourcepackagename_id)).config(distinct=True))
+
+
+def get_bugtask_targets():
+    """Get the current set of targets represented in BugTask."""
+    new_targets = set(IStore(BugTask).find(
+        (BugTask.productID, BugTask.productseriesID,
+         BugTask.distributionID, BugTask.distroseriesID,
+         BugTask.sourcepackagenameID)).config(distinct=True))
+    # BugSummary counts package tasks in the packageless totals, so
+    # ensure that there's also a packageless total for each distro(series).
+    new_targets.update(set(
+        (p, ps, d, ds, None) for (p, ps, d, ds, spn) in new_targets))
+    return new_targets
+
+
+def load_target(pid, psid, did, dsid, spnid):
+    store = IStore(Product)
+    p, ps, d, ds, spn = map(
+        lambda (cls, id): store.get(cls, id) if id is not None else None,
+        zip((Product, ProductSeries, Distribution, DistroSeries,
+             SourcePackageName),
+            (pid, psid, did, dsid, spnid)))
+    return bug_target_from_key(p, ps, d, ds, spn)
+
+
+def format_target(target):
+    id = target.pillar.name
+    series = (
+        (ISeriesMixin.providedBy(target) and target)
+        or getattr(target, 'distroseries', None)
+        or getattr(target, 'productseries', None))
+    if series:
+        id += '/%s' % series.name
+    spn = getattr(target, 'sourcepackagename', None)
+    if spn:
+        id += '/+source/%s' % spn.name
+    return id
+
+
+def _get_bugsummary_constraint_bits(target):
+    raw_key = bug_target_to_key(target)
+    # Map to ID columns to work around Storm bug #682989.
+    return dict(
+        ('%s_id' % k, v.id if v else None) for (k, v) in raw_key.items())
+
+
+def get_bugsummary_constraint(target, cls=RawBugSummary):
+    """Convert an `IBugTarget` to a list of constraints on RawBugSummary."""
+    # Map to ID columns to work around Storm bug #682989.
+    return [
+        getattr(cls, k) == v
+        for (k, v) in _get_bugsummary_constraint_bits(target).iteritems()]
+
+
+def get_bugtaskflat_constraint(target):
+    """Convert an `IBugTarget` to a list of constraints on BugTaskFlat."""
+    raw_key = bug_target_to_key(target)
+    # For the purposes of BugSummary, DSP/SP tasks count for their
+    # distro(series).
+    if IDistribution.providedBy(target) or IDistroSeries.providedBy(target):
+        del raw_key['sourcepackagename']
+    # Map to ID columns to work around Storm bug #682989.
+    return [
+        getattr(BugTaskFlat, '%s_id' % k) == (v.id if v else None)
+        for (k, v) in raw_key.items()]
+
+
+def get_bugsummary_rows(target):
+    """Find the `RawBugSummary` rows for the given `IBugTarget`.
+
+    RawBugSummary is the bugsummary table in the DB, not to be confused
+    with BugSummary which is actually combinedbugsummary, a view over
+    bugsummary and bugsummaryjournal.
+    """
+    return IStore(RawBugSummary).find(
+        (RawBugSummary.status, RawBugSummary.milestone_id,
+         RawBugSummary.importance, RawBugSummary.has_patch,
+         RawBugSummary.fixed_upstream, RawBugSummary.tag,
+         RawBugSummary.viewed_by_id, RawBugSummary.count),
+        *get_bugsummary_constraint(target))
+
+
+def get_bugsummaryjournal_rows(target):
+    """Find the `BugSummaryJournal` rows for the given `IBugTarget`."""
+    return IStore(BugSummaryJournal).find(
+        BugSummaryJournal,
+        *get_bugsummary_constraint(target, cls=BugSummaryJournal))
+
+
+def calculate_bugsummary_changes(old, new):
+    """Calculate the changes between the new and old dicts.
+
+    Takes {key: int} dicts, returns items from the new dict that differ
+    from the old one.
+    """
+    keys = set()
+    keys.update(old.iterkeys())
+    keys.update(new.iterkeys())
+    added = {}
+    updated = {}
+    removed = []
+    for key in keys:
+        old_val = old.get(key, 0)
+        new_val = new.get(key, 0)
+        if old_val == new_val:
+            continue
+        if old_val and not new_val:
+            removed.append(key)
+        elif new_val and not old_val:
+            added[key] = new_val
+        else:
+            updated[key] = new_val
+    return added, updated, removed
+
+
+def apply_bugsummary_changes(target, added, updated, removed):
+    """Apply a set of BugSummary changes to the DB."""
+    bits = _get_bugsummary_constraint_bits(target)
+    target_key = tuple(map(
+        bits.__getitem__,
+        ('product_id', 'productseries_id', 'distribution_id',
+         'distroseries_id', 'sourcepackagename_id')))
+    target_cols = (
+        RawBugSummary.product_id, RawBugSummary.productseries_id,
+        RawBugSummary.distribution_id, RawBugSummary.distroseries_id,
+        RawBugSummary.sourcepackagename_id)
+    key_cols = (
+        RawBugSummary.status, RawBugSummary.milestone_id,
+        RawBugSummary.importance, RawBugSummary.has_patch,
+        RawBugSummary.fixed_upstream, RawBugSummary.tag,
+        RawBugSummary.viewed_by_id)
+
+    # Postgres doesn't do bulk updates, so do a delete+add.
+    for key, count in updated.iteritems():
+        removed.append(key)
+        added[key] = count
+
+    # Delete any excess rows. We do it in batches of 100 to avoid enormous ORs
+    while removed:
+        chunk = removed[:100]
+        removed = removed[100:]
+        exprs = [
+            map(lambda (k, v): k == v, zip(key_cols, key))
+            for key in chunk]
+        IStore(RawBugSummary).find(
+            RawBugSummary,
+            Or(*[And(*expr) for expr in exprs]),
+            *get_bugsummary_constraint(target)).remove()
+
+    # Add any new rows. We know this scales up to tens of thousands, so just
+    # do it in one hit.
+    if added:
+        create(
+            target_cols + key_cols + (RawBugSummary.count,),
+            [target_key + key + (count,) for key, count in added.iteritems()])
+
+
+def rebuild_bugsummary_for_target(target, log):
+    log.debug("Rebuilding %s" % format_target(target))
+    existing = dict(
+        (v[:-1], v[-1]) for v in get_bugsummary_rows(target))
+    expected = dict(
+        (v[:-1], v[-1]) for v in calculate_bugsummary_rows(target))
+    added, updated, removed = calculate_bugsummary_changes(existing, expected)
+    if added:
+        log.debug('Added %r' % added)
+    if updated:
+        log.debug('Updated %r' % updated)
+    if removed:
+        log.debug('Removed %r' % removed)
+    apply_bugsummary_changes(target, added, updated, removed)
+    # We've just made bugsummary match reality, ignoring any
+    # bugsummaryjournal rows. So any journal rows are at best redundant,
+    # or at worst incorrect. Kill them.
+    get_bugsummaryjournal_rows(target).remove()
+
+
+def calculate_bugsummary_rows(target):
+    """Calculate BugSummary row fragments for the given `IBugTarget`.
+
+    The data is re-aggregated from BugTaskFlat, BugTag and BugSubscription.
+    """
+    # Use a CTE to prepare a subset of BugTaskFlat, filtered to the
+    # relevant target and to exclude duplicates, and with has_patch
+    # calculated.
+    relevant_tasks = With(
+        'relevant_task',
+        Select(
+            (BugTaskFlat.bug_id, BugTaskFlat.information_type,
+             BugTaskFlat.status, BugTaskFlat.milestone_id,
+             BugTaskFlat.importance,
+             Alias(BugTaskFlat.latest_patch_uploaded != None, 'has_patch')),
+            tables=[BugTaskFlat],
+            where=And(
+                BugTaskFlat.duplicateof_id == None,
+                *get_bugtaskflat_constraint(target))))
+
+    # Storm class to reference the CTE.
+    class RelevantTask(BugTaskFlat):
+        __storm_table__ = 'relevant_task'
+
+        has_patch = Bool()
+
+    # Storm class to reference the union.
+    class BugSummaryPrototype(RawBugSummary):
+        __storm_table__ = 'bugsummary_prototype'
+
+    # Prepare a union for all combinations of privacy and taggedness.
+    # It'll return a full set of
+    # (status, milestone, importance, has_patch, tag, viewed_by) rows.
+    common_cols = (
+        RelevantTask.status, RelevantTask.milestone_id,
+        RelevantTask.importance, RelevantTask.has_patch,
+        Alias(False, 'fixed_upstream'))
+    null_tag = Alias(Cast(None, 'text'), 'tag')
+    null_viewed_by = Alias(Cast(None, 'integer'), 'viewed_by')
+
+    tag_join = Join(BugTag, BugTag.bugID == RelevantTask.bug_id)
+    sub_join = Join(
+        BugSubscription,
+        BugSubscription.bug_id == RelevantTask.bug_id)
+
+    public_constraint = RelevantTask.information_type.is_in(
+        PUBLIC_INFORMATION_TYPES)
+    private_constraint = RelevantTask.information_type.is_in(
+        PRIVATE_INFORMATION_TYPES)
+
+    unions = Union(
+        # Public, tagless
+        Select(
+            common_cols + (null_tag, null_viewed_by),
+            tables=[RelevantTask], where=public_constraint),
+        # Public, tagged
+        Select(
+            common_cols + (BugTag.tag, null_viewed_by),
+            tables=[RelevantTask, tag_join], where=public_constraint),
+        # Private, tagless
+        Select(
+            common_cols + (null_tag, BugSubscription.person_id),
+            tables=[RelevantTask, sub_join], where=private_constraint),
+        # Private, tagged
+        Select(
+            common_cols + (BugTag.tag, BugSubscription.person_id),
+            tables=[RelevantTask, sub_join, tag_join],
+            where=private_constraint),
+        all=True)
+
+    # Select the relevant bits of the prototype rows and aggregate them.
+    proto_key_cols = (
+        BugSummaryPrototype.status, BugSummaryPrototype.milestone_id,
+        BugSummaryPrototype.importance, BugSummaryPrototype.has_patch,
+        BugSummaryPrototype.fixed_upstream, BugSummaryPrototype.tag,
+        BugSummaryPrototype.viewed_by_id)
+    origin = IStore(BugTaskFlat).with_(relevant_tasks).using(
+        Alias(unions, 'bugsummary_prototype'))
+    results = origin.find(proto_key_cols + (Count(),))
+    results = results.group_by(*proto_key_cols).order_by(*proto_key_cols)
+    return results
+
+
+class BugSummaryRebuildTunableLoop(TunableLoop):
+
+    maximum_chunk_size = 100
+
+    def __init__(self, log, dry_run, abort_time=None):
+        super(BugSummaryRebuildTunableLoop, self).__init__(log, abort_time)
+        self.dry_run = dry_run
+        self.targets = list(
+            get_bugsummary_targets().union(get_bugtask_targets()))
+        self.offset = 0
+
+    def isDone(self):
+        return self.offset >= len(self.targets)
+
+    def __call__(self, chunk_size):
+        chunk_size = int(chunk_size)
+        chunk = self.targets[self.offset:self.offset + chunk_size]
+
+        for target_key in chunk:
+            target = load_target(*target_key)
+            rebuild_bugsummary_for_target(target, self.log)
+        self.offset += len(chunk)
+
+        if not self.dry_run:
+            transaction.commit()
+        else:
+            transaction.abort()

=== added file 'lib/lp/bugs/scripts/tests/test_bugsummaryrebuild.py'
--- lib/lp/bugs/scripts/tests/test_bugsummaryrebuild.py	1970-01-01 00:00:00 +0000
+++ lib/lp/bugs/scripts/tests/test_bugsummaryrebuild.py	2012-06-25 12:59:54 +0000
@@ -0,0 +1,347 @@
+# Copyright 2012 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+__metaclass__ = type
+
+import subprocess
+
+from testtools.content import text_content
+from testtools.matchers import MatchesRegex
+import transaction
+from zope.component import getUtility
+
+from lp.bugs.interfaces.bugtask import (
+    BugTaskImportance,
+    BugTaskStatus,
+    IBugTaskSet,
+    )
+from lp.bugs.scripts.bugsummaryrebuild import (
+    apply_bugsummary_changes,
+    calculate_bugsummary_changes,
+    calculate_bugsummary_rows,
+    format_target,
+    get_bugsummary_rows,
+    get_bugsummaryjournal_rows,
+    get_bugsummary_targets,
+    get_bugtask_targets,
+    RawBugSummary,
+    rebuild_bugsummary_for_target,
+    )
+from lp.registry.enums import InformationType
+from lp.services.database.lpstorm import IStore
+from lp.services.log.logger import BufferLogger
+from lp.testing import TestCaseWithFactory
+from lp.testing.dbuser import dbuser
+from lp.testing.layers import (
+    LaunchpadZopelessLayer,
+    ZopelessDatabaseLayer,
+    )
+from lp.testing.script import run_script
+
+
+def rollup_journal():
+    IStore(RawBugSummary).execute('SELECT bugsummary_rollup_journal()')
+
+
+def create_tasks(factory):
+    ps = factory.makeProductSeries()
+    product = ps.product
+    sp = factory.makeSourcePackage(publish=True)
+
+    bug = factory.makeBug(product=product)
+    getUtility(IBugTaskSet).createManyTasks(
+        bug, bug.owner, [sp, sp.distribution_sourcepackage, ps])
+
+    # There'll be a target for each task, plus a packageless one for
+    # each package task.
+    expected_targets = [
+        (ps.product.id, None, None, None, None),
+        (None, ps.id, None, None, None),
+        (None, None, sp.distribution.id, None, None),
+        (None, None, sp.distribution.id, None, sp.sourcepackagename.id),
+        (None, None, None, sp.distroseries.id, None),
+        (None, None, None, sp.distroseries.id, sp.sourcepackagename.id)
+        ]
+    return expected_targets
+
+
+class TestBugSummaryRebuild(TestCaseWithFactory):
+
+    layer = ZopelessDatabaseLayer
+
+    def test_get_bugsummary_targets(self):
+        # get_bugsummary_targets returns the set of target tuples that are
+        # currently represented in BugSummary.
+        orig_targets = get_bugsummary_targets()
+        expected_targets = create_tasks(self.factory)
+        rollup_journal()
+        new_targets = get_bugsummary_targets()
+        self.assertContentEqual(expected_targets, new_targets - orig_targets)
+
+    def test_get_bugtask_targets(self):
+        # get_bugtask_targets returns the set of target tuples that are
+        # currently represented in BugTask.
+        orig_targets = get_bugtask_targets()
+        expected_targets = create_tasks(self.factory)
+        new_targets = get_bugtask_targets()
+        self.assertContentEqual(expected_targets, new_targets - orig_targets)
+
+    def test_calculate_bugsummary_changes(self):
+        # calculate_bugsummary_changes returns the changes required
+        # to make the old dict match the new, as a tuple of
+        # (added, updated, removed)
+        changes = calculate_bugsummary_changes(
+            dict(a=2, b=10, c=3), dict(a=2, c=5, d=4))
+        self.assertEqual((dict(d=4), dict(c=5), ['b']), changes)
+
+    def test_apply_bugsummary_changes(self):
+        # apply_bugsummary_changes takes a target and a tuple of changes
+        # from calculate_bugsummary_changes and flushes the changes to
+        # the DB.
+        product = self.factory.makeProduct()
+        self.assertContentEqual([], get_bugsummary_rows(product))
+        NEW = BugTaskStatus.NEW
+        TRIAGED = BugTaskStatus.TRIAGED
+        LOW = BugTaskImportance.LOW
+        HIGH = BugTaskImportance.HIGH
+
+        # Add a couple of rows to start.
+        with dbuser('bugsummaryrebuild'):
+            apply_bugsummary_changes(
+                product,
+                {(NEW, None, HIGH, False, False, None, None): 2,
+                (TRIAGED, None, LOW, False, False, None, None): 4},
+                {}, [])
+        self.assertContentEqual(
+            [(NEW, None, HIGH, False, False, None, None, 2),
+             (TRIAGED, None, LOW, False, False, None, None, 4)],
+            get_bugsummary_rows(product))
+
+        # Delete one, mutate the other.
+        with dbuser('bugsummaryrebuild'):
+            apply_bugsummary_changes(
+                product,
+                {}, {(NEW, None, HIGH, False, False, None, None): 3},
+                [(TRIAGED, None, LOW, False, False, None, None)])
+        self.assertContentEqual(
+            [(NEW, None, HIGH, False, False, None, None, 3)],
+            get_bugsummary_rows(product))
+
+    def test_rebuild_bugsummary_for_target(self):
+        # rebuild_bugsummary_for_target rebuilds BugSummary for a
+        # specific target from BugTaskFlat. Since it ignores the
+        # journal, it also removes any relevant journal entries.
+        product = self.factory.makeProduct()
+        self.factory.makeBug(product=product)
+        self.assertEqual(0, get_bugsummary_rows(product).count())
+        self.assertEqual(1, get_bugsummaryjournal_rows(product).count())
+        log = BufferLogger()
+        with dbuser('bugsummaryrebuild'):
+            rebuild_bugsummary_for_target(product, log)
+        self.assertEqual(1, get_bugsummary_rows(product).count())
+        self.assertEqual(0, get_bugsummaryjournal_rows(product).count())
+        self.assertThat(
+            log.getLogBufferAndClear(),
+            MatchesRegex(
+                'DEBUG Rebuilding %s\nDEBUG Added {.*: 1L}' % product.name))
+
+    def test_script(self):
+        product = self.factory.makeProduct()
+        self.factory.makeBug(product=product)
+        self.assertEqual(0, get_bugsummary_rows(product).count())
+        self.assertEqual(1, get_bugsummaryjournal_rows(product).count())
+        transaction.commit()
+
+        exit_code, out, err = run_script('scripts/bugsummary-rebuild.py')
+        self.addDetail("stdout", text_content(out))
+        self.addDetail("stderr", text_content(err))
+        self.assertEqual(0, exit_code)
+
+        transaction.commit()
+        self.assertEqual(1, get_bugsummary_rows(product).count())
+        self.assertEqual(0, get_bugsummaryjournal_rows(product).count())
+
+
+class TestGetBugSummaryRows(TestCaseWithFactory):
+
+    layer = ZopelessDatabaseLayer
+
+    def test_get_bugsummary_rows(self):
+        product = self.factory.makeProduct()
+        rollup_journal()
+        orig_rows = set(get_bugsummary_rows(product))
+        task = self.factory.makeBug(product=product).default_bugtask
+        rollup_journal()
+        new_rows = set(get_bugsummary_rows(product))
+        self.assertContentEqual(
+            [(task.status, None, task.importance, False, False, None, None,
+              1)],
+            new_rows - orig_rows)
+
+
+class TestCalculateBugSummaryRows(TestCaseWithFactory):
+
+    layer = LaunchpadZopelessLayer
+
+    def test_public_untagged(self):
+        # Public untagged bugs show up in a single row, with both tag
+        # and viewed_by = None.
+        product = self.factory.makeProduct()
+        bug = self.factory.makeBug(product=product).default_bugtask
+        self.assertContentEqual(
+            [(bug.status, None, bug.importance, False, False, None, None, 1)],
+            calculate_bugsummary_rows(product))
+
+    def test_public_tagged(self):
+        # Public tagged bugs show up in a row for each tag, plus an
+        # untagged row.
+        product = self.factory.makeProduct()
+        bug = self.factory.makeBug(
+            product=product, tags=[u'foo', u'bar']).default_bugtask
+        self.assertContentEqual(
+            [(bug.status, None, bug.importance, False, False, None, None, 1),
+             (bug.status, None, bug.importance, False, False, u'foo', None, 1),
+             (bug.status, None, bug.importance, False, False, u'bar', None, 1),
+            ], calculate_bugsummary_rows(product))
+
+    def test_private_untagged(self):
+        # Private untagged bugs show up with tag = None, viewed_by =
+        # subscriber. There's no viewed_by = None row.
+        product = self.factory.makeProduct()
+        owner = self.factory.makePerson()
+        bug = self.factory.makeBug(
+            product=product, owner=owner,
+            information_type=InformationType.USERDATA).default_bugtask
+        self.assertContentEqual(
+            [(bug.status, None, bug.importance, False, False, None, owner.id,
+              1)],
+            calculate_bugsummary_rows(product))
+
+    def test_private_tagged(self):
+        # Private tagged bugs show up with viewed_by = subscriber, with a
+        # row for each tag plus an untagged row.
+        product = self.factory.makeProduct()
+        owner = self.factory.makePerson()
+        bug = self.factory.makeBug(
+            product=product, owner=owner, tags=[u'foo', u'bar'],
+            information_type=InformationType.USERDATA).default_bugtask
+        self.assertContentEqual(
+            [(bug.status, None, bug.importance, False, False, None,
+              owner.id, 1),
+             (bug.status, None, bug.importance, False, False, u'foo',
+              owner.id, 1),
+             (bug.status, None, bug.importance, False, False, u'bar',
+              owner.id, 1)],
+            calculate_bugsummary_rows(product))
+
+    def test_aggregation(self):
+        # Multiple bugs with the same attributes appear in a single
+        # aggregate row with an increased count.
+        product = self.factory.makeProduct()
+        bug1 = self.factory.makeBug(product=product).default_bugtask
+        self.factory.makeBug(product=product).default_bugtask
+        bug3 = self.factory.makeBug(
+            product=product, status=BugTaskStatus.TRIAGED).default_bugtask
+        self.assertContentEqual(
+            [(bug1.status, None, bug1.importance, False, False, None, None, 2),
+             (bug3.status, None, bug3.importance, False, False, None, None, 1),
+            ], calculate_bugsummary_rows(product))
+
+    def test_has_patch(self):
+        # Bugs with a patch attachment (latest_patch_uploaded is not
+        # None) have has_patch=True.
+        product = self.factory.makeProduct()
+        bug1 = self.factory.makeBug(product=product).default_bugtask
+        self.factory.makeBugAttachment(bug=bug1.bug, is_patch=True)
+        bug2 = self.factory.makeBug(
+            product=product, status=BugTaskStatus.TRIAGED).default_bugtask
+        self.assertContentEqual(
+            [(bug1.status, None, bug1.importance, True, False, None, None, 1),
+             (bug2.status, None, bug2.importance, False, False, None, None,
+              1)],
+            calculate_bugsummary_rows(product))
+
+    def test_milestone(self):
+        # Milestoned bugs only show up with the milestone set.
+        product = self.factory.makeProduct()
+        mile1 = self.factory.makeMilestone(product=product)
+        mile2 = self.factory.makeMilestone(product=product)
+        bug1 = self.factory.makeBug(
+            product=product, milestone=mile1).default_bugtask
+        bug2 = self.factory.makeBug(
+            product=product, milestone=mile2,
+            status=BugTaskStatus.TRIAGED).default_bugtask
+        self.assertContentEqual(
+            [(bug1.status, mile1.id, bug1.importance, False, False, None,
+              None, 1),
+             (bug2.status, mile2.id, bug2.importance, False, False, None,
+              None, 1)],
+            calculate_bugsummary_rows(product))
+
+    def test_distribution_includes_packages(self):
+        # Distribution and DistroSeries calculations include their
+        # packages' bugs.
+        dsp = self.factory.makeSourcePackage(
+            publish=True).distribution_sourcepackage
+        sp = self.factory.makeSourcePackage(publish=True)
+        bug1 = self.factory.makeBugTask(target=dsp)
+        bug1.transitionToStatus(BugTaskStatus.INVALID, bug1.owner)
+        bug2 = self.factory.makeBugTask(target=sp)
+        bug2.transitionToStatus(BugTaskStatus.CONFIRMED, bug2.owner)
+
+        # The DistributionSourcePackage task shows up in the
+        # Distribution's rows.
+        self.assertContentEqual(
+            [(bug1.status, None, bug1.importance, False, False, None, None,
+              1)],
+            calculate_bugsummary_rows(dsp.distribution))
+        self.assertContentEqual(
+            calculate_bugsummary_rows(dsp.distribution),
+            calculate_bugsummary_rows(dsp))
+
+        # The SourcePackage task shows up in the DistroSeries' rows.
+        self.assertContentEqual(
+            [(bug2.status, None, bug2.importance, False, False, None, None,
+              1)],
+            calculate_bugsummary_rows(sp.distroseries))
+        self.assertContentEqual(
+            calculate_bugsummary_rows(sp.distroseries),
+            calculate_bugsummary_rows(sp))
+
+
+class TestFormatTarget(TestCaseWithFactory):
+
+    layer = ZopelessDatabaseLayer
+
+    def test_product(self):
+        product = self.factory.makeProduct(name='fooix')
+        self.assertEqual('fooix', format_target(product))
+
+    def test_productseries(self):
+        productseries = self.factory.makeProductSeries(
+            product=self.factory.makeProduct(name='fooix'), name='1.0')
+        self.assertEqual('fooix/1.0', format_target(productseries))
+
+    def test_distribution(self):
+        distribution = self.factory.makeDistribution(name='fooix')
+        self.assertEqual('fooix', format_target(distribution))
+
+    def test_distroseries(self):
+        distroseries = self.factory.makeDistroSeries(
+            distribution=self.factory.makeDistribution(name='fooix'),
+            name='1.0')
+        self.assertEqual('fooix/1.0', format_target(distroseries))
+
+    def test_distributionsourcepackage(self):
+        distribution = self.factory.makeDistribution(name='fooix')
+        dsp = distribution.getSourcePackage(
+            self.factory.makeSourcePackageName('bar'))
+        self.assertEqual('fooix/+source/bar', format_target(dsp))
+
+    def test_sourcepackage(self):
+        distroseries = self.factory.makeDistroSeries(
+            distribution=self.factory.makeDistribution(name='fooix'),
+            name='1.0')
+        sp = distroseries.getSourcePackage(
+            self.factory.makeSourcePackageName('bar'))
+        self.assertEqual('fooix/1.0/+source/bar', format_target(sp))

=== added file 'scripts/bugsummary-rebuild.py'
--- scripts/bugsummary-rebuild.py	1970-01-01 00:00:00 +0000
+++ scripts/bugsummary-rebuild.py	2012-06-25 12:59:54 +0000
@@ -0,0 +1,30 @@
+#!/usr/bin/python -S
+#
+# Copyright 2012 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+import _pythonpath
+
+from lp.bugs.scripts.bugsummaryrebuild import (
+    BugSummaryRebuildTunableLoop,
+    )
+from lp.services.scripts.base import LaunchpadScript
+
+
+class BugSummaryRebuild(LaunchpadScript):
+
+    def add_my_options(self):
+        self.parser.add_option(
+            "-n", "--dry-run", action="store_true",
+            dest="dry_run", default=False,
+            help="Don't commit changes to the DB.")
+
+    def main(self):
+        updater = BugSummaryRebuildTunableLoop(
+            self.logger, self.options.dry_run)
+        updater.run()
+
+if __name__ == '__main__':
+    script = BugSummaryRebuild(
+        'bugsummary-rebuild', dbuser='bugsummaryrebuild')
+    script.lock_and_run()

