← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~wgrant/launchpad/translationsplitter-no-manual into lp:launchpad

 

William Grant has proposed merging lp:~wgrant/launchpad/translationsplitter-no-manual into lp:launchpad.

Commit message:
Replace most of TranslationTemplateSplitter's horror with a call into POTemplateSet.getSharingSubset. One big duplicated set of rules gone, and we get a simpler and faster slow query.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~wgrant/launchpad/translationsplitter-no-manual/+merge/231825

TranslationTemplateSplitter.findShared's main query finds the other templates containing each of the template's shared POTMsgSets and then checks whether each template satisfies the sharing set's conditions, in one big complex query. This means the query duplicates POTemplateSharingSubset's knowledge of sharing rules, and adds complexity to a query with potentially perilous performance characteristics.

Fundamentally we just need to find all the TranslationTemplateItems in the interesting template, then identify all the TTIs referencing the same POTMsgSets in templates outside the sharing subset.

The sharing subset is rarely going to be more than a few tens of templates (well under 1% of the set of all templates), and the number of TTIs to search in the second step probably never exceeds 100x the initial set. It's not sensible to attempt to meet in the middle, as we'd have to search for all the relevant POTMsgSets in the 99% of templates that we want to split out. So we have to join and filter in order, from the original template through to its TTIs through to the other TTIs, then filter those TTIs based on whether their template is in the sharing subset.

But we know that the sharing subset is reasonably small, and it's quick and relatively constant to calculate, so we can materialise the valid template IDs and pass them into the TTI query. This lets the big, slow query be satisfied with two index scans over just one table: find the TTIs in the initial template, find other TTIs with the same POTMsgSets, then filter out any that have a POTemplate in the known set.

It turns out that my logic matches postgres' query planner. Even for the existing query it precalculates the set of invalid IDs and does what I suggest. But the new query saves a lot of CPU overhead; on a huge template like ddtp-ubuntu-universe (~90000 strings) this takes the hot query down from 850ms to 500ms on dogfood.
-- 
https://code.launchpad.net/~wgrant/launchpad/translationsplitter-no-manual/+merge/231825
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~wgrant/launchpad/translationsplitter-no-manual into lp:launchpad.
=== modified file 'lib/lp/translations/utilities/translationsplitter.py'
--- lib/lp/translations/utilities/translationsplitter.py	2011-08-01 07:25:17 +0000
+++ lib/lp/translations/utilities/translationsplitter.py	2014-08-22 02:19:53 +0000
@@ -1,4 +1,4 @@
-# Copyright 2011 Canonical Ltd.  This software is licensed under the
+# Copyright 2011-2014 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 __metaclass__ = type
@@ -6,22 +6,15 @@
 
 import logging
 
-from storm.expr import (
-    And,
-    Join,
-    LeftJoin,
-    Not,
-    Or,
-    )
+from storm.expr import Not
 from storm.locals import (
     ClassAlias,
     Store,
     )
 import transaction
+from zope.component import getUtility
 
-from lp.registry.model.distroseries import DistroSeries
-from lp.registry.model.packaging import Packaging
-from lp.registry.model.productseries import ProductSeries
+from lp.translations.interfaces.potemplate import IPOTemplateSet
 from lp.translations.model.potemplate import POTemplate
 from lp.translations.model.translationtemplateitem import (
     TranslationTemplateItem,
@@ -138,79 +131,18 @@
         Only return those that are shared but shouldn't be because they
         are now in non-sharing templates.
         """
-        store = Store.of(self.potemplate)
+        sharing_subset = getUtility(IPOTemplateSet).getSharingSubset(
+            product=self.potemplate.product,
+            distribution=self.potemplate.distribution,
+            sourcepackagename=self.potemplate.sourcepackagename)
+        sharing_ids = list(
+            sharing_subset.getSharingPOTemplateIDs(self.potemplate.name))
+
         ThisItem = ClassAlias(TranslationTemplateItem, 'ThisItem')
         OtherItem = ClassAlias(TranslationTemplateItem, 'OtherItem')
-        OtherTemplate = ClassAlias(POTemplate, 'OtherTemplate')
-
-        tables = [
-            OtherTemplate,
-            Join(OtherItem, OtherItem.potemplateID == OtherTemplate.id),
-            Join(ThisItem,
-                 And(ThisItem.potmsgsetID == OtherItem.potmsgsetID,
-                     ThisItem.potemplateID == self.potemplate.id)),
-            ]
-
-        if self.potemplate.productseries is not None:
-            # If the template is now in a product, we look for all
-            # effectively sharing templates that are in *different*
-            # products, or that are in a sourcepackage which is not
-            # linked (through Packaging table) with this product.
-            ps = self.potemplate.productseries
-            productseries_join = LeftJoin(
-                ProductSeries,
-                ProductSeries.id == OtherTemplate.productseriesID)
-            packaging_join = LeftJoin(
-                Packaging,
-                And(Packaging.productseriesID == ps.id,
-                    (Packaging.sourcepackagenameID ==
-                     OtherTemplate.sourcepackagenameID),
-                    Packaging.distroseriesID == OtherTemplate.distroseriesID
-                    ))
-            tables.extend([productseries_join, packaging_join])
-            # Template should not be sharing if...
-            other_clauses = Or(
-                # The name is different, or...
-                OtherTemplate.name != self.potemplate.name,
-                # It's in a different product, or...
-                And(Not(ProductSeries.id == None),
-                    ProductSeries.productID != ps.productID),
-                # There is no link between this product series and
-                # a source package the template is in.
-                And(Not(OtherTemplate.distroseriesID == None),
-                    Packaging.id == None))
-        else:
-            # If the template is now in a source package, we look for all
-            # effectively sharing templates that are in *different*
-            # distributions or source packages, or that are in a product
-            # which is not linked with this source package.
-            ds = self.potemplate.distroseries
-            spn = self.potemplate.sourcepackagename
-            distroseries_join = LeftJoin(
-                DistroSeries,
-                DistroSeries.id == OtherTemplate.distroseriesID)
-            packaging_join = LeftJoin(
-                Packaging,
-                And(Packaging.distroseriesID == ds.id,
-                    Packaging.sourcepackagenameID == spn.id,
-                    Packaging.productseriesID == OtherTemplate.productseriesID
-                    ))
-            tables.extend([distroseries_join, packaging_join])
-            # Template should not be sharing if...
-            other_clauses = Or(
-                # The name is different, or...
-                OtherTemplate.name != self.potemplate.name,
-                # It's in a different distribution or source package, or...
-                And(Not(DistroSeries.id == None),
-                    Or(DistroSeries.distributionID != ds.distributionID,
-                       OtherTemplate.sourcepackagenameID != spn.id)),
-                # There is no link between this source package and
-                # a product the template is in.
-                And(Not(OtherTemplate.productseriesID == None),
-                    Packaging.id == None))
-
-        return store.using(*tables).find(
+        return Store.of(self.potemplate).find(
             (OtherItem, ThisItem),
-            OtherTemplate.id != self.potemplate.id,
-            other_clauses,
+            ThisItem.potemplateID == self.potemplate.id,
+            OtherItem.potmsgsetID == ThisItem.potmsgsetID,
+            Not(OtherItem.potemplateID.is_in(sharing_ids)),
             )


Follow ups