← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/launchpad/bug-890542 into lp:launchpad

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-890542 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #890542 in Launchpad itself: "BPPH.binarypackagename/SPPH.sourcepackagename populators are slow"
  https://bugs.launchpad.net/launchpad/+bug/890542

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-890542/+merge/82241

= Summary =

We have high optimiation hopes for two denormalized fields, BinaryPackagepublishingHistory.binarypackagename and SourcePackagepublishingHistory.sourcepackagename.

But we can't use them yet: they have not been fully populated.  The garbo code that does this is taking a long time to finish the job.
 

== Proposed fix ==

Have more of the work done in SQL, so that less data makes the round trip between the database and the script instance.  Do the work in bulk, so as to eliminate the numerous single-object queries hidden in the ORM.


== Pre-implementation notes ==

Discussed briefly with StevenK.


== Implementation details ==

You may wonder why I'm still loading the ids of the records being updated into the client-side python world.  There's a cost to this, compared to collecting these ids in a nested query.

The reason is that with a nested query, the query planner needs to estimate up-front how many rows are going to come out of the nested query, before planning the combined inner and outer queries.  By retrieving the ids in a separate step, we force the planner to consider this based on exact information.


== Tests ==

{{{
./bin/test -vvc lp.scripts.tests.test_garbo -t test_SPPH_and_BPPH_populator
}}}


== Demo and Q/A ==

Run garbo against staging or dogfood (whichever has [BS]PPH records where the field has not been initialized yet) and observe that:

1. The updated script updates records at a good clip.

2. For all records that have the field initialized, it's equal to the matching release's package-name field.


= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/scripts/garbo.py
-- 
https://code.launchpad.net/~jtv/launchpad/bug-890542/+merge/82241
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-890542 into lp:launchpad.
=== modified file 'lib/lp/scripts/garbo.py'
--- lib/lp/scripts/garbo.py	2011-10-27 11:36:13 +0000
+++ lib/lp/scripts/garbo.py	2011-11-15 05:25:31 +0000
@@ -972,11 +972,14 @@
     done = False
     maximum_chunk_size = 5000
 
+    SPPH = SourcePackagePublishingHistory
+
+    def getStore(self):
+        return IMasterStore(self.SPPH)
+
     def findSPPHs(self):
-        return IMasterStore(SourcePackagePublishingHistory).find(
-            SourcePackagePublishingHistory,
-            SourcePackagePublishingHistory.sourcepackagename == None
-            ).order_by(SourcePackagePublishingHistory.id)
+        SPPH = self.SPPH
+        return self.getStore().find(SPPH.id, SPPH.sourcepackagename == None)
 
     def isDone(self):
         """See `TunableLoop`."""
@@ -984,12 +987,21 @@
 
     def __call__(self, chunk_size):
         """See `TunableLoop`."""
-        spphs = self.findSPPHs()[:chunk_size]
-        for spph in spphs:
-            spph.sourcepackagename = (
-                spph.sourcepackagerelease.sourcepackagename)
+        spphs = list(self.findSPPHs()[:chunk_size])
+        if len(spphs) == 0:
+            self.done = True
+            return
+        self.getStore().execute("""
+            UPDATE SourcePackagePublishingHistory AS SPPH
+            SET sourcepackagename = SPR.sourcepackagename
+            FROM SourcePackageRelease AS SPR
+            WHERE
+                SPR.id = SPPH.sourcepackagerelease AND
+                SPPH.sourcepackagename IS NULL AND
+                SPPH.id IN %s
+            """ % sqlvalues(spphs))
         transaction.commit()
-        self.done = self.findSPPHs().is_empty()
+        self.done = (len(spphs) == 0)
 
 
 # XXX: StevenK 2011-09-14 bug=849683: This can be removed when done.
@@ -999,11 +1011,14 @@
     done = False
     maximum_chunk_size = 5000
 
+    BPPH = BinaryPackagePublishingHistory
+
+    def getStore(self):
+        return IMasterStore(self.BPPH)
+
     def findBPPHs(self):
-        return IMasterStore(BinaryPackagePublishingHistory).find(
-            BinaryPackagePublishingHistory,
-            BinaryPackagePublishingHistory.binarypackagename == None
-            ).order_by(BinaryPackagePublishingHistory.id)
+        BPPH = self.BPPH
+        return self.getStore().find(BPPH.id, BPPH.binarypackagename == None)
 
     def isDone(self):
         """See `TunableLoop`."""
@@ -1011,12 +1026,20 @@
 
     def __call__(self, chunk_size):
         """See `TunableLoop`."""
-        bpphs = self.findBPPHs()[:chunk_size]
-        for bpph in bpphs:
-            bpph.binarypackagename = (
-                bpph.binarypackagerelease.binarypackagename)
+        bpphs = list(self.findBPPHs()[:chunk_size])
+        if len(bpphs) == 0:
+            self.done = True
+            return
+        self.getStore().execute("""
+            UPDATE BinaryPackagePublishingHistory AS BPPH
+            SET binarypackagename = BPR.binarypackagename
+            FROM BinaryPackageRelease AS BPR
+            WHERE
+                BPR.id = BPPH.binarypackagerelease AND
+                BPPH.binarypackagename IS NULL AND
+                BPPH.id IN %s
+            """ % sqlvalues(bpphs))
         transaction.commit()
-        self.done = self.findBPPHs().is_empty()
 
 
 class BaseDatabaseGarbageCollector(LaunchpadCronScript):