← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~sinzui/launchpad/revision-karma-1 into lp:launchpad

 

Curtis Hovey has proposed merging lp:~sinzui/launchpad/revision-karma-1 into lp:launchpad.

Commit message:
Make allocate-revision-karma fast.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #1050191 in Launchpad itself: "allocate-revision-karma.py is too slow to complete"
  https://bugs.launchpad.net/launchpad/+bug/1050191

For more details, see:
https://code.launchpad.net/~sinzui/launchpad/revision-karma-1/+merge/137660

allocate-revision-karma.py is having its DB connection reaped while it's
running, because the query in RevisionSet
getRevisionsNeedingKarmaAllocated is terribly slow.

Right now, we have 21.1 million revisions. 15.9 million of these are
flagged as not having karma allocated. These 15.9 million revisions are
the combination of revisions that have never been processed, or only
exist in +junk branches, or only exist in branches owned by invalid
people. Each run, the scanner needs to check all 15.9 million revisions.

--------------------------------------------------------------------

RULES

    Pre-implementation: stub
    * There are a few parts to a fix, both in lp/code/model/revision.py
      * in getRevisionsNeedingKarmaAllocated should simply return
        revisions where karma_allocated IS FALSE.
      * in allocateKarma, self.karma_allocated should always be set
        to True.
      * This changes the existing behavior, so if a revision first ends
        up on the system in a +junk branch or in a branch owned by an
        invalid user (such as ~vcs-imports), it will never have karma
        allocated to it. Lp will not longer give users karma for
        another user's commit.

QA

    * As a webops to run
      ./cronscripts/allocate-revision-karma.py
      for qastaging's script host (gandwana).
    * Verify there are no errors after a 10 minutes. The proc
      can be killed since we do not need to Verify all 15 million
      revisions.


LINT

    lib/lp/code/model/revision.py
    lib/lp/code/model/tests/test_revision.py
    lib/lp/code/scripts/tests/test_revisionkarma.py


TEST

    ./bin/test -vc -t TestRevisionKarma lp.code.model.tests.test_revision
    ./bin/test -vc lp.code.scripts.tests.test_revisionkarma


IMPLEMENTATION


I modified allocateKarma() to always set self.karma_allocated = True and
getRevisionsNeedingKarmaAllocated() to only look for elf.karma_allocated
is False. This simplification also allowed me to remove the dedupe code
from getRevisionsNeedingKarmaAllocated(). I removed the tests for
allocating karma to retargeted junk branches and claimed profiles with
imported revisions because we don't want to support these cases anymore.
All the tests in test_revisionkarma was invalidated, but I wrote 3 tests
to verify the 1 positive case and 2 negative cases that the script
iterates over.
    lib/lp/code/model/revision.py
    lib/lp/code/model/tests/test_revision.py
    lib/lp/code/scripts/tests/test_revisionkarma.py
-- 
https://code.launchpad.net/~sinzui/launchpad/revision-karma-1/+merge/137660
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~sinzui/launchpad/revision-karma-1 into lp:launchpad.
=== modified file 'lib/lp/code/model/revision.py'
--- lib/lp/code/model/revision.py	2012-09-19 01:19:35 +0000
+++ lib/lp/code/model/revision.py	2012-12-03 18:59:36 +0000
@@ -32,7 +32,6 @@
     Asc,
     Desc,
     Join,
-    Not,
     Or,
     Select,
     )
@@ -60,7 +59,6 @@
 from lp.registry.interfaces.person import validate_public_person
 from lp.registry.interfaces.product import IProduct
 from lp.registry.interfaces.projectgroup import IProjectGroup
-from lp.registry.model.person import ValidPersonCache
 from lp.services.database.bulk import create
 from lp.services.database.constants import (
     DEFAULT,
@@ -130,6 +128,9 @@
 
     def allocateKarma(self, branch):
         """See `IRevision`."""
+        # Always set karma_allocated to True so that Lp does not reprocess
+        # junk and invalid user branches because they do not get karma.
+        self.karma_allocated = True
         # If we know who the revision author is, give them karma.
         author = self.revision_author.person
         if author is not None:
@@ -143,8 +144,6 @@
             karma_date = min(self.revision_date, self.date_created)
             karma = branch.target.assignKarma(
                 author, 'revisionadded', karma_date)
-            if karma is not None:
-                self.karma_allocated = True
             return karma
         else:
             return None
@@ -457,23 +456,11 @@
     @staticmethod
     def getRevisionsNeedingKarmaAllocated(limit=None):
         """See `IRevisionSet`."""
-        # Here to stop circular imports.
-        from lp.code.model.branch import Branch
-        from lp.code.model.branchrevision import BranchRevision
-
         store = IStore(Revision)
-        results_with_dupes = store.find(
+        results = store.find(
             Revision,
-            Revision.revision_author == RevisionAuthor.id,
-            RevisionAuthor.person == ValidPersonCache.id,
-            Not(Revision.karma_allocated),
-            BranchRevision.revision == Revision.id,
-            BranchRevision.branch == Branch.id,
-            Or(Branch.product != None, Branch.distroseries != None))[:limit]
-        # Eliminate duplicate rows, returning <= limit rows
-        return store.find(
-            Revision, Revision.id.is_in(
-                results_with_dupes.get_select_expr(Revision.id)))
+            Revision.karma_allocated is False)[:limit]
+        return results
 
     @staticmethod
     def getPublicRevisionsForPerson(person, day_limit=30):

=== modified file 'lib/lp/code/model/tests/test_revision.py'
--- lib/lp/code/model/tests/test_revision.py	2012-10-03 04:52:37 +0000
+++ lib/lp/code/model/tests/test_revision.py	2012-12-03 18:59:36 +0000
@@ -28,7 +28,6 @@
     RevisionSet,
     )
 from lp.registry.model.karma import Karma
-from lp.scripts.garbo import RevisionAuthorEmailLinker
 from lp.services.database.interfaces import (
     DEFAULT_FLAVOR,
     IStoreSelector,
@@ -36,7 +35,6 @@
     )
 from lp.services.database.sqlbase import cursor
 from lp.services.identity.interfaces.account import AccountStatus
-from lp.services.log.logger import DevNullLogger
 from lp.testing import (
     login,
     logout,
@@ -96,7 +94,7 @@
         rev = self.factory.makeRevision()
         branch = self.factory.makeProductBranch()
         branch.createBranchRevision(1, rev)
-        self.failIf(rev.karma_allocated)
+        self.assertTrue(rev.karma_allocated)
 
     def test_noRevisionsNeedingAllocation(self):
         # There are no outstanding revisions needing karma allocated.
@@ -111,7 +109,7 @@
             revision_date=datetime.now(pytz.UTC) - timedelta(days=5))
         branch = self.factory.makeProductBranch()
         branch.createBranchRevision(1, rev)
-        self.failUnless(rev.karma_allocated)
+        self.assertTrue(rev.karma_allocated)
         # Get the karma event.
         [karma] = list(Store.of(author).find(
             Karma,
@@ -125,14 +123,15 @@
 
     def test_karmaNotAllocatedForKnownAuthorWithInactiveAccount(self):
         # If the revision author is known, but the account is not active,
-        # don't allocate karma.
+        # don't allocate karma, and record that karma_allocated was done.
         author = self.factory.makePerson()
         rev = self.factory.makeRevision(
             author=author.preferredemail.email)
         author.account.status = AccountStatus.SUSPENDED
         branch = self.factory.makeProductBranch()
         branch.createBranchRevision(1, rev)
-        self.failIf(rev.karma_allocated)
+        self.assertTrue(rev.karma_allocated)
+        self.assertEqual(0, rev.revision_author.person.karma)
         # Even though the revision author is connected to the person, since
         # the account status is suspended, the person is not "valid", and so
         # the revisions are not returned as needing karma allocated.
@@ -140,66 +139,19 @@
             [], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
 
     def test_noKarmaForJunk(self):
-        # Revisions only associated with junk branches don't get karma.
+        # Revisions only associated with junk branches don't get karma,
+        # and Lp records that karma_allocated was done.
         author = self.factory.makePerson()
         rev = self.factory.makeRevision(
             author=author.preferredemail.email)
         branch = self.factory.makePersonalBranch()
         branch.createBranchRevision(1, rev)
-        self.failIf(rev.karma_allocated)
+        self.assertTrue(rev.karma_allocated)
+        self.assertEqual(0, rev.revision_author.person.karma)
         # Nor is this revision identified as needing karma allocated.
         self.assertEqual(
             [], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
 
-    def test_junkBranchMovedToProductNeedsKarma(self):
-        # A junk branch that moves to a product needs karma allocated.
-        author = self.factory.makePerson()
-        rev = self.factory.makeRevision(author=author)
-        branch = self.factory.makePersonalBranch()
-        branch.createBranchRevision(1, rev)
-        # Once the branch is connected to the revision, we now specify
-        # a product for the branch.
-        project = self.factory.makeProduct()
-        branch.setTarget(user=branch.owner, project=project)
-        # The revision is now identified as needing karma allocated.
-        self.assertEqual(
-            [rev], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
-
-    def test_junkBranchMovedToPackageNeedsKarma(self):
-        # A junk branch that moves to a package needs karma allocated.
-        author = self.factory.makePerson()
-        rev = self.factory.makeRevision(author=author)
-        branch = self.factory.makePersonalBranch()
-        branch.createBranchRevision(1, rev)
-        # Once the branch is connected to the revision, we now specify
-        # a product for the branch.
-        source_package = self.factory.makeSourcePackage()
-        branch.setTarget(user=branch.owner, source_package=source_package)
-        # The revision is now identified as needing karma allocated.
-        self.assertEqual(
-            [rev], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
-
-    def test_newRevisionAuthorLinkNeedsKarma(self):
-        # If Launchpad knows of revisions by a particular author, and later
-        # that authoer registers with launchpad, the revisions need karma
-        # allocated.
-        email = self.factory.getUniqueEmailAddress()
-        rev = self.factory.makeRevision(author=email)
-        branch = self.factory.makeProductBranch()
-        branch.createBranchRevision(1, rev)
-        self.failIf(rev.karma_allocated)
-        # Since the revision author is not known, the revisions do not at this
-        # stage need karma allocated.
-        self.assertEqual(
-            [], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
-        # The person registers with Launchpad.
-        self.factory.makePerson(email=email)
-        # Garbo runs the RevisionAuthorEmailLinker job.
-        RevisionAuthorEmailLinker(log=DevNullLogger()).run()
-        # Now the kama needs allocating.
-        self.assertEqual(
-            [rev], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
-
     def test_karmaDateForFutureRevisions(self):
         # If the revision date is some time in the future, then the karma date
         # is set to be the time that the revision was created.

=== modified file 'lib/lp/code/scripts/tests/test_revisionkarma.py'
--- lib/lp/code/scripts/tests/test_revisionkarma.py	2012-01-20 15:42:44 +0000
+++ lib/lp/code/scripts/tests/test_revisionkarma.py	2012-12-03 18:59:36 +0000
@@ -8,13 +8,9 @@
 from storm.store import Store
 import transaction
 
-from lp.code.model.revision import RevisionSet
 from lp.code.scripts.revisionkarma import RevisionKarmaAllocator
 from lp.registry.model.karma import Karma
-from lp.scripts.garbo import RevisionAuthorEmailLinker
 from lp.services.config import config
-from lp.services.identity.model.emailaddress import EmailAddressSet
-from lp.services.log.logger import DevNullLogger
 from lp.testing import TestCaseWithFactory
 from lp.testing.dbuser import switch_dbuser
 from lp.testing.layers import LaunchpadZopelessLayer
@@ -25,99 +21,47 @@
 
     layer = LaunchpadZopelessLayer
 
-    def setUp(self):
-        # Use an administrator for the factory
-        TestCaseWithFactory.setUp(self, 'admin@xxxxxxxxxxxxx')
+    def runScript(self):
+        transaction.commit()
+        switch_dbuser(config.revisionkarma.dbuser)
+        script = RevisionKarmaAllocator(
+            'test', config.revisionkarma.dbuser, ['-q'])
+        script.main()
 
-    def assertOneKarmaEvent(self, person, product):
-        # Make sure there is one and only one karma event for the person and
-        # product.
-        result = Store.of(person).find(
+    def assertKarmaEvent(self, person, product, count):
+        # Make sure the count of karma events matches the script iteration
+        # over revision.allocateKarma()
+        instance = person or product
+        result = Store.of(instance).find(
             Karma,
             Karma.person == person,
             Karma.product == product)
-        self.assertEqual(1, result.count())
-
-    def test_junkBranchMoved(self):
-        # When a junk branch is moved to a product, the revision author will
-        # get karma on the product.
-        author = self.factory.makePerson()
-        rev = self.factory.makeRevision(
-            author=author.preferredemail.email)
+        self.assertEqual(count, result.count())
+
+    def test_branch_allocated_karma(self):
+        # Revision authors that are Lp users are awarded karma for non-junk
+        # branches.
+        author = self.factory.makePerson()
+        rev = self.factory.makeRevision(author=author.preferredemail.email)
+        branch = self.factory.makeBranch()
+        branch.createBranchRevision(1, rev)
+        self.assertTrue(rev.karma_allocated)
+        self.assertKarmaEvent(author, branch.product, 1)
+
+    def test_junk_branch_not_allocated_karma(self):
+        # Revision authors do not get karma for junk branches.
+        author = self.factory.makePerson()
+        rev = self.factory.makeRevision(author=author.preferredemail.email)
         branch = self.factory.makePersonalBranch()
         branch.createBranchRevision(1, rev)
-        # Once the branch is connected to the revision, we now specify
-        # a product for the branch.
-        project = self.factory.makeProduct()
-        branch.setTarget(user=branch.owner, project=project)
-        # Commit and switch to the script db user.
-        transaction.commit()
-        switch_dbuser(config.revisionkarma.dbuser)
-        script = RevisionKarmaAllocator(
-            'test', config.revisionkarma.dbuser, ['-q'])
-        script.main()
-        self.assertOneKarmaEvent(author, branch.product)
+        self.assertTrue(rev.karma_allocated)
+        self.assertKarmaEvent(author, branch.product, 0)
 
-    def test_newRevisionAuthor(self):
-        # When a user validates an email address that is part of a revision
-        # author, and that author has revisions associated with a product, we
-        # give the karma to the user.
+    def test_unknown_revision_author_not_allocated_karma(self):
+        # Karma is not allocated when the revision author is not an Lp user.
         email = self.factory.getUniqueEmailAddress()
         rev = self.factory.makeRevision(author=email)
         branch = self.factory.makeAnyBranch()
         branch.createBranchRevision(1, rev)
-        transaction.commit()
-        self.failIf(rev.karma_allocated)
-        # Since the revision author is not known, the revisions do not at this
-        # stage need karma allocated.
-        self.assertEqual(
-            [], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
-        # The person registers with Launchpad.
-        author = self.factory.makePerson(email=email)
-        transaction.commit()
-        # Run the RevisionAuthorEmailLinker garbo job.
-        RevisionAuthorEmailLinker(log=DevNullLogger()).run()
-        switch_dbuser(config.revisionkarma.dbuser)
-        script = RevisionKarmaAllocator(
-            'test', config.revisionkarma.dbuser, ['-q'])
-        script.main()
-        self.assertOneKarmaEvent(author, branch.product)
-
-    def test_ownerJunkBranchWithAnotherProductBranch(self):
-        # If the revision author has the revision in a junk branch but someone
-        # else has the revision in the ancestry of a branch associated with a
-        # product, then we use the branch with the product rather than the
-        # junk branch owned by the revision author.
-        author = self.factory.makePerson()
-        email = self.factory.getUniqueEmailAddress()
-        rev = self.factory.makeRevision(author=email)
-        branch = self.factory.makePersonalBranch(owner=author)
-        branch.createBranchRevision(1, rev)
-        self.failIf(rev.karma_allocated)
-        # Now we have a junk branch which has a revision with an email address
-        # that is not yet claimed by the author.
-
-        # Now create a non junk branch owned by someone else that has the
-        # revision.
-        b2 = self.factory.makeProductBranch()
-        # Put the author's revision in the ancestry.
-        b2.createBranchRevision(None, rev)
-
-        # Now link the revision author to the author.
-        author.validateAndEnsurePreferredEmail(
-            EmailAddressSet().new(email, author))
-        transaction.commit()
-        # Run the RevisionAuthorEmailLinker garbo job.
-        RevisionAuthorEmailLinker(log=DevNullLogger()).run()
-
-        # Now that the revision author is linked to the person, the revision
-        # needs karma allocated.
-        self.assertEqual(
-            [rev], list(RevisionSet.getRevisionsNeedingKarmaAllocated()))
-
-        # Commit and switch to the script db user.
-        switch_dbuser(config.revisionkarma.dbuser)
-        script = RevisionKarmaAllocator(
-            'test', config.revisionkarma.dbuser, ['-q'])
-        script.main()
-        self.assertOneKarmaEvent(author, b2.product)
+        self.assertTrue(rev.karma_allocated)
+        self.assertKarmaEvent(rev.revision_author.person, branch.product, 0)


Follow ups