← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/launchpad/bug-994650-scrub-pofiletranslator into lp:launchpad

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-994650-scrub-pofiletranslator into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #994650 in Launchpad itself: "POFileTranslator falls out of date"
  https://bugs.launchpad.net/launchpad/+bug/994650

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-994650-scrub-pofiletranslator/+merge/104866

This is as discussed on the internal mailing list.  In a series of branches that simplifies and streamlines POFileTranslator management, this is the only one that introduces new code.  It won't run while we still have POFileTranslator.latest_message, which I'm dropping in a separate pair of branches (devel/db-devel).

POFileTranslator is described as an “eager materialized view.”  It's a cache of who contributed TranslationMessages to which POFiles, and when they last did.

What you see in this branch is a scrubbing script for POFileTranslator.  Instead of trying (and in many ways failing) to maintain that table synchronously using a long and complex database trigger, Launchpad will keep it “eventually consistent” by running this script.  The script doesn't care how data got where it is, so some painfully complex script code that updates the table can disappear.  Instead, the script just finds out which entries are missing and which ones need adding, and does that.

The script may still need optimizing.  I deliberately kept it simple.  In the current code it would have been easy to check for and correct timestamps on the contributions as well, but I think that's of secondary importantance.  Where we have a justified POFileTranslator record, its timestamp is likely to be correct; the known problems involve records being omitted altogether.  And simplifications I'm introducing elsewhere may also cause records to be omitted, or left in place when they should be moved to different POFiles, or deleted.
-- 
https://code.launchpad.net/~jtv/launchpad/bug-994650-scrub-pofiletranslator/+merge/104866
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-994650-scrub-pofiletranslator into lp:launchpad.
=== added file 'cronscripts/scrub-pofiletranslator.py'
--- cronscripts/scrub-pofiletranslator.py	1970-01-01 00:00:00 +0000
+++ cronscripts/scrub-pofiletranslator.py	2012-05-07 04:13:24 +0000
@@ -0,0 +1,21 @@
+#!/usr/bin/python -S
+#
+# Copyright 2012 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Update `POFileTranslator` table."""
+
+import _pythonpath
+
+from lp.translations.scripts.scrub_pofiletranslator import (
+    ScrubPOFileTranslator,
+    )
+
+
+__metaclass__ = type
+
+
+if __name__ == '__main__':
+    script = ScrubPOFileTranslator(
+        'scrub-pofiletranslator', 'scrub_pofiletranslator')
+    script.lock_and_run()

=== modified file 'database/schema/security.cfg'
--- database/schema/security.cfg	2012-05-04 13:12:41 +0000
+++ database/schema/security.cfg	2012-05-07 04:13:24 +0000
@@ -475,6 +475,18 @@
 public.productseries   = SELECT
 type=user
 
+[scrub_pofiletranslator]
+type=user
+groups=script
+public.language                         = SELECT
+public.person                           = SELECT
+public.pofile                           = SELECT
+public.pofiletranslator                 = SELECT, DELETE, INSERT, UPDATE
+public.potemplate                       = SELECT
+public.potmsgset                        = SELECT
+public.translationmessage               = SELECT
+public.translationtemplateitem          = SELECT
+
 [poimport]
 groups=write,script
 public.account                          = SELECT, INSERT

=== added file 'lib/lp/translations/scripts/scrub_pofiletranslator.py'
--- lib/lp/translations/scripts/scrub_pofiletranslator.py	1970-01-01 00:00:00 +0000
+++ lib/lp/translations/scripts/scrub_pofiletranslator.py	2012-05-07 04:13:24 +0000
@@ -0,0 +1,121 @@
+# Copyright 2012 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Keep `POFileTranslator` more or less consistent with the real data."""
+
+__metaclass__ = type
+__all__ = [
+    'ScrubPOFileTranslator',
+    ]
+
+from storm.expr import (
+    Coalesce,
+    Desc,
+    )
+import transaction
+
+from lp.services.database.lpstorm import IStore
+from lp.services.scripts.base import LaunchpadCronScript
+from lp.translations.model.pofile import POFile
+from lp.translations.model.pofiletranslator import POFileTranslator
+from lp.translations.model.potemplate import POTemplate
+from lp.translations.model.translationmessage import TranslationMessage
+from lp.translations.model.translationtemplateitem import (
+    TranslationTemplateItem,
+    )
+
+
+class ScrubPOFileTranslator(LaunchpadCronScript):
+    """Update `POFileTranslator` to reflect current translations."""
+
+    def get_pofiles(self):
+        """Retrieve POFiles to scrub.
+
+        The result's ordering is aimed at maximizing cache effectiveness:
+        by POTemplate name for locality of shared POTMsgSets, and by language
+        for locality of shared TranslationMessages.
+        """
+        store = IStore(POFile)
+        query = store.find(
+            POFile,
+            POFile.potemplateID == POTemplate.id,
+            POTemplate.iscurrent == True)
+        return query.order_by(POTemplate.name, POFile.languageID)
+
+    def get_contributions(self, pofile):
+        """Map all users' most recent contributions to `pofile`.
+
+        Returns a dict mapping `Person` id to the creation time of their most
+        recent `TranslationMessage` in `POFile`.
+        """
+        store = IStore(pofile)
+        potmsgset_ids = store.find(
+            TranslationTemplateItem.potmsgsetID,
+            TranslationTemplateItem.potemplateID == pofile.potemplate.id,
+            TranslationTemplateItem.sequence > 0)
+        contribs = store.find(
+            (TranslationMessage.submitterID, TranslationMessage.date_created),
+            TranslationMessage.potmsgsetID.is_in(potmsgset_ids),
+            TranslationMessage.languageID == pofile.language.id,
+            TranslationMessage.msgstr0 != None,
+            Coalesce(
+                TranslationMessage.potemplateID,
+                pofile.potemplate.id) == pofile.potemplate.id)
+        contribs = contribs.config(distinct=(TranslationMessage.submitterID,))
+        contribs = contribs.order_by(
+            TranslationMessage.submitterID,
+            Desc(TranslationMessage.date_created))
+        return dict(contribs)
+
+    def get_pofiletranslators(self, pofile):
+        """Get `POFileTranslator` entries for `pofile`.
+
+        Returns a dict mapping each contributor's person id to their
+        `POFileTranslator` record.
+        """
+        store = IStore(pofile)
+        pofts = store.find(
+            POFileTranslator, POFileTranslator.pofileID == pofile.id)
+        return {poft.personID: poft for poft in pofts}
+
+    def remove_pofiletranslators(self, pofile, person_ids):
+        """Delete `POFileTranslator` records."""
+        store = IStore(pofile)
+        pofts = store.find(
+            POFileTranslator,
+            POFileTranslator.pofileID == pofile.id,
+            POFileTranslator.personID.is_in(person_ids))
+        pofts.remove()
+
+    def remove_unwarranted_pofiletranslators(self, pofile, pofts, contribs):
+        """Delete `POFileTranslator` records that shouldn't be there."""
+        excess = set(pofts) - set(contribs)
+        if len(excess) > 0:
+            self.remove_pofiletranslators(pofile, excess)
+
+    def create_missing_pofiletranslators(self, pofile, pofts, contribs):
+        """Create `POFileTranslator` records that were missing."""
+        shortage = set(contribs) - set(pofts)
+        store = IStore(pofile)
+        for missing_contributor in shortage:
+            store.add(POFileTranslator(
+                pofile=pofile, personID=missing_contributor,
+                date_last_touched=contribs[missing_contributor]))
+
+    def scrub_pofile(self, pofile):
+        """Scrub `POFileTranslator` entries for one `POFile`.
+
+        Removes inappropriate entries and adds missing ones.
+        """
+        contribs = self.get_contributions(pofile)
+        pofiletranslators = self.get_pofiletranslators(pofile)
+        self.remove_unwarranted_pofiletranslators(
+            pofile, pofiletranslators, contribs)
+        self.create_missing_pofiletranslators(
+            pofile, pofiletranslators, contribs)
+
+    def main(self):
+        """See `LaunchpadScript`."""
+        for pofile in self.get_pofiles():
+            self.scrub_pofile(pofile)
+            transaction.commit()

=== added file 'lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py'
--- lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py	1970-01-01 00:00:00 +0000
+++ lib/lp/translations/scripts/tests/test_scrub_pofiletranslator.py	2012-05-07 04:13:24 +0000
@@ -0,0 +1,214 @@
+# Copyright 2012 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Test scrubbing of `POFileTranslator`."""
+
+__metaclass__ = type
+
+from datetime import (
+    datetime,
+    timedelta,
+    )
+
+import pytz
+import transaction
+
+from lp.services.database.constants import UTC_NOW
+from lp.services.database.lpstorm import IStore
+from lp.services.scripts.tests import run_script
+from lp.testing import TestCaseWithFactory
+from lp.testing.layers import ZopelessDatabaseLayer
+from lp.translations.model.pofiletranslator import POFileTranslator
+from lp.translations.scripts.scrub_pofiletranslator import (
+    ScrubPOFileTranslator,
+    )
+
+
+def measure_distance(sequence, item1, item2):
+    """Return the distance between items in a sequence."""
+    container = list(sequence)
+    return container.index(item2) - container.index(item1)
+
+
+class TestScrubPOFileTranslator(TestCaseWithFactory):
+
+    layer = ZopelessDatabaseLayer
+
+    def make_script(self):
+        return ScrubPOFileTranslator(test_args=[])
+
+    def query_pofiletranslator(self, pofile, person):
+        """Query `POFileTranslator` for a specific record.
+
+        :return: Storm result set.
+        """
+        store = IStore(pofile)
+        return store.find(POFileTranslator, pofile=pofile, person=person)
+
+    def make_message_with_pofiletranslator(self, pofile=None):
+        """Create a normal `TranslationMessage` with `POFileTranslator`."""
+        if pofile is None:
+            pofile = self.factory.makePOFile()
+        potmsgset = self.factory.makePOTMsgSet(
+            potemplate=pofile.potemplate, sequence=1)
+        # A database trigger on TranslationMessage automatically creates
+        # a POFileTranslator record for each new TranslationMessage.
+        return self.factory.makeSuggestion(pofile=pofile, potmsgset=potmsgset)
+
+    def make_message_without_pofiletranslator(self, pofile=None):
+        """Create a `TranslationMessage` without `POFileTranslator`."""
+        tm = self.make_message_with_pofiletranslator(pofile)
+        IStore(pofile).flush()
+        self.becomeDbUser('postgres')
+        self.query_pofiletranslator(pofile, tm.submitter).remove()
+        return tm
+
+    def make_pofiletranslator_without_message(self, pofile=None):
+        """Create a `POFileTranslator` without `TranslationMessage`."""
+        if pofile is None:
+            pofile = self.factory.makePOFile()
+        poft = POFileTranslator(
+            pofile=pofile, person=self.factory.makePerson(),
+            date_last_touched=UTC_NOW)
+        IStore(poft.pofile).add(poft)
+        return poft
+
+    def test_get_pofiles_gets_pofiles_for_active_templates(self):
+        pofile = self.factory.makePOFile()
+        self.assertIn(pofile, self.make_script().get_pofiles())
+
+    def test_get_pofiles_skips_inactive_templates(self):
+        pofile = self.factory.makePOFile()
+        pofile.potemplate.iscurrent = False
+        self.assertNotIn(pofile, self.make_script().get_pofiles())
+
+    def test_get_pofiles_clusters_by_template_name(self):
+        # POFiles for templates with the same name are bunched together
+        # in the get_pofiles() output.
+        templates = [
+            self.factory.makePOTemplate(name='shared'),
+            self.factory.makePOTemplate(name='other'),
+            self.factory.makePOTemplate(name='andanother'),
+            self.factory.makePOTemplate(
+                name='shared', distroseries=self.factory.makeDistroSeries()),
+            ]
+        pofiles = [
+            self.factory.makePOFile(potemplate=template)
+            for template in templates]
+        ordering = self.make_script().get_pofiles()
+        self.assertEqual(
+            1, abs(measure_distance(ordering, pofiles[0], pofiles[-1])))
+
+    def test_get_pofiles_clusters_by_language(self):
+        # POFiles for sharing templates and the same language are
+        # bunched together in the get_pofiles() output.
+        templates = [
+            self.factory.makePOTemplate(
+                name='shared', distroseries=self.factory.makeDistroSeries())
+            for counter in range(2)]
+        # POFiles per language & template.  We create these in a strange
+        # way to avoid the risk of mistaking accidental orderings such
+        # as per-id from being mistaken for the proper order.
+        pofiles_per_language = {language: [] for language in ['nl', 'fr']}
+        for language, pofiles in pofiles_per_language.items():
+            for template in templates:
+                pofiles.append(
+                    self.factory.makePOFile(language, potemplate=template))
+
+        ordering = self.make_script().get_pofiles()
+        for pofiles in pofiles_per_language.values():
+            self.assertEqual(
+                1, abs(measure_distance(ordering, pofiles[0], pofiles[1])))
+
+    def test_get_contributions_gets_contributions(self):
+        pofile = self.factory.makePOFile()
+        tm = self.factory.makeSuggestion(pofile=pofile)
+        self.assertEqual(
+            {tm.submitter.id: tm.date_created},
+            self.make_script().get_contributions(pofile))
+
+    def test_get_contributions_uses_latest_contribution(self):
+        pofile = self.factory.makePOFile()
+        today = datetime.now(pytz.UTC)
+        yesterday = today - timedelta(1, 1, 1)
+        old_tm = self.factory.makeSuggestion(
+            pofile=pofile, date_created=yesterday)
+        new_tm = self.factory.makeSuggestion(
+            translator=old_tm.submitter, pofile=pofile, date_created=today)
+        self.assertNotEqual(old_tm.date_created, new_tm.date_created)
+        self.assertItemsEqual(
+            [new_tm.date_created],
+            self.make_script().get_contributions(pofile).values())
+
+    def test_get_contributions_ignores_inactive_potmsgsets(self):
+        pofile = self.factory.makePOFile()
+        potmsgset = self.factory.makePOTMsgSet(
+            potemplate=pofile.potemplate, sequence=0)
+        self.factory.makeSuggestion(pofile=pofile, potmsgset=potmsgset)
+        self.assertEqual({}, self.make_script().get_contributions(pofile))
+
+    def test_get_contributions_includes_diverged_messages_for_template(self):
+        pofile = self.factory.makePOFile()
+        tm = self.factory.makeSuggestion(pofile=pofile)
+        tm.potemplate = pofile.potemplate
+        self.assertItemsEqual(
+            [tm.submitter.id],
+            self.make_script().get_contributions(pofile).keys())
+
+    def test_get_contributions_excludes_other_diverged_messages(self):
+        pofile = self.factory.makePOFile()
+        tm = self.factory.makeSuggestion(pofile=pofile)
+        tm.potemplate = self.factory.makePOTemplate()
+        self.assertEqual({}, self.make_script().get_contributions(pofile))
+
+    def test_get_pofiletranslators_gets_pofiletranslators_for_pofile(self):
+        pofile = self.factory.makePOFile()
+        tm = self.make_message_with_pofiletranslator(pofile)
+        pofts = self.make_script().get_pofiletranslators(pofile)
+        self.assertItemsEqual([tm.submitter.id], pofts.keys())
+        poft = pofts[tm.submitter.id]
+        self.assertEqual(pofile, poft.pofile)
+
+    def test_scrub_pofile_leaves_good_pofiletranslator_in_place(self):
+        pofile = self.factory.makePOFile()
+        tm = self.make_message_with_pofiletranslator(pofile)
+        old_poft = self.query_pofiletranslator(pofile, tm.submitter).one()
+
+        self.make_script().scrub_pofile(pofile)
+
+        new_poft = self.query_pofiletranslator(pofile, tm.submitter).one()
+        self.assertEqual(old_poft, new_poft)
+
+    def test_scrub_pofile_deletes_unwarranted_entries(self):
+        poft = self.make_pofiletranslator_without_message()
+        pofile = poft.pofile
+        person = poft.person
+        self.becomeDbUser('scrub_pofiletranslator')
+        self.make_script().scrub_pofile(poft.pofile)
+        self.becomeDbUser('launchpad')
+        self.assertIsNone(self.query_pofiletranslator(pofile, person).one())
+
+    def test_scrub_pofile_adds_missing_entries(self):
+        pofile = self.factory.makePOFile()
+        tm = self.make_message_without_pofiletranslator(pofile)
+
+        self.make_script().scrub_pofile(pofile)
+
+        new_poft = self.query_pofiletranslator(pofile, tm.submitter).one()
+        self.assertEqual(tm.submitter, new_poft.person)
+        self.assertEqual(pofile, new_poft.pofile)
+
+    def test_script(self):
+        pofile = self.factory.makePOFile()
+        tm = self.make_message_without_pofiletranslator(pofile)
+        bad_poft = self.make_pofiletranslator_without_message(pofile)
+        noncontributor = bad_poft.person
+        transaction.commit()
+
+        retval, stdout, stderr = run_script(
+            'cronscripts/scrub-pofiletranslator.py', [])
+
+        self.assertIsNotNone(
+            self.query_pofiletranslator(pofile, tm.submitter).one())
+        self.assertIsNone(
+            self.query_pofiletranslator(pofile, noncontributor).one())


Follow ups