← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~sinzui/launchpad/tranlationmessage-missing-potmsgset into lp:launchpad

 

Curtis Hovey has proposed merging lp:~sinzui/launchpad/tranlationmessage-missing-potmsgset into lp:launchpad.

Commit message:
Avoid putting UnusedPOTMsgSetPruner in a cage match against other processes.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #824229 in Launchpad itself: "garbo-daily UnusedPOTMsgSetPruner failing with violates foreign key constraint"
  https://bugs.launchpad.net/launchpad/+bug/824229

For more details, see:
https://code.launchpad.net/~sinzui/launchpad/tranlationmessage-missing-potmsgset/+merge/133532

garbo-daily UnusedPOTMsgSetPruner contends with other process that
create or update POTMsgSets and TranslationMessages, such as the
rosetta-poimport script and user activity. When UnusedPOTMsgSetPruner
starts, it builds a cache of Ids of empty POTMsgSets. It then iterates
over them in batches to get the TranslationMessages id, removes them,
then removes the POTMsgSets. Each batch is one transaction. Any process
that adds or modifies a POTMsgSets or TranslationMessages between loops
can contradict the list of unused POTMsgSets.

--------------------------------------------------------------------

RULES

    Pre-implementation: jtv, stevenk
    * UnusedPOTMsgSetPruner needs additional criteria to be certain that
      the POTMsgSet is not used, *and* will not be used.
      * Each batch needs to recheck the list of ids to filter of those
        that became "used" during the loop.
    * UnusedPOTMsgSetPruner's list of POTMsgSets ids are not distinct
      because both queries and their union can return 50x the number of
      unique ids.
      * This is mostly lots of extra db work.

QA

    * Ask a webops to run ./cronscripts/garbo-daily.py for qastaging.
    * Verify the log has no errors.


LINT

    lib/lp/scripts/garbo.py
    lib/lp/scripts/tests/test_garbo.py


LoC

    I have a 4000 line credit this week.


TEST

    ./bin/test -vv -t UnusedPOTMsgSetPruner lp.scripts.tests.test_garbo


IMPLEMENTATION

I first changed .msgset_ids_to_remove to call a private method that
does the same query. I then changed the ._get_msgset_ids_to_remove() to
accept a list of ids to ensure newly used POTMsgSets were filtered out.
Then I discovered errors in the number of loops...the list of ids was
never unique and my change extended the number of loops. I added a set()
to dedupe the ids -- the existing tests got faster since they thought
they were looping once.
    lib/lp/scripts/garbo.py
    lib/lp/scripts/tests/test_garbo.py
-- 
https://code.launchpad.net/~sinzui/launchpad/tranlationmessage-missing-potmsgset/+merge/133532
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~sinzui/launchpad/tranlationmessage-missing-potmsgset into lp:launchpad.
=== modified file 'lib/lp/scripts/garbo.py'
--- lib/lp/scripts/garbo.py	2012-11-05 11:32:27 +0000
+++ lib/lp/scripts/garbo.py	2012-11-08 17:51:26 +0000
@@ -1224,14 +1224,32 @@
 
     @cachedproperty
     def msgset_ids_to_remove(self):
-        """Return the IDs of the POTMsgSets to remove."""
+        """The IDs of the POTMsgSets to remove."""
+        return self._get_msgset_ids_to_remove()
+
+    def _get_msgset_ids_to_remove(self, ids=None):
+        """Return a distrinct list IDs of the POTMsgSets to remove.
+
+        :param ids: a list of POTMsgSet ids to filter. If ids is None,
+            all unused POTMsgSet in the database are returned.
+        """
+        if ids is None:
+            constraints = dict(
+                tti_constraint="AND TRUE",
+                potmsgset_constraint="AND TRUE")
+        else:
+            ids_in = ', '.join([str(id) for id in ids])
+            constraints = dict(
+                tti_constraint="AND tti.potmsgset IN (%s)" % ids_in,
+                potmsgset_constraint="AND POTMsgSet.id IN (%s)" % ids_in)
         query = """
             -- Get all POTMsgSet IDs which are obsolete (sequence == 0)
             -- and are not used (sequence != 0) in any other template.
             SELECT POTMsgSet
               FROM TranslationTemplateItem tti
-              WHERE sequence=0 AND
-              NOT EXISTS(
+              WHERE sequence=0
+              %(tti_constraint)s
+              AND NOT EXISTS(
                 SELECT id
                   FROM TranslationTemplateItem
                   WHERE potmsgset = tti.potmsgset AND sequence != 0)
@@ -1243,20 +1261,22 @@
                LEFT OUTER JOIN TranslationTemplateItem
                  ON TranslationTemplateItem.potmsgset = POTMsgSet.id
                WHERE
-                 TranslationTemplateItem.potmsgset IS NULL);
-            """
+                 TranslationTemplateItem.potmsgset IS NULL
+                 %(potmsgset_constraint)s);
+            """ % constraints
         store = IMasterStore(POTMsgSet)
         results = store.execute(query)
-        ids_to_remove = [id for (id,) in results.get_all()]
-        return ids_to_remove
+        ids_to_remove = set([id for (id,) in results.get_all()])
+        return list(ids_to_remove)
 
     def __call__(self, chunk_size):
         """See `TunableLoop`."""
         # We cast chunk_size to an int to avoid issues with slicing
         # (DBLoopTuner passes in a float).
         chunk_size = int(chunk_size)
-        msgset_ids_to_remove = (
+        msgset_ids = (
             self.msgset_ids_to_remove[self.offset:][:chunk_size])
+        msgset_ids_to_remove = self._get_msgset_ids_to_remove(msgset_ids)
         # Remove related TranslationTemplateItems.
         store = IMasterStore(POTMsgSet)
         related_ttis = store.find(

=== modified file 'lib/lp/scripts/tests/test_garbo.py'
--- lib/lp/scripts/tests/test_garbo.py	2012-11-07 12:38:19 +0000
+++ lib/lp/scripts/tests/test_garbo.py	2012-11-08 17:51:26 +0000
@@ -73,7 +73,11 @@
     load_garbo_job_state,
     LoginTokenPruner,
     OpenIDConsumerAssociationPruner,
+<<<<<<< TREE
     save_garbo_job_state,
+=======
+    UnusedPOTMsgSetPruner,
+>>>>>>> MERGE-SOURCE
     UnusedSessionPruner,
     )
 from lp.services.config import config
@@ -127,6 +131,7 @@
     LaunchpadZopelessLayer,
     ZopelessDatabaseLayer,
     )
+from lp.translations.model.pofile import POFile
 from lp.translations.model.potmsgset import POTMsgSet
 from lp.translations.model.translationtemplateitem import (
     TranslationTemplateItem,
@@ -1034,6 +1039,44 @@
         self.runDaily()
         self.assertEqual(0, obsolete_msgsets.count())
 
+    def test_UnusedPOTMsgSetPruner_preserves_used_potmsgsets(self):
+        # UnusedPOTMsgSetPruner will not remove a potmsgset if it changes
+        # between calls.
+        switch_dbuser('testadmin')
+        potmsgset_pofile = {}
+        for n in xrange(4):
+            pofile = self.factory.makePOFile()
+            translation_message = self.factory.makeCurrentTranslationMessage(
+                pofile=pofile)
+            translation_message.potmsgset.setSequence(
+                pofile.potemplate, 0)
+            potmsgset_pofile[translation_message.potmsgset.id] = pofile.id
+        transaction.commit()
+        store = IMasterStore(POTMsgSet)
+        test_ids = potmsgset_pofile.keys()
+        obsolete_msgsets = store.find(
+            POTMsgSet,
+            In(TranslationTemplateItem.potmsgsetID, test_ids),
+            TranslationTemplateItem.sequence == 0)
+        self.assertEqual(4, obsolete_msgsets.count())
+        pruner = UnusedPOTMsgSetPruner(self.log)
+        pruner(2)
+        # A potmsgeset is set to a sequence > 0 between batches/commits.
+        last_id = pruner.msgset_ids_to_remove[-1]
+        used_potmsgset = store.find(POTMsgSet, POTMsgSet.id == last_id).one()
+        used_pofile = store.find(
+            POFile, POFile.id == potmsgset_pofile[last_id]).one()
+        translation_message = self.factory.makeCurrentTranslationMessage(
+            pofile=used_pofile, potmsgset=used_potmsgset)
+        used_potmsgset.setSequence(used_pofile.potemplate, 1)
+        transaction.commit()
+        # Next batch.
+        pruner(2)
+        self.assertEqual(0, obsolete_msgsets.count())
+        preserved_msgsets = store.find(
+            POTMsgSet, In(TranslationTemplateItem.potmsgsetID, test_ids))
+        self.assertEqual(1, preserved_msgsets.count())
+
     def test_UnusedPOTMsgSetPruner_removes_unreferenced_message_sets(self):
         # If a POTMsgSet is not referenced by any templates the
         # UnusedPOTMsgSetPruner will remove it.


Follow ups