← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/launchpad/bug-891929-tolerate-surprise-pofiles into lp:launchpad

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-891929-tolerate-surprise-pofiles into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #891929 in Launchpad itself: "Translations-copying script fails"
  https://bugs.launchpad.net/launchpad/+bug/891929

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-891929-tolerate-surprise-pofiles/+merge/104755

The translations-copying script that we need to run in order to open every new Ubuntu release's translations has been breaking.  What seems to happen is that after the script has started copying POTemplates into the new series, but before the POFiles have been fully copied, some of the translations-sharing code thinks it needs to create missing POFiles for us.  The copying code then tries to pour its own copy of the previous series' POFile into the same POTemplate/Language slot and boom: unique violation in the database.

I outlined a plan for fixing this on the internal mailing list.  This script implements two of the steps: stop trying to copy POFileTranslator records, and with the complication of its foreign key to POFile out of the way, make the copying script tolerate POFiles that are already in place.

By now you're thinking: yes but who's going to create those POFileTranslator records?  We're going to have to make that asynchronous.  A cron job can update them as needed, without every single script having to worry about its effects on POFileTranslator.  Other branches I have in progress also introduce lots more slack in this process.  POFileTranslator was never a very exact table, and with message sharing it's grown out of its clothes to the point where attempts to keep it immediately accurate is the main source of its problems.  If we lose a POFileTranslator record now, and we have done so tons of times, we're not likely ever to recover it.  Once we have an effective scrubbing script, we'll be sure to recover it eventually.

Anyway, back to the branch.  Reproducing the race condition was nasty work.  I had to inject an exception many layers deep into the code to get that done.  The exception carries information about the intercepted function call back out to the test, which then sets up its race condition and runs the real call.  It's not pretty, and it's not easy to follow, but it should be at least somewhat robust against future changes.


Jeroen
-- 
https://code.launchpad.net/~jtv/launchpad/bug-891929-tolerate-surprise-pofiles/+merge/104755
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-891929-tolerate-surprise-pofiles into lp:launchpad.
=== modified file 'lib/lp/translations/model/distroseries_translations_copy.py'
--- lib/lp/translations/model/distroseries_translations_copy.py	2011-12-30 06:14:56 +0000
+++ lib/lp/translations/model/distroseries_translations_copy.py	2012-05-04 14:58:23 +0000
@@ -1,11 +1,13 @@
-# Copyright 2009-2010 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2012 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 """Functions to copy translations from previous to child distroseries."""
 
 __metaclass__ = type
 
-__all__ = [ 'copy_active_translations' ]
+__all__ = [
+    'copy_active_translations',
+    ]
 
 from lp.services.database.multitablecopy import MultiTableCopy
 from lp.services.database.sqlbase import (
@@ -14,6 +16,37 @@
     )
 
 
+def omit_redundant_pofiles(from_table, to_table, batch_size, begin_id,
+                           end_id):
+    """Batch-pouring callback: skip POFiles that have become redundant.
+
+    This is needed to deal with a concurrency problem where POFiles may
+    get created (through message sharing) while translations are still
+    being copied.
+    """
+    assert to_table.lower() == "POFile".lower(), (
+        "This callback is meant for pouring the POFile table only.")
+
+    params = {
+        'from_table': from_table,
+        'begin_id': begin_id,
+        'end_id': end_id,
+    }
+    cursor().execute("""
+        DELETE FROM %(from_table)s
+        WHERE
+            id >= %(begin_id)s AND
+            id < %(end_id)s AND
+            EXISTS (
+                SELECT *
+                FROM POFile
+                WHERE
+                    POFile.potemplate = %(from_table)s.potemplate AND
+                    POFile.language = %(from_table)s.language
+            )
+        """ % params)
+
+
 def copy_active_translations(child, transaction, logger):
     """Furnish untranslated child `DistroSeries` with previous series's
     translations.
@@ -41,9 +74,7 @@
         # translations.
         return
 
-    translation_tables = [
-        'potemplate', 'translationtemplateitem', 'pofile', 'pofiletranslator'
-        ]
+    translation_tables = ['potemplate', 'translationtemplateitem', 'pofile']
 
     full_name = "%s_%s" % (child.distribution.name, child.name)
     copier = MultiTableCopy(full_name, translation_tables, logger=logger)
@@ -56,12 +87,6 @@
         "Populating blank distroseries %s with translations from %s." %
         (child.name, previous_series.name))
 
-    # Because this function only deals with the case where "child" is a new
-    # distroseries without any existing translations attached, it can afford
-    # to be much more cavalier with ACID considerations than the function that
-    # updates an existing translation based on what's found in the previous
-    # series.
-
     # 1. Extraction phase--for every table involved (called a "source table"
     # in MultiTableCopy parlance), we create a "holding table."  We fill that
     # with all rows from the source table that we want to copy from the
@@ -91,9 +116,9 @@
     # Now that we have the data "in private," where nobody else can see it,
     # we're free to play with it.  No risk of locking other processes out of
     # the database.
-    # Change series identifiers in the holding table to point to the child
-    # (right now they all bear the previous series's id) and set creation
-    # dates to the current transaction time.
+    # Change series identifiers in the holding table to point to the new
+    # series (right now they all bear the previous series's id) and set
+    # creation dates to the current transaction time.
     cursor().execute('''
         UPDATE %s
         SET
@@ -103,18 +128,15 @@
                     ('now'::text)::timestamp(6) with time zone)
     ''' % (copier.getHoldingTableName('potemplate'), quote(child)))
 
-
     # Copy each TranslationTemplateItem whose template we copied, and let
     # MultiTableCopy replace each potemplate reference with a reference to
     # our copy of the original POTMsgSet's potemplate.
     copier.extract('translationtemplateitem', ['potemplate'], 'sequence > 0')
 
     # Copy POFiles, making them refer to the child's copied POTemplates.
-    copier.extract('pofile', ['potemplate'])
-
-    # Copy POFileTranslators, making them refer to the child's copied POFile.
-    copier.extract('pofiletranslator', ['pofile'])
+    copier.extract(
+        'pofile', ['potemplate'],
+        batch_pouring_callback=omit_redundant_pofiles)
 
     # Finally, pour the holding tables back into the originals.
     copier.pour(transaction)
-

=== added file 'lib/lp/translations/tests/test_distroseries_translations_copy.py'
--- lib/lp/translations/tests/test_distroseries_translations_copy.py	1970-01-01 00:00:00 +0000
+++ lib/lp/translations/tests/test_distroseries_translations_copy.py	2012-05-04 14:58:23 +0000
@@ -0,0 +1,84 @@
+# Copyright 2012 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Unit tests for distroseries translations initialization."""
+
+__metaclass__ = type
+
+from lp.services.database.multitablecopy import MultiTableCopy
+from lp.testing import TestCaseWithFactory
+from lp.testing.faketransaction import FakeTransaction
+from lp.testing.layers import ZopelessDatabaseLayer
+from lp.services.log.logger import DevNullLogger
+from lp.translations.model.distroseries_translations_copy import (
+    copy_active_translations,
+    )
+
+
+class EarlyExit(Exception):
+    """Exception used to force early exit from the copying code."""
+    def __init__(self, *args, **kwargs):
+        self.args = args
+        self.kwargs = kwargs
+
+
+def force_exit(*args, **kwargs):
+    """Raise `EarlyExit`."""
+    raise EarlyExit(*args, **kwargs)
+
+
+class TestDistroSeriesTranslationsCopying(TestCaseWithFactory):
+
+    layer = ZopelessDatabaseLayer
+
+    def test_does_not_overwrite_existing_pofile(self):
+        # Sometimes a POFile we're about to copy to a new distroseries
+        # has already been created there due to message sharing.  In
+        # that case, the copying code leaves the existing POFile in
+        # place and does not copy it.  (Nor does it raise an error.)
+        existing_series = self.factory.makeDistroSeries(name='existing')
+        new_series = self.factory.makeDistroSeries(
+            name='new', distribution=existing_series.distribution,
+            previous_series=existing_series)
+        template = self.factory.makePOTemplate(distroseries=existing_series)
+        pofile = self.factory.makePOFile(potemplate=template)
+        self.factory.makeCurrentTranslationMessage(
+            language=pofile.language, potmsgset=self.factory.makePOTMsgSet(
+                potemplate=template))
+
+        # Sabotage the pouring code so that when it's about to hit the
+        # POFile table, it returns to us and we can simulate a race
+        # condition.
+        pour_table = MultiTableCopy._pourTable
+
+        def pour_or_stop_at_pofile(self, holding_table, table, *args,
+                                   **kwargs):
+            args = (self, holding_table, table) + args
+            if table.lower() == 'pofile':
+                raise EarlyExit(*args, **kwargs)
+            else:
+                return pour_table(*args, **kwargs)
+
+        MultiTableCopy._pourTable = pour_or_stop_at_pofile
+        try:
+            copy_active_translations(
+                new_series, FakeTransaction(), DevNullLogger())
+        except EarlyExit as e:
+            pour_args = e.args
+            pour_kwargs = e.kwargs
+        finally:
+            MultiTableCopy._pourTable = pour_table
+
+        # Simulate another POFile being created for new_series while the
+        # copier was working.
+        new_template = new_series.getTranslationTemplateByName(template.name)
+        new_pofile = self.factory.makePOFile(
+            potemplate=new_template, language=pofile.language)
+
+        # Now continue pouring the POFile table.
+        pour_table(*pour_args, **pour_kwargs)
+
+        # The POFile we just created in our race condition stays in
+        # place.  There is no error.
+        resulting_pofile = new_template.getPOFileByLang(pofile.language.code)
+        self.assertEqual(new_pofile, resulting_pofile)


Follow ups