← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~benji/launchpad/bug-742662 into lp:launchpad

 

Benji York has proposed merging lp:~benji/launchpad/bug-742662 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #742662 in Launchpad itself: "Mixed new line markers causing OOPS while importing translations"
  https://bugs.launchpad.net/launchpad/+bug/742662

For more details, see:
https://code.launchpad.net/~benji/launchpad/bug-742662/+merge/74254

Bug 742662 describes a scenario in which an imported translation
contains mixed newlines (\r as well as \n) and outlines three possible
fixes.  This branch implements the option of disallowing such imports.
The software in place intended to do so, but the code to generate the
exception denoting a mixed newline condition was not being called.  This
branch adds that code and tests for it.

The Sanitize class had a method that both normalized newlines as well as
reporting mixed newline violations.  It didn't make sense to separate
out those two bits of functionality so I added a facade
(verifyNewlineConsistency) that only claims to validate newlines
(discarding the normalized result) and used that.  I then added an
intermediary function (verify_message_newline_consistency) that applies
the newline verification to every string of natural-language text in a
translation message and then called that function when importing a
message (in importFile in
lib/lp/translations/utilities/translation_import.py).

Tests: the new tests that were added (and several related tests) can be
run with this command:

    bin/test -c -t test_sanitize -t test_file_importer \
        -t test_translation_importer

Lint: the "make lint" report is clean (after fixing some pre-existing
lint).

QA: I honestly don't know how to QA this.  I'll have to ask Danilo or
one of the other translations guys.  It might be that I can export the
template mentioned in the bug report
(https://translations.launchpad.net/ubuntu/natty/+source/alsa-utils/+pots/alsa-utils/bs/+translate)
and reimport it.  It should generate an error message about mixed
newlines on reimport.

-- 
https://code.launchpad.net/~benji/launchpad/bug-742662/+merge/74254
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~benji/launchpad/bug-742662 into lp:launchpad.
=== modified file 'lib/lp/translations/utilities/sanitize.py'
--- lib/lp/translations/utilities/sanitize.py	2010-09-29 15:46:26 +0000
+++ lib/lp/translations/utilities/sanitize.py	2011-09-06 15:38:41 +0000
@@ -4,6 +4,7 @@
 __metaclass__ = type
 __all__ = [
     'MixedNewlineMarkersError',
+    'Sanitizer',
     'sanitize_translations_from_webui',
     'sanitize_translations_from_import',
     ]
@@ -47,29 +48,44 @@
         self.newline_style = self._getNewlineStyle(
             english_singular, 'Original')
 
-    def _getNewlineStyle(self, text, text_name):
+    @classmethod
+    def _getNewlineStyle(cls, text, text_name=None):
         """Find out which newline style is used in text."""
-        error_message = (
-            "%s text (%r) mixes different newline markers." % (
-                text_name, text))
         style = None
         # To avoid confusing the single-character newline styles for mac and
         # unix with the two-character windows one, remove the windows-style
         # newlines from the text and use that text to search for the other
         # two.
-        stripped_text = text.replace(self.windows_style, u'')
+        stripped_text = text.replace(cls.windows_style, u'')
         if text != stripped_text:
             # Text contains windows style new lines.
-            style = self.windows_style
+            style = cls.windows_style
 
-        for one_char_style in (self.mac_style, self.unix_style):
+        for one_char_style in (cls.mac_style, cls.unix_style):
             if one_char_style in stripped_text:
                 if style is not None:
+                    # If the text has a descriptive name (like "Original"),
+                    # then format it for inclusion in the error message.
+                    if text_name is not None:
+                        text_name += ' '
+                    else:
+                        text_name = ''
+
+                    error_message = (
+                        "%stext (%r) mixes different newline markers." % (
+                            text_name, text))
+
                     raise MixedNewlineMarkersError(error_message)
                 style = one_char_style
 
         return style
 
+    @classmethod
+    def verifyNewlineConsistency(cls, text, text_name=None):
+        """Raise a MixedNewlineMarkersError if the newlines are inconsistent.
+        """
+        cls._getNewlineStyle(text, text_name)
+
     def sanitize(self, translation_text):
         """Return 'translation_text' or None after doing some sanitization.
 

=== modified file 'lib/lp/translations/utilities/tests/test_file_importer.py'
--- lib/lp/translations/utilities/tests/test_file_importer.py	2011-06-09 10:50:25 +0000
+++ lib/lp/translations/utilities/tests/test_file_importer.py	2011-09-06 15:38:41 +0000
@@ -31,6 +31,7 @@
     ITranslationImportQueue,
     )
 from lp.translations.utilities.gettext_po_importer import GettextPOImporter
+from lp.translations.utilities.sanitize import MixedNewlineMarkersError
 from lp.translations.utilities.translation_common_format import (
     TranslationMessageData,
     )
@@ -246,7 +247,7 @@
         message.addTranslation(0, u'')
 
         potmsgset = self.factory.makePOTMsgSet(
-            potemplate = importer.potemplate, sequence=50)
+            potemplate=importer.potemplate, sequence=50)
         translation = importer.storeTranslationsInDatabase(
             message, potmsgset)
         # No TranslationMessage is created.
@@ -353,6 +354,24 @@
                 po_importer.potemplate.displayname),
             'Did not create the correct comment for %s' % test_email)
 
+    def test_FileImporter_rejects_mixed_newline_styles(self):
+        # Files with mixed newlines (Mac, Unix, Windows) will generate an
+        # error.  See the Sanitizer tests for comprehensive tests (all
+        # combinations, etc.).
+        pofile_contents = dedent(r"""
+            msgid ""
+            msgstr ""
+            "POT-Creation-Date: 2001-04-07 16:30+0500\n"
+            "Content-Type: text/plain; charset=CHARSET\n"
+            "Language: English\n"
+
+            msgid "bar"
+            msgstr "mixed\rnewlines\n"
+            """)
+
+        pot_importer = self._createPOTFileImporter(pofile_contents, True)
+        self.assertRaises(MixedNewlineMarkersError, pot_importer.importFile)
+
     def test_getPersonByEmail_personless_account(self):
         # An Account without a Person attached is a difficult case for
         # _getPersonByEmail: it has to create the Person but re-use an
@@ -566,7 +585,7 @@
     def test_not_raises_OutdatedTranslationError_on_upstream_uploads(self):
         queue_entry = self._make_queue_entry(True)
         try:
-            importer = POFileImporter(queue_entry, GettextPOImporter(), None)
+            POFileImporter(queue_entry, GettextPOImporter(), None)
         except OutdatedTranslationError:
             self.fail("OutdatedTranslationError raised.")
 
@@ -574,7 +593,7 @@
         queue_entry = self._make_queue_entry(True)
         pofile = queue_entry.pofile
         old_raw_header = pofile.header
-        importer = POFileImporter(queue_entry, GettextPOImporter(), None)
+        POFileImporter(queue_entry, GettextPOImporter(), None)
         self.assertEqual(old_raw_header, pofile.header)
 
 
@@ -762,4 +781,3 @@
         importer = POFileImporter(
             entry, importers[TranslationFileFormat.PO], None)
         self.assertTrue(importer.is_upstream_import_on_sourcepackage)
-

=== modified file 'lib/lp/translations/utilities/tests/test_sanitize.py'
--- lib/lp/translations/utilities/tests/test_sanitize.py	2011-08-12 11:37:08 +0000
+++ lib/lp/translations/utilities/tests/test_sanitize.py	2011-09-06 15:38:41 +0000
@@ -158,6 +158,27 @@
                     MixedNewlineMarkersError,
                     sanitizer.normalizeNewlines, translation_text)
 
+    def test_verifyNewlineConsistency_with_consistent_newlines(self):
+        # Consistent newlines in the text will not raise an exception.
+        translation_template = u"Text with %s consistent %s newlines."
+        for translation_newline in self.newline_styles:
+            translation_text = translation_template % (
+                translation_newline, translation_newline)
+            Sanitizer.verifyNewlineConsistency(translation_text)
+
+    def test_verifyNewlineConsistency_with_mixed_newlines(self):
+        # Consistent newlines in the text will not raise an exception.
+        translation_template = u"Text with %s mixed %s newlines."
+        for translation_newline_1 in self.newline_styles:
+            other_newlines = self.newline_styles[:]
+            other_newlines.remove(translation_newline_1)
+            for translation_newline_2 in other_newlines:
+                translation_text = translation_template % (
+                    translation_newline_1, translation_newline_2)
+                self.assertRaises(
+                    MixedNewlineMarkersError,
+                    Sanitizer.verifyNewlineConsistency, translation_text)
+
     def test_sanitize(self):
         # Calling the Sanitizer object will apply all sanitization procedures.
         sanitizer = Sanitizer(u"Text with\nnewline.")

=== modified file 'lib/lp/translations/utilities/tests/test_translation_importer.py'
--- lib/lp/translations/utilities/tests/test_translation_importer.py	2011-02-14 17:14:17 +0000
+++ lib/lp/translations/utilities/tests/test_translation_importer.py	2011-09-06 15:38:41 +0000
@@ -9,7 +9,10 @@
 
 from canonical.testing.layers import LaunchpadZopelessLayer
 from lp.services.log.logger import DevNullLogger
-from lp.testing import TestCaseWithFactory
+from lp.testing import (
+    TestCase,
+    TestCaseWithFactory,
+    )
 from lp.testing.matchers import Provides
 from lp.translations.enums import RosettaImportStatus
 from lp.translations.interfaces.translationfileformat import (
@@ -21,12 +24,15 @@
 from lp.translations.utilities.translation_common_format import (
     TranslationMessageData,
     )
+from lp.translations.utilities.sanitize import MixedNewlineMarkersError
 from lp.translations.utilities.translation_import import (
     importers,
     is_identical_translation,
+    message_text_attributes,
     POFileImporter,
     POTFileImporter,
     TranslationImporter,
+    verify_message_newline_consistency,
     )
 
 
@@ -301,3 +307,59 @@
         importer = POFileImporter(queue_entry, FakeParser(), DevNullLogger())
         importer.importMessage(message)
         self.assertEqual(old_file_references, potmsgset.filereferences)
+
+
+class FakeMessage:
+    msgid_singular = None
+    msgid_plural = None
+    singular_text = None
+    plural_text = None
+    translations = ()
+
+
+class NewlineVerificationTests(TestCase):
+    def test_attribute_ok(self):
+        # If a message's attributes are None or strings without mixed
+        # newlines, then no exception is raise.
+        for attr_name in message_text_attributes:
+            message = FakeMessage()
+            message.msgid_singular = 'no mixed newlines here'
+            # The next call doesn't raise an exception.
+            verify_message_newline_consistency(message)
+
+    def test_attribute_not_ok(self):
+        # If a message's attributes contain mixed newlines, an exception is
+        # raised.
+        for attr_name in message_text_attributes:
+            message = FakeMessage()
+            setattr(message, attr_name, 'mixed\rnewlines\n')
+            e = self.assertRaises(
+                MixedNewlineMarkersError,
+                verify_message_newline_consistency, message)
+            # The error message mentions the attribute name.
+            self.assertEqual(
+                e.args[0],
+                "%s text ('mixed\\rnewlines\\n') " % attr_name +
+                "mixes different newline markers.")
+
+    def test_translations_ok(self):
+        # If a message's translations are None or strings without mixed
+        # newlines, then no exception is raise.
+        message = FakeMessage()
+        message.translations = ['no mixed newlines here']
+        # The next call doesn't raise an exception.
+        verify_message_newline_consistency(message)
+
+    def test_translations_not_ok(self):
+        # If a message's translations contain mixed newlines, an exception is
+        # raised.
+        message = FakeMessage()
+        message.translations = ['mixed\rnewlines\n']
+        e = self.assertRaises(
+            MixedNewlineMarkersError,
+            verify_message_newline_consistency, message)
+        # The error message mentions the translation index ("0" in this case).
+        self.assertEqual(
+            e.args[0],
+            "translation 0 text ('mixed\\rnewlines\\n') "
+            "mixes different newline markers.")

=== modified file 'lib/lp/translations/utilities/translation_import.py'
--- lib/lp/translations/utilities/translation_import.py	2011-03-24 11:21:19 +0000
+++ lib/lp/translations/utilities/translation_import.py	2011-09-06 15:38:41 +0000
@@ -59,6 +59,7 @@
 from lp.translations.utilities.mozilla_xpi_importer import MozillaXpiImporter
 from lp.translations.utilities.sanitize import (
     sanitize_translations_from_import,
+    Sanitizer,
     )
 from lp.translations.utilities.translation_common_format import (
     TranslationMessageData,
@@ -75,6 +76,19 @@
     TranslationFileFormat.XPI: MozillaXpiImporter(),
     }
 
+message_text_attributes = [
+    'msgid_singular', 'msgid_plural', 'singular_text', 'plural_text']
+
+
+def verify_message_newline_consistency(message):
+    for attr_name in message_text_attributes:
+        value = getattr(message, attr_name)
+        if value is not None:
+            Sanitizer.verifyNewlineConsistency(value, attr_name)
+    for index, translation in enumerate(message.translations):
+        Sanitizer.verifyNewlineConsistency(
+            translation, 'translation %d' % index)
+
 
 def is_identical_translation(existing_msg, new_msg):
     """Is a new translation substantially the same as the existing one?
@@ -373,7 +387,7 @@
     """
 
     def __init__(self, translation_import_queue_entry,
-                 importer, logger = None):
+                 importer, logger=None):
         """Base constructor to set up common attributes and parse the imported
         file into a member variable (self.translation_file).
 
@@ -578,6 +592,7 @@
 
         for message in self.translation_file.messages:
             if message.msgid_singular:
+                verify_message_newline_consistency(message)
                 self.importMessage(message)
 
         return self.errors, self.translation_file.syntax_warnings
@@ -680,7 +695,7 @@
             message._translations = None
 
         if len(message.flags) > 0:
-            flags_comment = u", "+u", ".join(message.flags)
+            flags_comment = u", " + u", ".join(message.flags)
         else:
             flags_comment = u""