← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/launchpad/bug-722568 into lp:launchpad

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-722568 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  #722568 Empty TranslationMessages confuse stats
  https://bugs.launchpad.net/bugs/722568

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-722568/+merge/50593

= Summary =

Translation statistics are off.  That's because a translatable message can have an empty translation, which is different in database terms from it having no translation at all but still means it should count as untranslated.

The query that collects translation statistics for POFiles included messages with empty translations, although they were correctly noted as untranslated on the "other side" (the two "sides" of translation being Ubuntu and upstream).  In the query that counts and classifies translated messages, this gave rise to the nonsensical category of translated messages that were not translated on the other side, yet were translated identically between the two sides.


== Proposed fix ==

Exclude empty translations from the query that counts translated messages.


== Pre-implementation notes ==

Chatted this over with the rest of the former Translations team.  At first it looked as if the problem was in the ancient, highly confused, and badly documented field of statistics interpretation.  This has always been messy, but also underwent deep changes with the introduction of message sharing, and later, Ubuntu/upstream message sharing (the Recife model).


== Implementation details ==

The statistics code already generated a string to check for empty TranslationMessages, but on the "other side" only.  All I had to do was extract it into a function, re-use it for the message on "this side," and stick the result in the WHERE clause as an extra condition.

I also updated the rosettastats docstrings, since they were vastly out of date.


== Tests ==

The tests on statistics are many and varied, making it a shame that this problem wasn't caught before.  You can exercise them with:
{{{
./bin/test -vvc lp.translations.tests.test_pofile
}}}

Of the new tests I added there, one was designed to—and did—pass without the fix: test_empty_messages_on_other_side_count_as_untranslated.  That's because of the existing clause that I re-used in this branch.


== Demo and Q/A ==

Run cronscripts/rosetta-pofile-stats.py.  This will take a long time.  Then verify POFile statistics by comparing the messages returned by the Translated and Untranslated translation filters to the numbers reported in the statistics.

Notably, check the zh_CN translation for WiserEarth trunk: https://translations.launchpad.net/wiserearth/trunk


= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/translations/tests/test_pofile.py
  lib/lp/translations/model/pofile.py
-- 
https://code.launchpad.net/~jtv/launchpad/bug-722568/+merge/50593
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-722568 into lp:launchpad.
=== modified file 'lib/lp/translations/model/pofile.py'
--- lib/lp/translations/model/pofile.py	2011-01-26 23:33:20 +0000
+++ lib/lp/translations/model/pofile.py	2011-02-21 13:46:28 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009-2010 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 # pylint: disable-msg=E0611,W0212,W0231
@@ -116,6 +116,17 @@
     )
 
 
+def compose_sql_translationmessage_has_translations(tm_sql_identifier):
+    """Compose SQL for "`TranslationMessage` is nonempty.".
+
+    :param tm_sql_identifier: The SQL identifier for the
+        `TranslationMessage` in the query that's to be tested.
+    """
+    return "COALESCE(%s) IS NOT NULL" % ", ".join([
+        "%s.msgstr%d" % (tm_sql_identifier, form)
+        for form in xrange(TranslationConstants.MAX_PLURAL_FORMS)])
+
+
 class POFileMixIn(RosettaStats):
     """Base class for `POFile` and `DummyPOFile`.
 
@@ -827,15 +838,15 @@
         """Count `currentcount`, `updatescount`, and `rosettacount`."""
         side_traits = getUtility(ITranslationSideTraitsSet).getForTemplate(
             self.potemplate)
-        has_other_msgstrs = "COALESCE(%s) IS NOT NULL" % ", ".join([
-            "Other.msgstr%d" % form
-            for form in xrange(TranslationConstants.MAX_PLURAL_FORMS)])
         params = {
             'potemplate': quote(self.potemplate),
             'language': quote(self.language),
             'flag': side_traits.flag_name,
             'other_flag': side_traits.other_side_traits.flag_name,
-            'has_other_msgstrs': has_other_msgstrs,
+            'has_msgstrs':
+                compose_sql_translationmessage_has_translations("Current"),
+            'has_other_msgstrs':
+                compose_sql_translationmessage_has_translations("Other"),
         }
         # The "distinct on" combined with the "order by potemplate nulls
         # last" makes diverged messages mask their shared equivalents.
@@ -857,11 +868,11 @@
                     Other.potmsgset = TTI.potmsgset AND
                     Other.language = %(language)s AND
                     Other.%(other_flag)s IS TRUE AND
-                    Other.potemplate IS NULL AND
                     Other.potemplate IS NULL
                 WHERE
                     TTI.potemplate = %(potemplate)s AND
-                    TTI.sequence > 0
+                    TTI.sequence > 0 AND
+                    %(has_msgstrs)s
                 ORDER BY
                     TTI.potmsgset,
                     Current.potemplate NULLS LAST

=== modified file 'lib/lp/translations/tests/test_pofile.py'
--- lib/lp/translations/tests/test_pofile.py	2011-02-17 22:06:02 +0000
+++ lib/lp/translations/tests/test_pofile.py	2011-02-21 13:46:28 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009-2010 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 __metaclass__ = type
@@ -1824,6 +1824,65 @@
         self.assertEquals(self.pofile.newCount(), 0)
         self.assertEquals(self.pofile.updatesCount(), 1)
 
+    def test_empty_messages_count_as_untranslated(self):
+        # A message with all its msgstr* set to None counts as if
+        # there's no message at all.  It doesn't show up in any of the
+        # counts except as untranslated.
+        self.factory.makeCurrentTranslationMessage(
+            pofile=self.pofile, potmsgset=self.potmsgset, translations=[])
+        self.pofile.updateStatistics()
+        self.assertEqual(0, self.pofile.translatedCount())
+        self.assertEqual(1, self.pofile.untranslatedCount())
+        self.assertEqual(0, self.pofile.newCount())
+        self.assertEqual(0, self.pofile.updatesCount())
+
+    def test_empty_messages_on_this_side_count_as_untranslated(self):
+        # A POTMsgSet whose current TranslationMessage on this side is
+        # empty is counted only as untranslated, regardless of any
+        # translations it may have on the other side.
+        self.factory.makeCurrentTranslationMessage(
+            pofile=self.pofile, potmsgset=self.potmsgset, translations=[])
+        other_message = self.factory.makeSuggestion(
+            pofile=self.pofile, potmsgset=self.potmsgset)
+        other_message.is_current_ubuntu = True
+        self.pofile.updateStatistics()
+        self.assertEqual(0, self.pofile.translatedCount())
+        self.assertEqual(1, self.pofile.untranslatedCount())
+        self.assertEqual(0, self.pofile.newCount())
+        self.assertEqual(0, self.pofile.updatesCount())
+
+    def test_empty_messages_on_other_side_count_as_untranslated(self):
+        # A POTMsgSet that's translated on this side but has an empty
+        # translation on the other side counts as translated on this
+        # side, but not equal between both sides (currentCount) or
+        # translated differently between the two sides (updatesCount).
+        # Instead, it's counted as translated on this side but not on
+        # the other (newCount).
+        self.factory.makeCurrentTranslationMessage(
+            pofile=self.pofile, potmsgset=self.potmsgset)
+        other_message = self.factory.makeSuggestion(
+            pofile=self.pofile, potmsgset=self.potmsgset, translations=[])
+        other_message.is_current_ubuntu = True
+        self.pofile.updateStatistics()
+        self.assertEqual(1, self.pofile.translatedCount())
+        self.assertEqual(0, self.pofile.untranslatedCount())
+        self.assertEqual(1, self.pofile.newCount())
+        self.assertEqual(0, self.pofile.updatesCount())
+        self.assertEqual(0, self.pofile.currentCount())
+
+    def test_tracking_empty_messages_count_as_untranslated(self):
+        # An empty TranslationMessage that's current on both sides
+        # counts as untranslated.
+        self.factory.makeCurrentTranslationMessage(
+            pofile=self.pofile, potmsgset=self.potmsgset, translations=[],
+            current_other=True)
+        self.pofile.updateStatistics()
+        self.assertEqual(0, self.pofile.translatedCount())
+        self.assertEqual(1, self.pofile.untranslatedCount())
+        self.assertEqual(0, self.pofile.newCount())
+        self.assertEqual(0, self.pofile.updatesCount())
+        self.assertEqual(0, self.pofile.currentCount())
+
 
 class TestPOFile(TestCaseWithFactory):
     """Test PO file methods."""


Follow ups