launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #11996
[Merge] lp:~dpm/launchpad/translations-exporter into lp:launchpad
David Planella has proposed merging lp:~dpm/launchpad/translations-exporter into lp:launchpad.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
For more details, see:
https://code.launchpad.net/~dpm/launchpad/translations-exporter/+merge/124373
This is a result of the request on RT #55759 to move the Ubuntu translations exporter script [1] into the Launchpad tree.
As it's my first ever Launchpad merge proposal, I'm submitting this as work in progress to get some initial feedback if overall what I'm trying to do and the code looks good. I'll start working on the tests once that's been reviewed and I'm certain I'm doing the right thing :).
In short, this code is thought to be run daily to provide a tarball with an export of Ubuntu translations stats. The data is fetched by querying the Launchpad database. The data will then be used to produce the translations coverage report at release time [2] and also to provide a list of of priority templates to help the community focus on the most important translations to complete, as well as producing graphs of their progress.
There are some TODO comments in the code for a couple of areas I was not sure how to go about. The most important one is the fact that the tarballs that are generated with the database dump are uploaded to Librarian, but as they are not listed anywhere in Launchpad, I have no way of knowing their URL in advance when I'll want to fetch them. I thought perhaps a URL alias such as the ones we use for https://translations.launchpad.net/ubuntu/hardy/+latest-full-language-pack could be useful (e.g. .../+latest-stats-export). I could not really figure out how to do it. Any pointers? Or other approaches?
Thanks.
[1] https://launchpad.net/lp-get-ul10nstats/
[2] http://people.canonical.com/~dpm/stats/ubuntu-12.04-translation-stats.html
--
https://code.launchpad.net/~dpm/launchpad/translations-exporter/+merge/124373
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~dpm/launchpad/translations-exporter into lp:launchpad.
=== added file 'cronscripts/translations-export-stats.py'
--- cronscripts/translations-export-stats.py 1970-01-01 00:00:00 +0000
+++ cronscripts/translations-export-stats.py 2012-09-14 09:49:44 +0000
@@ -0,0 +1,25 @@
+#!/usr/bin/python -S
+#
+# Copyright 2009 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+# pylint: disable-msg=C0103,W0403
+
+"""Export translations stats"""
+
+__metaclass__ = type
+__all__ = []
+
+import _pythonpath
+
+from lp.translations.scripts.export_stats import (
+ TranslationsStatsExporter,
+ )
+from lp.services.config import config
+
+
+if __name__ == '__main__':
+ script = TranslationsStatsExporter(
+ 'translations-export-stats',
+ dbuser=config.translationsstatsexporter.dbuser)
+ script.lock_and_run()
=== modified file 'lib/lp/services/config/schema-lazr.conf'
--- lib/lp/services/config/schema-lazr.conf 2012-08-09 04:44:13 +0000
+++ lib/lp/services/config/schema-lazr.conf 2012-09-14 09:49:44 +0000
@@ -1610,6 +1610,12 @@
dbuser: targetnamecacheupdater
+[translationsstatsexporter]
+# The database user which will be used by this process.
+# datatype: string
+dbuser: ubuntu_localisation_statistics
+
+
[updateremoteproduct]
# The database user to run this process as.
# datatype: string
=== added file 'lib/lp/translations/scripts/export_stats.py'
--- lib/lp/translations/scripts/export_stats.py 1970-01-01 00:00:00 +0000
+++ lib/lp/translations/scripts/export_stats.py 2012-09-14 09:49:44 +0000
@@ -0,0 +1,284 @@
+# Copyright 2009 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Export Ubuntu translations statistics from the database for the use of
+external clients.
+
+These stats are used to produce the translations coverage report at
+release time [1] and also to produce a list of of priority templates
+to help the community focus on the most important translations to complete,
+as well as producing graphs of their progress.
+
+[1] http://people.canonical.com/~dpm/stats/ubuntu-12.04-translation-stats.html
+"""
+
+__metaclass__ = type
+
+__all__ = [
+ 'TranslationsStatsExporter',
+ ]
+
+from lp.services.scripts.base import LaunchpadCronScript
+from lp.services.database.sqlbase import (
+ cursor,
+ sqlvalues,
+ )
+from lp.services.librarian.interfaces.client import (
+ ILibrarianClient,
+ UploadFailed,
+ )
+from lp.services.tarfile_helpers import LaunchpadWriteTarFile
+from zope.component import getUtility
+from optparse import (
+ Option,
+ OptionValueError,
+ )
+from shutil import copyfileobj
+import sys
+import logging
+import json
+from tempfile import TemporaryFile
+import datetime
+
+
+# PostgreSQL queries to run against the given database to produce the desired
+# output.
+# If you need to add a new query, remember to add it to the QUERIES dict
+# below
+PACKAGES_QUERY = """
+select
+ sourcepackagename.name AS sourcepackage,
+ potemplate.translation_domain,
+ potemplate.name AS template_name,
+ potemplate.date_last_updated AS template_last_updated,
+ pofile.date_changed AS translation_last_updated,
+ language.code AS language,
+ (pofile.currentcount+pofile.rosettacount) AS translated,
+ pofile.unreviewed_count AS unreviewed,
+ pofile.updatescount AS changed,
+ potemplate.messagecount AS total
+ from potemplate
+ join sourcepackagename on sourcepackagename=sourcepackagename.id
+ join distroseries on distroseries=distroseries.id
+ join pofile on pofile.potemplate=potemplate.id
+ join language on pofile.language=language.id
+ where
+ (potemplate.iscurrent or sourcepackagename.name='openoffice.org') and
+ distroseries.name='{0}'
+ order by language.code,
+ sourcepackagename.name,
+ potemplate.translation_domain,
+ date_last_updated;"""
+
+POTEMPLATES_QUERY = """
+select
+ sourcepackagename.name AS sourcepackage,
+ potemplate.translation_domain,
+ potemplate.name AS template_name,sqlvalues
+ potemplate.messagecount AS total,
+ potemplate.iscurrent AS enabled,
+ potemplate.languagepack,
+ potemplate.priority,
+ potemplate.date_last_updated
+ from potemplate
+ join sourcepackagename on sourcepackagename=sourcepackagename.id
+ join distroseries on distroseries=distroseries.id
+ where
+ distroseries.name='{0}'
+ order by sourcepackagename.name,
+ potemplate.name;"""
+
+# This is a test query in order to obtain some results from
+# the database from a local Launchpad installation, which
+# otherwise returns empty results for the above queries
+ALL_POTEMPLATES_QUERY = """
+select
+ potemplate.translation_domain,
+ potemplate.name AS template_name,
+ potemplate.messagecount AS total,
+ potemplate.iscurrent AS enabled,
+ potemplate.languagepack,
+ potemplate.priority,
+ potemplate.date_last_updated
+ from potemplate;
+"""
+
+# Add PostgreSQL queries here using the format
+# 'output-filename': SQL_QUERY_VARIABLE
+# Any query you add will be executed against the
+# given database
+QUERIES = {'package-stats': PACKAGES_QUERY,
+ 'potemplate-stats': POTEMPLATES_QUERY,
+# 'all-potemplate-stats': ALL_POTEMPLATES_QUERY,
+ }
+
+
+class ExportQuery(object):
+ """Class to contain a database query to be used to export translations
+ data.
+
+ :param query: String that contains the database query to execute.
+ :param query_id: String that describes the query.
+ :param distro_codename: String to specify the distro series the query
+ applies to (e.g. 'precise')
+ :param output_file: String to optionally specify the output file path
+ (or stdout) to save the query results to. Usually the results will
+ be uploaded to the Librarian.
+
+ Supported storage formats for the results:
+ - 'json': .json file containing a JSON structure
+
+ """
+
+ def __init__(self, query, query_id, distro_codename, output_file=None):
+ self.query = query.format(sqlvalues(distro_codename))
+ self.query_id = query_id
+ self.distro_codename = distro_codename
+ self.output_file = output_file
+
+ def run(self):
+ """Run a database query and save the results to disk or stdout"""
+
+ cur = cursor()
+ self.logger.debug("Executing query: {0}...".format(self.query))
+ cur.execute(self.query)
+
+ # Fetch the descriptions to have them in the output
+ desc = [col[0] for col in cur.description]
+ logging.debug("Cursor description: {0}".format(desc))
+
+ results = cur.fetchall()
+ self.logger.debug("Query results: {0}".format(results))
+
+ self.__publish_result(results, desc)
+
+ def __publish_result(self, results, description):
+ """Publish the results of a database query to export translation data
+ and save them to disk to a given file or to the Librarian
+
+ :param results: List of tuples containing the results from a
+ translations export database query
+ :param description: String containing the description of the query.
+ It will be embedded in the filename of the file to save to.
+ """
+
+ # Save the results into a JSON file and compress it into a tarball
+ filehandle = TemporaryFile()
+ archive = LaunchpadWriteTarFile(filehandle)
+ output_string = self.__postprocess_result(results, description)
+ archive.add_file(
+ self.__get_output_filename('json'), output_string)
+
+ archive.close()
+ size = filehandle.tell()
+ filehandle.seek(0)
+
+ # Now save the tarball
+ if self.output_file is not None:
+ # Save the tarball to a file.
+ if self.output_file == '-':
+ output_filehandle = sys.stdout
+ else:
+ output_filehandle = file(self.output_file, 'wb')
+
+ copyfileobj(filehandle, output_filehandle)
+ else:
+ # Upload the tarball to the librarian.
+ try:
+ uploader = getUtility(ILibrarianClient)
+ # For tar.gz files, the standard content type is
+ # application/x-gtar. You can see more info on
+ # http://en.wikipedia.org/wiki/List_of_archive_formats
+ file_alias = uploader.addFile(
+ name=self.__get_output_filename('.tar.gz'),
+ size=size,
+ file=filehandle,
+ contentType='application/x-gtar')
+ except UploadFailed as e:
+ self.logger.error('Uploading to the Librarian failed: %s', e)
+ return None
+ except:
+ # Bare except statements are used in order to prevent premature
+ # termination of the script.
+ self.logger.exception(
+ 'Uncaught exception while uploading to the Librarian')
+ return None
+
+ # TODO: register upload as 'latest-translations-stats-export'
+ # TODO: delete old files
+ self.logger.debug('Upload complete, file alias: %d' % file_alias)
+
+ def __get_output_filename(self, extension):
+ """Builds and returns the full name of the output file to be saved to
+ disk.
+
+ :param extension: String to specify the filename extension
+ :return: String with the name of the output file
+ """
+ return(self.distro_codename + '-' + self.query_id + '.' + extension)
+
+ def __postprocess_result(self, results, description):
+ """Process the results of a translations database export query and
+ convert them to machine-readable formats.
+
+ Currently only JSON is supported.
+
+ :param results: List of tuples containing the results from a
+ translations export database query
+ :param description: String containing the description of the query.
+ It will be embedded in the filename of the file to save to.
+ :return: String with the converted results
+ """
+
+ dthandler = lambda obj: obj.isoformat() \
+ if isinstance(obj, datetime.datetime) else None
+ data = [dict(zip(description, prop)) for prop in results]
+
+ output_string = json.dumps(data, default=dthandler)
+
+ return output_string
+
+
+class TranslationsStatsExporter(LaunchpadCronScript):
+ """Execute the required database queries to obtain a raw database dump
+ in text files of a given format (currently JSON), containing statistics
+ about translation coverage for all the languages and translatable
+ packages under the Ubuntu project in Launchpad.
+ """
+
+ description = "Export Ubuntu translations stats from the database."
+ loglevel = logging.INFO
+
+ my_options = [
+ Option(
+ '-d', '--distro-codename', dest='distro_codename',
+ default=None, action='store',
+ help="The distro to get stats for (e.g. 'precise')"),
+ Option(
+ '-o', '--output',
+ dest='output',
+ default=None,
+ action='store',
+ help='A file to send the generated tarball to, rather than the'
+ ' Libraran.'
+ )
+ ]
+
+ def add_my_options(self):
+ """See `LaunchpadScript`."""
+ self.parser.add_options(self.my_options)
+
+ def main(self):
+ if not self.options.distro_codename:
+ raise OptionValueError("A distro codename needs to be specified")
+
+ # TODO: validate distro codename
+
+ # Set up and run all specified queries
+ for (query_id, query) in QUERIES.items():
+ export_query = ExportQuery(query,
+ query_id,
+ self.options.distro_codename,
+ self.options.output_file,
+ )
+ export_query.run()
Follow ups