← Back to team overview

launchpad-reviewers team mailing list archive

Re: [Merge] lp:~dpm/launchpad/translations-exporter into lp:launchpad

 

Hi David,

firstly, a few general remarks:

- The big obstacle: "thou shalt not increase the LOC count for (the branch
  you are landing code in) unless" (quoted from
  https://dev.launchpad.net/PolicyAndProcess/MaintenanceCosts?highlight=%28loc%29

  See the wiki page for more details -- you can either try to convice
  somebody in charge that this is a good change, or you can "compess" the
  existing LP code base first.

  Personally, I think this script would be worth to add, but I am not the
  project lead...


> This is a result of the request on RT #55759 to move the Ubuntu translations
> exporter script [1] into the Launchpad tree.
> 
> As it's my first ever Launchpad merge proposal, I'm submitting this as work
> in progress to get some initial feedback if overall what I'm trying to do
> and the code looks good. I'll start working on the tests once that's been
> reviewed and I'm certain I'm doing the right thing :).
> 
> In short, this code is thought to be run daily to provide a tarball with an
> export of Ubuntu translations stats. The data is fetched by querying the
> Launchpad database. The data will then be used to produce the translations
> coverage report at release time [2] and also to provide a list of of
> priority templates to help the community focus on the most important
> translations to complete, as well as producing graphs of their progress.
> 
> There are some TODO comments in the code for a couple of areas I was not
> sure how to go about. The most important one is the fact that the tarballs
> that are generated with the database dump are uploaded to Librarian, but as
> they are not listed anywhere in Launchpad, I have no way of knowing their
> URL in advance when I'll want to fetch them. I thought perhaps a URL alias
> such as the ones we use for
> https://translations.launchpad.net/ubuntu/hardy/+latest-full-language-pack
> could be useful (e.g. .../+latest-stats-export). I could not really figure
> out how to do it. Any pointers? Or other approaches?

Such a URL would mean that the Librarian data would have to be fed through
the app server in order to reach the client.

But the core problem is that a Librarian file that is not referenced from
anywhere will be deleted by a cron job. So we need a column in some LP
DB table that points to the LibraryFileAlias. I think a new column
DistroSeries.translation_statistics would do the job.

Adding this column requires a separate branch and merge proposal,
see https://dev.launchpad.net/PolicyAndProcess/DatabaseSchemaChangesProcess

This may look a bit scary, but just adding a column is not difficult.
But I suspect that the column will need an index so that the garbo job
that deletes unreferenced LFA records can run efficiently. But stub
ask stub if my suspicion is right. Anyway, the index should be added
in a separate branch.

database/schema/patch-2209-28-1.sql and database/schema/patch-2209-28-2.sql
are an example for such a DB schema change.

And once the DB column exists, you can add a new property to the Python
class DistroSeries and export it to the API.

> 
> Thanks.
> 
> [1] https://launchpad.net/lp-get-ul10nstats/
> [2] http://people.canonical.com/~dpm/stats/ubuntu-12.04-translation-stats.html
> 
> 

[...]

> === added file 'lib/lp/translations/scripts/export_stats.py'
> --- lib/lp/translations/scripts/export_stats.py	1970-01-01 00:00:00 +0000
> +++ lib/lp/translations/scripts/export_stats.py	2012-09-14 09:49:44 +0000
> @@ -0,0 +1,284 @@
> +# Copyright 2009 Canonical Ltd.  This software is licensed under the
> +# GNU Affero General Public License version 3 (see the file LICENSE).
> +
> +"""Export Ubuntu translations statistics from the database for the use of
> +external clients.
> +
> +These stats are used to produce the translations coverage report at
> +release time [1] and also to produce a list of of priority templates
> +to help the community focus on the most important translations to complete,
> +as well as producing graphs of their progress.
> +
> +[1] http://people.canonical.com/~dpm/stats/ubuntu-12.04-translation-stats.html
> +"""
> +
> +__metaclass__ = type
> +
> +__all__ = [
> +    'TranslationsStatsExporter',
> +    ]
> +
> +from lp.services.scripts.base import LaunchpadCronScript
> +from lp.services.database.sqlbase import (
> +    cursor,
> +    sqlvalues,
> +    )
> +from lp.services.librarian.interfaces.client import (
> +    ILibrarianClient,
> +    UploadFailed,
> +    )
> +from lp.services.tarfile_helpers import LaunchpadWriteTarFile
> +from zope.component import getUtility
> +from optparse import (
> +    Option,
> +    OptionValueError,
> +    )
> +from shutil import copyfileobj
> +import sys
> +import logging
> +import json
> +from tempfile import TemporaryFile
> +import datetime
> +
> +
> +# PostgreSQL queries to run against the given database to produce the desired
> +# output.
> +# If you need to add a new query, remember to add it to the QUERIES dict
> +# below
> +PACKAGES_QUERY = """
> +select
> +    sourcepackagename.name AS sourcepackage,
> +    potemplate.translation_domain,
> +    potemplate.name AS template_name,
> +    potemplate.date_last_updated AS template_last_updated,
> +    pofile.date_changed AS translation_last_updated,
> +    language.code AS language,
> +    (pofile.currentcount+pofile.rosettacount) AS translated,
> +    pofile.unreviewed_count AS unreviewed,
> +    pofile.updatescount AS changed,
> +    potemplate.messagecount AS total
> +  from potemplate
> +  join sourcepackagename on sourcepackagename=sourcepackagename.id
> +  join distroseries on distroseries=distroseries.id
> +  join pofile on pofile.potemplate=potemplate.id
> +  join language on pofile.language=language.id
> +  where
> +    (potemplate.iscurrent or sourcepackagename.name='openoffice.org') and
> +    distroseries.name='{0}'
> +  order by language.code,
> +           sourcepackagename.name,
> +           potemplate.translation_domain,
> +           date_last_updated;"""

We tend to SHOUT at the DB server, i.e., write all SQL keywords in
capitals: SELECT ... FROM ...

> +
> +POTEMPLATES_QUERY = """
> +select
> +    sourcepackagename.name AS sourcepackage,
> +    potemplate.translation_domain,
> +    potemplate.name AS template_name,sqlvalues
> +    potemplate.messagecount AS total,
> +    potemplate.iscurrent AS enabled,
> +    potemplate.languagepack,
> +    potemplate.priority,
> +    potemplate.date_last_updated
> +  from potemplate
> +  join sourcepackagename on sourcepackagename=sourcepackagename.id
> +  join distroseries on distroseries=distroseries.id
> +  where
> +    distroseries.name='{0}'
> +  order by sourcepackagename.name,
> +           potemplate.name;"""
> +
> +# This is a test query in order to obtain some results from
> +# the database from a local Launchpad installation, which
> +# otherwise returns empty results for the above queries
> +ALL_POTEMPLATES_QUERY = """
> +select
> +        potemplate.translation_domain,
> +        potemplate.name AS template_name,
> +        potemplate.messagecount AS total,
> +        potemplate.iscurrent AS enabled,
> +        potemplate.languagepack,
> +        potemplate.priority,
> +        potemplate.date_last_updated
> +    from potemplate;
> +"""
> +
> +# Add PostgreSQL queries here using the format
> +# 'output-filename': SQL_QUERY_VARIABLE
> +# Any query you add will be executed against the
> +# given database
> +QUERIES = {'package-stats': PACKAGES_QUERY,
> +           'potemplate-stats': POTEMPLATES_QUERY,
> +#           'all-potemplate-stats': ALL_POTEMPLATES_QUERY,
> +           }
> +
> +
> +class ExportQuery(object):

Since you have "__metaclass = type" at the start of the file, you
don't need to derice from object.

> +    """Class to contain a database query to be used to export translations
> +    data.
> +
> +    :param query: String that contains the database query to execute.
> +    :param query_id: String that describes the query.
> +    :param distro_codename: String to specify the distro series the query
> +        applies to (e.g. 'precise')
> +    :param output_file: String to optionally specify the output file path
> +        (or stdout) to save the query results to. Usually the results will
> +        be uploaded to the Librarian.
> +
> +    Supported storage formats for the results:
> +     - 'json': .json file containing a JSON structure
> +
> +    """
> +
> +    def __init__(self, query, query_id, distro_codename, output_file=None):
> +        self.query = query.format(sqlvalues(distro_codename))
> +        self.query_id = query_id
> +        self.distro_codename = distro_codename
> +        self.output_file = output_file
> +
> +    def run(self):
> +        """Run a database query and save the results to disk or stdout"""
> +
> +        cur = cursor()
> +        self.logger.debug("Executing query: {0}...".format(self.query))
> +        cur.execute(self.query)
> +
> +        # Fetch the descriptions to have them in the output
> +        desc = [col[0] for col in cur.description]
> +        logging.debug("Cursor description: {0}".format(desc))
> +
> +        results = cur.fetchall()
> +        self.logger.debug("Query results: {0}".format(results))
> +
> +        self.__publish_result(results, desc)
> +
> +    def __publish_result(self, results, description):

Class attributes names starting with '__' are treated a bit special:
You can't do something like

    q = ExportQuery(...)
    q.__publish_result(...)

but you'll need to do that in tests. I'd suggest to simpy remove the
leading '__'.

> +        """Publish the results of a database query to export translation data
> +        and save them to disk to a given file or to the Librarian
> +
> +        :param results: List of tuples containing the results from a
> +            translations export database query
> +        :param description: String containing the description of the query.
> +            It will be embedded in the filename of the file to save to.
> +        """
> +
> +        # Save the results into a JSON file and compress it into a tarball
> +        filehandle = TemporaryFile()
> +        archive = LaunchpadWriteTarFile(filehandle)
> +        output_string = self.__postprocess_result(results, description)
> +        archive.add_file(
> +            self.__get_output_filename('json'), output_string)
> +
> +        archive.close()
> +        size = filehandle.tell()
> +        filehandle.seek(0)
> +
> +        # Now save the tarball
> +        if self.output_file is not None:
> +            # Save the tarball to a file.
> +            if self.output_file == '-':
> +                output_filehandle = sys.stdout
> +            else:
> +                output_filehandle = file(self.output_file, 'wb')
> +
> +            copyfileobj(filehandle, output_filehandle)
> +        else:
> +            # Upload the tarball to the librarian.
> +            try:
> +                uploader = getUtility(ILibrarianClient)
> +                # For tar.gz files, the standard content type is
> +                # application/x-gtar. You can see more info on
> +                # http://en.wikipedia.org/wiki/List_of_archive_formats
> +                file_alias = uploader.addFile(
> +                    name=self.__get_output_filename('.tar.gz'),
> +                    size=size,
> +                    file=filehandle,
> +                    contentType='application/x-gtar')
> +            except UploadFailed as e:
> +                self.logger.error('Uploading to the Librarian failed: %s', e)
> +                return None
> +            except:

Completely unconditional except clauses are bad: You don't get any clue
what was causing the exception. Just remove it: LPCronScript has some
machinery to log exceptions.

> +                # Bare except statements are used in order to prevent premature
> +                # termination of the script.
> +                self.logger.exception(
> +                    'Uncaught exception while uploading to the Librarian')
> +                return None
> +
> +            # TODO: register upload as 'latest-translations-stats-export'
> +            # TODO: delete old files

As written above, you don't need to explicity delete unreferenced LFAs.

> +            self.logger.debug('Upload complete, file alias: %d' % file_alias)
> +
> +    def __get_output_filename(self, extension):
> +        """Builds and returns the full name of the output file to be saved to
> +        disk.
> +
> +        :param extension: String to specify the filename extension
> +        :return: String with the name of the output file
> +        """
> +        return(self.distro_codename + '-' + self.query_id + '.' + extension)
> +
> +    def __postprocess_result(self, results, description):
> +        """Process the results of a translations database export query and
> +        convert them to machine-readable formats.
> +
> +        Currently only JSON is supported.
> +
> +        :param results: List of tuples containing the results from a
> +            translations export database query
> +        :param description: String containing the description of the query.
> +            It will be embedded in the filename of the file to save to.
> +        :return: String with the converted results
> +        """
> +
> +        dthandler = lambda obj: obj.isoformat() \
> +            if isinstance(obj, datetime.datetime) else None
> +        data = [dict(zip(description, prop)) for prop in results]
> +
> +        output_string = json.dumps(data, default=dthandler)
> +
> +        return output_string
> +
> +
> +class TranslationsStatsExporter(LaunchpadCronScript):
> +    """Execute the required database queries to obtain a raw database dump
> +    in text files of a given format (currently JSON), containing statistics
> +    about translation coverage for all the languages and translatable
> +    packages under the Ubuntu project in Launchpad.
> +    """
> +
> +    description = "Export Ubuntu translations stats from the database."
> +    loglevel = logging.INFO
> +
> +    my_options = [
> +        Option(
> +            '-d', '--distro-codename', dest='distro_codename',
> +            default=None, action='store',
> +            help="The distro to get stats for (e.g. 'precise')"),
> +        Option(
> +            '-o', '--output',
> +            dest='output',
> +            default=None,
> +            action='store',
> +            help='A file to send the generated tarball to, rather than the'
> +                 ' Libraran.'
> +            )
> +        ]
> +
> +    def add_my_options(self):
> +        """See `LaunchpadScript`."""
> +        self.parser.add_options(self.my_options)
> +
> +    def main(self):
> +        if not self.options.distro_codename:
> +            raise OptionValueError("A distro codename needs to be specified")
> +
> +        # TODO: validate distro codename

See lib/lp/registry/doc/distroseries.txt:

    >>> distroseriesset = getUtility(IDistroSeriesSet)

    [...]

    >>> for distroseries in distroseriesset.findByName("warty"):
    ...     print distroseries.name
    warty


> +
> +        # Set up and run all specified queries
> +        for (query_id, query) in QUERIES.items():
> +            export_query = ExportQuery(query,
> +                                       query_id,
> +                                       self.options.distro_codename,
> +                                       self.options.output_file,
> +                                       )
> +            export_query.run()
> 
> 

-- 
https://code.launchpad.net/~dpm/launchpad/translations-exporter/+merge/124373
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~dpm/launchpad/translations-exporter into lp:launchpad.


Follow ups

References