launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #11999
Re: [Merge] lp:~dpm/launchpad/translations-exporter into lp:launchpad
Al 14/09/12 13:38, En/na Abel Deuring ha escrit:
> Hi David,
>
> firstly, a few general remarks:
>
> - The big obstacle: "thou shalt not increase the LOC count for (the branch
> you are landing code in) unless" (quoted from
> https://dev.launchpad.net/PolicyAndProcess/MaintenanceCosts?highlight=%28loc%29
>
> See the wiki page for more details -- you can either try to convice
> somebody in charge that this is a good change, or you can "compess" the
> existing LP code base first.
>
> Personally, I think this script would be worth to add, but I am not the
> project lead...
>
Thanks for the heads up. As per the conversation just now on
#launchpad-dev, a waiver has been granted to move this code into LP:
<lifeless> I'll grant a waiver
this code already exists in the wrong place
its not adding debt to move it into LP
<lifeless> its reducing debt by getting it into the right place.
http://irclogs.ubuntu.com/2012/09/14/%23launchpad-dev.html#t11:54
I'll apply fixes for all the rest of points mentioned and will talk to
stub to double-check on the database changes you're describing, as
suggested.
Thanks!
Cheers,
David.
>
>> This is a result of the request on RT #55759 to move the Ubuntu translations
>> exporter script [1] into the Launchpad tree.
>>
>> As it's my first ever Launchpad merge proposal, I'm submitting this as work
>> in progress to get some initial feedback if overall what I'm trying to do
>> and the code looks good. I'll start working on the tests once that's been
>> reviewed and I'm certain I'm doing the right thing :).
>>
>> In short, this code is thought to be run daily to provide a tarball with an
>> export of Ubuntu translations stats. The data is fetched by querying the
>> Launchpad database. The data will then be used to produce the translations
>> coverage report at release time [2] and also to provide a list of of
>> priority templates to help the community focus on the most important
>> translations to complete, as well as producing graphs of their progress.
>>
>> There are some TODO comments in the code for a couple of areas I was not
>> sure how to go about. The most important one is the fact that the tarballs
>> that are generated with the database dump are uploaded to Librarian, but as
>> they are not listed anywhere in Launchpad, I have no way of knowing their
>> URL in advance when I'll want to fetch them. I thought perhaps a URL alias
>> such as the ones we use for
>> https://translations.launchpad.net/ubuntu/hardy/+latest-full-language-pack
>> could be useful (e.g. .../+latest-stats-export). I could not really figure
>> out how to do it. Any pointers? Or other approaches?
>
> Such a URL would mean that the Librarian data would have to be fed through
> the app server in order to reach the client.
>
> But the core problem is that a Librarian file that is not referenced from
> anywhere will be deleted by a cron job. So we need a column in some LP
> DB table that points to the LibraryFileAlias. I think a new column
> DistroSeries.translation_statistics would do the job.
>
> Adding this column requires a separate branch and merge proposal,
> see https://dev.launchpad.net/PolicyAndProcess/DatabaseSchemaChangesProcess
>
> This may look a bit scary, but just adding a column is not difficult.
> But I suspect that the column will need an index so that the garbo job
> that deletes unreferenced LFA records can run efficiently. But stub
> ask stub if my suspicion is right. Anyway, the index should be added
> in a separate branch.
>
> database/schema/patch-2209-28-1.sql and database/schema/patch-2209-28-2.sql
> are an example for such a DB schema change.
>
> And once the DB column exists, you can add a new property to the Python
> class DistroSeries and export it to the API.
>
>>
>> Thanks.
>>
>> [1] https://launchpad.net/lp-get-ul10nstats/
>> [2] http://people.canonical.com/~dpm/stats/ubuntu-12.04-translation-stats.html
>>
>>
>
> [...]
>
>> === added file 'lib/lp/translations/scripts/export_stats.py'
>> --- lib/lp/translations/scripts/export_stats.py 1970-01-01 00:00:00 +0000
>> +++ lib/lp/translations/scripts/export_stats.py 2012-09-14 09:49:44 +0000
>> @@ -0,0 +1,284 @@
>> +# Copyright 2009 Canonical Ltd. This software is licensed under the
>> +# GNU Affero General Public License version 3 (see the file LICENSE).
>> +
>> +"""Export Ubuntu translations statistics from the database for the use of
>> +external clients.
>> +
>> +These stats are used to produce the translations coverage report at
>> +release time [1] and also to produce a list of of priority templates
>> +to help the community focus on the most important translations to complete,
>> +as well as producing graphs of their progress.
>> +
>> +[1] http://people.canonical.com/~dpm/stats/ubuntu-12.04-translation-stats.html
>> +"""
>> +
>> +__metaclass__ = type
>> +
>> +__all__ = [
>> + 'TranslationsStatsExporter',
>> + ]
>> +
>> +from lp.services.scripts.base import LaunchpadCronScript
>> +from lp.services.database.sqlbase import (
>> + cursor,
>> + sqlvalues,
>> + )
>> +from lp.services.librarian.interfaces.client import (
>> + ILibrarianClient,
>> + UploadFailed,
>> + )
>> +from lp.services.tarfile_helpers import LaunchpadWriteTarFile
>> +from zope.component import getUtility
>> +from optparse import (
>> + Option,
>> + OptionValueError,
>> + )
>> +from shutil import copyfileobj
>> +import sys
>> +import logging
>> +import json
>> +from tempfile import TemporaryFile
>> +import datetime
>> +
>> +
>> +# PostgreSQL queries to run against the given database to produce the desired
>> +# output.
>> +# If you need to add a new query, remember to add it to the QUERIES dict
>> +# below
>> +PACKAGES_QUERY = """
>> +select
>> + sourcepackagename.name AS sourcepackage,
>> + potemplate.translation_domain,
>> + potemplate.name AS template_name,
>> + potemplate.date_last_updated AS template_last_updated,
>> + pofile.date_changed AS translation_last_updated,
>> + language.code AS language,
>> + (pofile.currentcount+pofile.rosettacount) AS translated,
>> + pofile.unreviewed_count AS unreviewed,
>> + pofile.updatescount AS changed,
>> + potemplate.messagecount AS total
>> + from potemplate
>> + join sourcepackagename on sourcepackagename=sourcepackagename.id
>> + join distroseries on distroseries=distroseries.id
>> + join pofile on pofile.potemplate=potemplate.id
>> + join language on pofile.language=language.id
>> + where
>> + (potemplate.iscurrent or sourcepackagename.name='openoffice.org') and
>> + distroseries.name='{0}'
>> + order by language.code,
>> + sourcepackagename.name,
>> + potemplate.translation_domain,
>> + date_last_updated;"""
>
> We tend to SHOUT at the DB server, i.e., write all SQL keywords in
> capitals: SELECT ... FROM ...
>
>> +
>> +POTEMPLATES_QUERY = """
>> +select
>> + sourcepackagename.name AS sourcepackage,
>> + potemplate.translation_domain,
>> + potemplate.name AS template_name,sqlvalues
>> + potemplate.messagecount AS total,
>> + potemplate.iscurrent AS enabled,
>> + potemplate.languagepack,
>> + potemplate.priority,
>> + potemplate.date_last_updated
>> + from potemplate
>> + join sourcepackagename on sourcepackagename=sourcepackagename.id
>> + join distroseries on distroseries=distroseries.id
>> + where
>> + distroseries.name='{0}'
>> + order by sourcepackagename.name,
>> + potemplate.name;"""
>> +
>> +# This is a test query in order to obtain some results from
>> +# the database from a local Launchpad installation, which
>> +# otherwise returns empty results for the above queries
>> +ALL_POTEMPLATES_QUERY = """
>> +select
>> + potemplate.translation_domain,
>> + potemplate.name AS template_name,
>> + potemplate.messagecount AS total,
>> + potemplate.iscurrent AS enabled,
>> + potemplate.languagepack,
>> + potemplate.priority,
>> + potemplate.date_last_updated
>> + from potemplate;
>> +"""
>> +
>> +# Add PostgreSQL queries here using the format
>> +# 'output-filename': SQL_QUERY_VARIABLE
>> +# Any query you add will be executed against the
>> +# given database
>> +QUERIES = {'package-stats': PACKAGES_QUERY,
>> + 'potemplate-stats': POTEMPLATES_QUERY,
>> +# 'all-potemplate-stats': ALL_POTEMPLATES_QUERY,
>> + }
>> +
>> +
>> +class ExportQuery(object):
>
> Since you have "__metaclass = type" at the start of the file, you
> don't need to derice from object.
>
>> + """Class to contain a database query to be used to export translations
>> + data.
>> +
>> + :param query: String that contains the database query to execute.
>> + :param query_id: String that describes the query.
>> + :param distro_codename: String to specify the distro series the query
>> + applies to (e.g. 'precise')
>> + :param output_file: String to optionally specify the output file path
>> + (or stdout) to save the query results to. Usually the results will
>> + be uploaded to the Librarian.
>> +
>> + Supported storage formats for the results:
>> + - 'json': .json file containing a JSON structure
>> +
>> + """
>> +
>> + def __init__(self, query, query_id, distro_codename, output_file=None):
>> + self.query = query.format(sqlvalues(distro_codename))
>> + self.query_id = query_id
>> + self.distro_codename = distro_codename
>> + self.output_file = output_file
>> +
>> + def run(self):
>> + """Run a database query and save the results to disk or stdout"""
>> +
>> + cur = cursor()
>> + self.logger.debug("Executing query: {0}...".format(self.query))
>> + cur.execute(self.query)
>> +
>> + # Fetch the descriptions to have them in the output
>> + desc = [col[0] for col in cur.description]
>> + logging.debug("Cursor description: {0}".format(desc))
>> +
>> + results = cur.fetchall()
>> + self.logger.debug("Query results: {0}".format(results))
>> +
>> + self.__publish_result(results, desc)
>> +
>> + def __publish_result(self, results, description):
>
> Class attributes names starting with '__' are treated a bit special:
> You can't do something like
>
> q = ExportQuery(...)
> q.__publish_result(...)
>
> but you'll need to do that in tests. I'd suggest to simpy remove the
> leading '__'.
>
>> + """Publish the results of a database query to export translation data
>> + and save them to disk to a given file or to the Librarian
>> +
>> + :param results: List of tuples containing the results from a
>> + translations export database query
>> + :param description: String containing the description of the query.
>> + It will be embedded in the filename of the file to save to.
>> + """
>> +
>> + # Save the results into a JSON file and compress it into a tarball
>> + filehandle = TemporaryFile()
>> + archive = LaunchpadWriteTarFile(filehandle)
>> + output_string = self.__postprocess_result(results, description)
>> + archive.add_file(
>> + self.__get_output_filename('json'), output_string)
>> +
>> + archive.close()
>> + size = filehandle.tell()
>> + filehandle.seek(0)
>> +
>> + # Now save the tarball
>> + if self.output_file is not None:
>> + # Save the tarball to a file.
>> + if self.output_file == '-':
>> + output_filehandle = sys.stdout
>> + else:
>> + output_filehandle = file(self.output_file, 'wb')
>> +
>> + copyfileobj(filehandle, output_filehandle)
>> + else:
>> + # Upload the tarball to the librarian.
>> + try:
>> + uploader = getUtility(ILibrarianClient)
>> + # For tar.gz files, the standard content type is
>> + # application/x-gtar. You can see more info on
>> + # http://en.wikipedia.org/wiki/List_of_archive_formats
>> + file_alias = uploader.addFile(
>> + name=self.__get_output_filename('.tar.gz'),
>> + size=size,
>> + file=filehandle,
>> + contentType='application/x-gtar')
>> + except UploadFailed as e:
>> + self.logger.error('Uploading to the Librarian failed: %s', e)
>> + return None
>> + except:
>
> Completely unconditional except clauses are bad: You don't get any clue
> what was causing the exception. Just remove it: LPCronScript has some
> machinery to log exceptions.
>
>> + # Bare except statements are used in order to prevent premature
>> + # termination of the script.
>> + self.logger.exception(
>> + 'Uncaught exception while uploading to the Librarian')
>> + return None
>> +
>> + # TODO: register upload as 'latest-translations-stats-export'
>> + # TODO: delete old files
>
> As written above, you don't need to explicity delete unreferenced LFAs.
>
>> + self.logger.debug('Upload complete, file alias: %d' % file_alias)
>> +
>> + def __get_output_filename(self, extension):
>> + """Builds and returns the full name of the output file to be saved to
>> + disk.
>> +
>> + :param extension: String to specify the filename extension
>> + :return: String with the name of the output file
>> + """
>> + return(self.distro_codename + '-' + self.query_id + '.' + extension)
>> +
>> + def __postprocess_result(self, results, description):
>> + """Process the results of a translations database export query and
>> + convert them to machine-readable formats.
>> +
>> + Currently only JSON is supported.
>> +
>> + :param results: List of tuples containing the results from a
>> + translations export database query
>> + :param description: String containing the description of the query.
>> + It will be embedded in the filename of the file to save to.
>> + :return: String with the converted results
>> + """
>> +
>> + dthandler = lambda obj: obj.isoformat() \
>> + if isinstance(obj, datetime.datetime) else None
>> + data = [dict(zip(description, prop)) for prop in results]
>> +
>> + output_string = json.dumps(data, default=dthandler)
>> +
>> + return output_string
>> +
>> +
>> +class TranslationsStatsExporter(LaunchpadCronScript):
>> + """Execute the required database queries to obtain a raw database dump
>> + in text files of a given format (currently JSON), containing statistics
>> + about translation coverage for all the languages and translatable
>> + packages under the Ubuntu project in Launchpad.
>> + """
>> +
>> + description = "Export Ubuntu translations stats from the database."
>> + loglevel = logging.INFO
>> +
>> + my_options = [
>> + Option(
>> + '-d', '--distro-codename', dest='distro_codename',
>> + default=None, action='store',
>> + help="The distro to get stats for (e.g. 'precise')"),
>> + Option(
>> + '-o', '--output',
>> + dest='output',
>> + default=None,
>> + action='store',
>> + help='A file to send the generated tarball to, rather than the'
>> + ' Libraran.'
>> + )
>> + ]
>> +
>> + def add_my_options(self):
>> + """See `LaunchpadScript`."""
>> + self.parser.add_options(self.my_options)
>> +
>> + def main(self):
>> + if not self.options.distro_codename:
>> + raise OptionValueError("A distro codename needs to be specified")
>> +
>> + # TODO: validate distro codename
>
> See lib/lp/registry/doc/distroseries.txt:
>
> >>> distroseriesset = getUtility(IDistroSeriesSet)
>
> [...]
>
> >>> for distroseries in distroseriesset.findByName("warty"):
> ... print distroseries.name
> warty
>
>
>> +
>> + # Set up and run all specified queries
>> + for (query_id, query) in QUERIES.items():
>> + export_query = ExportQuery(query,
>> + query_id,
>> + self.options.distro_codename,
>> + self.options.output_file,
>> + )
>> + export_query.run()
>>
>>
>
--
David Planella
Ubuntu Translations Coordinator
www.ubuntu.com / www.davidplanella.wordpress.com
www.identi.ca/dplanella / www.twitter.com/dplanella
https://code.launchpad.net/~dpm/launchpad/translations-exporter/+merge/124373
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~dpm/launchpad/translations-exporter into lp:launchpad.
References