kicad-developers team mailing list archive

Translation Quality

To: KiCad Developers <kicad-developers@xxxxxxxxxxxxxxxxxxx>
From: Simon Richter <Simon.Richter@xxxxxxxxxx>
Date: Sun, 27 May 2018 17:57:53 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

Hi,

it seems we have a translation quality problem for some languages, as
gettext's default behaviour for new strings is to copy the translation
of an existing string with a similar msgid and mark the result as
"fuzzy".

This is useful when only small changes occur, but needs a lot of manual
oversight. As an example, in Bulgarian we have:

msgid "Green 1"
msgstr "Гръцки"

msgid "Green 2"
msgstr "Гръцки"

msgid "Green 3"
msgstr "Гръцки"

msgid "Green 4"
msgstr "Гръцки"

All of these were copied from the closest match:

msgid "Greek"
msgstr "Гръцки"

We probably have a lot of similar cases lurking around, so I've written
a small Perl script (attached) that builds a map from each translated
string to the untranslated strings that use it, and outputs those cases
where several different msgids share the same translation. There is
often a good reason for that (e.g. "inch" vs "inches"), but sometimes
there isn't.

The script is not perfect, as it doesn't handle multiline translations,
but it should give you a good overview nonetheless.
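
A minimal sketch of how multiline entries could be folded in, assuming
entries follow the usual PO layout where continuation lines are quoted
strings directly below the msgid or msgstr keyword. The helper names
read_entries and shared_translations are made up for illustration, the
sample data is invented, and plural entries are simply skipped:

```perl
use strict;
use warnings;

# Hypothetical helper: return [msgid, msgstr] pairs with continuation
# lines joined. Plural entries (msgid_plural, msgstr[N]) are skipped.
sub read_entries
{
	my (@entries, %cur, $field);
	foreach my $line (@_)
	{
		if($line =~ /^(msgid|msgstr) "(.*)"\s*$/)
		{
			my ($key, $text) = ($1, $2);
			if($key eq 'msgid')
			{
				# A new msgid starts a new entry; store the finished one.
				push @entries, [ @cur{qw(msgid msgstr)} ] if defined $cur{msgstr};
				%cur = ();
			}
			$field = $key;
			$cur{$field} = $text;
		}
		elsif(defined $field and $line =~ /^"(.*)"\s*$/)
		{
			# Continuation line: append to the keyword seen last.
			$cur{$field} .= $1;
		}
		else
		{
			# Comment, blank line or unsupported keyword.
			undef $field;
		}
	}
	push @entries, [ @cur{qw(msgid msgstr)} ] if defined $cur{msgstr};
	return @entries;
}

# Hypothetical helper: map each non-empty translation to its msgids,
# keeping only translations shared by two or more msgids.
sub shared_translations
{
	my %by_str;
	foreach my $e (read_entries(@_))
	{
		my ($id, $str) = @$e;
		next if $str eq '';
		push @{ $by_str{$str} }, $id;
	}
	delete @by_str{ grep { @{ $by_str{$_} } < 2 } keys %by_str };
	return %by_str;
}

# Invented sample data; a real run would pass the lines of a .po file.
my @sample = split /^/, <<'EOF';
msgid "Greek"
msgstr "Гръцки"

msgid "Green 1"
msgstr ""
"Гръцки"

msgid "Red"
msgstr ""
EOF

my %shared = shared_translations(@sample);
foreach my $str (sort keys %shared)
{
	print "$str\n";
	print "  $_\n" foreach @{ $shared{$str} };
}
```

With the sample above, "Greek" and "Green 1" should be listed under the
shared translation, while the untranslated "Red" is ignored.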

As far as I can see, all of these broken translations are marked as
fuzzy, so finding them is easy, but these are worse than useless for the
users.
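
As a rough illustration of how the fuzzy entries could be listed,
assuming the flag appears in a "#," comment line above the msgid as is
usual in PO files. The function name fuzzy_msgids and the sample data
are made up, and only the first line of each msgid is looked at:

```perl
use strict;
use warnings;

# Hypothetical helper: return the msgids of all entries flagged as fuzzy.
sub fuzzy_msgids
{
	my @found;
	my $fuzzy = 0;
	foreach my $line (@_)
	{
		if($line =~ /^#,.*\bfuzzy\b/)
		{
			# Flag line, e.g. "#, fuzzy" or "#, fuzzy, c-format".
			$fuzzy = 1;
		}
		elsif($line =~ /^msgid "(.*)"\s*$/)
		{
			push @found, $1 if $fuzzy;
			$fuzzy = 0;
		}
	}
	return @found;
}

# Invented sample data; a real run would pass the lines of a .po file.
my @sample = split /^/, <<'EOF';
msgid "Greek"
msgstr "Гръцки"

#, fuzzy
msgid "Green 1"
msgstr "Гръцки"
EOF

print "$_\n" foreach fuzzy_msgids(@sample);
```

gettext's own msgattrib tool with --only-fuzzy extracts such entries
too and is likely the more robust choice for real files.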

   Simon

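#! /usr/bin/perl

use strict;
use warnings;

# Map each msgstr line to the list of msgid lines that share it.
my %x;
my $id;

while(<>)
{
	chomp;
	# Anchor at line start so commented-out entries are not picked up.
	if(/^msgid /)
	{
		$id = $_;
	}
	elsif(/^msgstr /)
	{
		push @{ $x{$_} }, $id;
	}
}

foreach my $str (keys %x)
{
	my @values = @{ $x{$str} };
	# $#values is the last index, so > 0 means at least two msgids.
	if($#values > 0)
	{
		print $str . "\n";
		foreach my $id (@values)
		{
			print "  " . $id . "\n";
		}
	}
}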

Follow ups

Re: Translation Quality
From: Marco Ciampa, 2018-05-27