touch-packages team mailing list archive

[Bug 1384857] [NEW] media scanner does not handle files with incorrectly encoded tags (mojibake)

 

You have been subscribed to a public bug:

I don't really have enough information to know how prevalent this
problem is, since it seems to be highly dependent on region.  I was
shown a Chinese user's phone where half the songs came up with garbage
metadata.  It seems that the problem is that the metadata in these files
is tagged as ISO-8859-1, but is actually in the locale's legacy encoding
(GBK in the case of these Chinese tracks).

It is not clear whether we can easily fix this in media scanner, though,
since GStreamer provides tag data to us already normalised to UTF-8.  To
unmangle the text, I needed to encode this UTF-8 text back to ISO-8859-1
bytes, and then decode those bytes to UTF-8 as if they were GBK.
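
The round trip described above can be sketched in Python as follows.
This is only an illustration, not media scanner code: the function name
unmangle and the default of GBK are assumptions made for the example,
and a real fix would also need to decide when the conversion is safe to
apply.

```python
def unmangle(text: str, legacy_encoding: str = "gbk") -> str:
    """Undo mojibake where legacy bytes were decoded as ISO-8859-1.

    Hypothetical helper: the name and the GBK default are assumptions
    for illustration only.
    """
    try:
        # Recover the original bytes. ISO-8859-1 maps every byte value
        # to one code point, so this step is lossless for mangled text.
        raw = text.encode("iso-8859-1")
        # Reinterpret those bytes in the encoding they were really in.
        return raw.decode(legacy_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside ISO-8859-1 (so it was
        # not mangled this way) or the bytes are not valid in the legacy
        # encoding. Leave it untouched in both cases.
        return text


original = "中文歌曲"
# Simulate what a mis-tagged file looks like after normalisation.
mangled = original.encode("gbk").decode("iso-8859-1")
restored = unmangle(mangled)
```

Note that this can misfire on genuine ISO-8859-1 text whose bytes happen
to form valid GBK sequences, so it is a heuristic rather than a safe
default.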

GStreamer already includes some code to attempt to decode text according
to the locale's encoding, but since we are using UTF-8 locales this
doesn't do anything:

http://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/tag/id3v2frames.c#n968

There is also an open upstream bug about guessing at a legacy encoding
based on the locale, but it hasn't seen any activity in a year:

https://bugzilla.gnome.org/show_bug.cgi?id=688367
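
As a rough idea of what guessing a legacy encoding from the locale
could look like, here is a minimal Python sketch.  The mapping table and
the function name guess_legacy_encoding are assumptions made up for this
example; they are not taken from GStreamer or from the upstream bug, and
the table is far from complete.

```python
from typing import Optional

# Hypothetical, incomplete mapping from locale name to a legacy encoding
# that was commonly used in that region. Illustration only.
LEGACY_ENCODINGS = {
    "zh_CN": "gbk",
    "zh_TW": "big5",
    "ja_JP": "shift_jis",
    "ko_KR": "euc_kr",
    "ru_RU": "cp1251",
}


def guess_legacy_encoding(locale_name: str) -> Optional[str]:
    """Return a likely legacy encoding for a locale, or None if unknown."""
    # Drop the codeset and modifier: "zh_CN.UTF-8@pinyin" -> "zh_CN".
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    return LEGACY_ENCODINGS.get(base)


guess_zh = guess_legacy_encoding("zh_CN.UTF-8")
guess_en = guess_legacy_encoding("en_US.UTF-8")
```

The point of keying on the language and territory rather than the
codeset is that the codeset is UTF-8 on our systems, which is why the
existing locale-based fallback has nothing to work with.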

** Affects: mediascanner2 (Ubuntu)
     Importance: Undecided
         Status: New

-- 
media scanner does not handle files with incorrectly encoded tags (mojibake)
https://bugs.launchpad.net/bugs/1384857
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to mediascanner2 in Ubuntu.