touch-packages team mailing list archive

[Bug 1384857] Re: media scanner does not handle files with incorrectly encoded tags (mojibake)

I don't believe that this is a legitimate bug. The ID3 spec requires the
encoding to be one of the following:

$00 – ISO-8859-1 (LATIN-1, Identical to ASCII for values smaller than 0x80).
$01 – UCS-2 (UTF-16 encoded Unicode with BOM), in ID3v2.2 and ID3v2.3.
$02 – UTF-16BE encoded Unicode without BOM, in ID3v2.4.
$03 – UTF-8 encoded Unicode, in ID3v2.4.

It's illegal to write GBK into ID3 tags, and I don't think we should
make any attempt to perpetuate this error.
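
For reference, the four encoding bytes listed above map onto standard
codecs. Here is a minimal Python sketch, assuming a text frame payload is
simply the encoding byte followed by the text. The names
ID3V2_TEXT_ENCODINGS and decode_text_frame are made up for illustration
and are not part of mediascanner2 or GStreamer:

```python
# Hypothetical mapping from the ID3v2 text-encoding byte to a Python codec.
ID3V2_TEXT_ENCODINGS = {
    0x00: "latin-1",    # ISO-8859-1
    0x01: "utf-16",     # UTF-16 with BOM; the BOM selects the byte order
    0x02: "utf-16-be",  # UTF-16BE without BOM (ID3v2.4)
    0x03: "utf-8",      # UTF-8 (ID3v2.4)
}


def decode_text_frame(payload: bytes) -> str:
    """Decode a payload of the assumed form <encoding byte><text bytes>."""
    if not payload:
        raise ValueError("empty text frame payload")
    codec = ID3V2_TEXT_ENCODINGS.get(payload[0])
    if codec is None:
        raise ValueError(f"invalid ID3v2 text encoding byte: {payload[0]:#04x}")
    return payload[1:].decode(codec)


# GBK bytes written under the $00 (ISO-8859-1) marker still decode without
# error, because every byte value is valid Latin-1. The result is mojibake.
mislabelled = b"\x00" + "中文".encode("gbk")
garbled = decode_text_frame(mislabelled)
```

Nothing in a frame marked $00 tells the reader that the bytes are really
GBK, which is why a conforming decoder produces garbage here instead of
raising an error.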

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mediascanner2 in Ubuntu.
https://bugs.launchpad.net/bugs/1384857

Title:
  media scanner does not handle files with incorrectly encoded tags
  (mojibake)

Status in mediascanner2 package in Ubuntu:
  New

Bug description:
  I don't really have enough information to know how prevalent this
  problem is, since it seems to be highly dependent on region.  I was
  shown a Chinese user's phone where half the songs came up with garbage
  metadata.  It seems that the problem is that the metadata in these
  files is tagged as ISO-8859-1, but is actually in the locale's legacy
  encoding (GBK in the case of these Chinese tracks).

  It is not clear whether we can easily fix this in media scanner
  though, since GStreamer is providing tag data to us normalised to
  UTF-8.  To unmangle the text, I needed to convert this UTF-8 to
  ISO-8859-1, and then convert that back to UTF-8 as if it was GBK.
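
  The round trip described above can be sketched in Python as follows.
  This is only an illustration under stated assumptions: the helper name
  unmangle and the GBK default are hypothetical, and the input is assumed
  to be text that was decoded as ISO-8859-1 before being normalised to
  UTF-8.

```python
def unmangle(text: str, legacy_encoding: str = "gbk") -> str:
    """Try to undo mojibake from legacy bytes mislabelled as ISO-8859-1.

    Returns the input unchanged if it cannot be reinterpreted.
    """
    try:
        # Step 1: recover the original bytes. Latin-1 maps each code point
        # below 0x100 to the byte of the same value, so this is lossless.
        raw = text.encode("latin-1")
        # Step 2: decode those bytes using the encoding they were really in.
        return raw.decode(legacy_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (so it was probably
        # decoded correctly already), or the bytes are not valid in the
        # legacy encoding.
        return text


# Simulate what the scanner sees: GBK bytes decoded as ISO-8859-1.
mangled = "中文".encode("gbk").decode("latin-1")
fixed = unmangle(mangled)
```

  The sketch cannot tell genuine ISO-8859-1 text apart from mislabelled
  GBK when the bytes happen to be valid in both, so choosing when to apply
  it would still require a guess at the legacy encoding.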

  GStreamer already includes some code to attempt to decode text
  according to the locale's encoding, but since we are using UTF-8
  locales this doesn't do anything:

  http://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/tag/id3v2frames.c#n968

  There is also an open upstream bug about guessing at a legacy encoding
  based on the locale, but it hasn't seen any activity in a year:

  https://bugzilla.gnome.org/show_bug.cgi?id=688367

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mediascanner2/+bug/1384857/+subscriptions