
gnome-zeitgeist team mailing list archive

[Bug 525790] [NEW] thumbnail creation doesn't work for non-utf8 text files

 

Public bug reported:

When the mouse is moved over a text file (actually a CSV) encoded in
ISO-8859-1, the thumbnail preview window crashes because
src/gio_file.py::create_text_thumb() uses a pygments lexer that
expects UTF-8 encoded content by default.

*********************
Traceback (most recent call last):
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/widgets.py", line 562, in _handle_tooltip
    return tooltip_window.preview(self.gio_file)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/widgets.py", line 351, in preview
    pixbuf = gio_file.get_thumbnail(size=size, border=1)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/gio_file.py", line 214, in get_thumbnail
    thumb = create_text_thumb(self, size, 1)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/gio_file.py", line 127, in create_text_thumb
    content = highlight(content, lexer, formatter)
  File "/usr/lib/pymodules/python2.6/pygments/__init__.py", line 89, in highlight
    return format(lex(code, lexer), formatter, outfile)
  File "/usr/lib/pymodules/python2.6/pygments/__init__.py", line 46, in lex
    return lexer.get_tokens(code)
  File "/usr/lib/pymodules/python2.6/pygments/lexer.py", line 151, in get_tokens
    text = text.decode(self.encoding)
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 5564-5566: invalid data

**********
Quick fix:
Tell the lexer to guess the text file's encoding by itself (lexer.encoding = "guess" in gio_file.py, line 120).

A better fix would be to use:
lexer.encoding = "chardet"
but this requires first checking that the Python chardet library is available on the system (I don't know how to do that).
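
One possible way to do that check is to attempt the import and fall back when it fails. This is only a sketch: the helper name pick_lexer_encoding is hypothetical and not part of gio_file.py, and it assumes the installed pygments accepts both the "guess" and "chardet" values for the lexer encoding.

```python
def pick_lexer_encoding():
    """Return "chardet" if the chardet library can be imported, else "guess".

    Hypothetical helper (not in gio_file.py): the returned string is meant
    to be assigned to lexer.encoding before highlight() is called.
    """
    try:
        import chardet  # imported only to test availability
    except ImportError:
        # chardet is not installed: let pygments guess the encoding itself
        return "guess"
    return "chardet"


if __name__ == "__main__":
    # In create_text_thumb() this would become:
    #     lexer.encoding = pick_lexer_encoding()
    print(pick_lexer_encoding())
```

With this, the quick fix still applies on systems without chardet, and chardet is used whenever it is present.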

**********
To reproduce, move the mouse over an ISO-8859-1 encoded text file containing non-ASCII characters, such as French accented letters (éàï or ù).
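
The underlying decode failure can probably be reproduced outside GAJ as well. The sketch below assumes nothing about gio_file.py; the function name fails_as_utf8 is made up for illustration. It only shows that bytes for accented letters encoded as ISO-8859-1 are rejected by a strict UTF-8 decode, which is the last step of the traceback.

```python
# -*- coding: utf-8 -*-

def fails_as_utf8(data):
    """Return True if the byte string cannot be decoded as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# The same accented letters, encoded two different ways.
latin1_bytes = u"éàï ù".encode("iso-8859-1")
utf8_bytes = u"éàï ù".encode("utf-8")

if __name__ == "__main__":
    print(fails_as_utf8(latin1_bytes))
    print(fails_as_utf8(utf8_bytes))
```

A file with only ASCII characters decodes the same way under both encodings, which is why plain English text files do not trigger the crash.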

Tested on Zeitgeist and GAJ installed from the PPA on Ubuntu Jaunty (the
same behavior is expected on trunk).

** Affects: gnome-activity-journal
     Importance: Undecided
         Status: New


** Tags: encoding file iso-8859-1 text thumbnail utf-8

-- 
thumbnail creation doesn't work for non-utf8 text files
https://bugs.launchpad.net/bugs/525790
You received this bug notification because you are a member of GNOME
Zeitgeist Team, which is the registrant for GNOME Activity Journal.

Status in GNOME Activity Journal: New





