gnome-zeitgeist team mailing list archive
Message #00429
[Bug 525790] [NEW] thumbnail creation doesn't work for non-utf8 text files
Public bug reported:
Hovering the mouse over a text file (in this case a CSV) encoded in
ISO-8859-1 crashes the thumbnail preview window:
src/gio_file.py::create_text_thumb() requests a Pygments lexer, which
by default expects UTF-8 encoded content.
*********************
Traceback (most recent call last):
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/widgets.py", line 562, in _handle_tooltip
    return tooltip_window.preview(self.gio_file)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/widgets.py", line 351, in preview
    pixbuf = gio_file.get_thumbnail(size=size, border=1)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/gio_file.py", line 214, in get_thumbnail
    thumb = create_text_thumb(self, size, 1)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/gio_file.py", line 127, in create_text_thumb
    content = highlight(content, lexer, formatter)
  File "/usr/lib/pymodules/python2.6/pygments/__init__.py", line 89, in highlight
    return format(lex(code, lexer), formatter, outfile)
  File "/usr/lib/pymodules/python2.6/pygments/__init__.py", line 46, in lex
    return lexer.get_tokens(code)
  File "/usr/lib/pymodules/python2.6/pygments/lexer.py", line 151, in get_tokens
    text = text.decode(self.encoding)
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 5564-5566: invalid data
**********
Quick fix:
Tell the lexer to guess the text file's encoding itself (lexer.encoding = "guess" in gio_file.py, line 120).
A better fix would be to use:
lexer.encoding = "chardet"
but that requires first checking that the Python chardet library is available on the system (I don't know how to do that).
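One possible way to make that check, as a minimal sketch: try to import chardet and fall back to "guess" when the import fails. The helper name pick_lexer_encoding is hypothetical and is not part of GAJ or Pygments; only the two encoding values come from the report above.

```python
def pick_lexer_encoding():
    """Return "chardet" if the chardet module can be imported, else "guess".

    Hypothetical helper: the name and the fallback policy are assumptions
    made for illustration, not existing GAJ code.
    """
    try:
        import chardet  # only imported to test availability
    except ImportError:
        # chardet is not installed, so use the lexer's built-in guessing.
        return "guess"
    return "chardet"


encoding = pick_lexer_encoding()
```

In create_text_thumb() the result could then be assigned with lexer.encoding = pick_lexer_encoding() instead of a hard-coded value, so systems without chardet still get the quick fix.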
**********
To reproduce the behaviour, hover the mouse over an ISO-8859-1 encoded text file containing non-ASCII characters, such as the French accented letters (é, à, ï or ù).
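A test file of that kind can be generated with a few lines of Python. This is only a sketch: the file name, the sample rows and both helper names are made up for illustration. The second helper shows why the lexer fails, since the bytes written here are not valid UTF-8.

```python
import os
import tempfile


def write_latin1_sample(directory):
    """Write a small CSV encoded in ISO-8859-1 and return its path."""
    path = os.path.join(directory, "sample-latin1.csv")
    # Escapes keep this source ASCII-only: \xe9 is e-acute, \xf4 is o-circumflex.
    text = u"nom;ville\nRen\xe9;Orl\xe9ans\nZo\xe9;Saint-L\xf4\n"
    with open(path, "wb") as handle:
        handle.write(text.encode("iso-8859-1"))
    return path


def decodes_as_utf8(path):
    """Return True if the raw bytes of the file are valid UTF-8."""
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample_dir = tempfile.mkdtemp()
sample_path = write_latin1_sample(sample_dir)
```

Hovering over the resulting file in GAJ should hit the same decode step as in the traceback; decodes_as_utf8(sample_path) returns False for it.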
Tested with Zeitgeist and GAJ installed from the PPA on Ubuntu Jaunty (the
same behaviour is expected on trunk).
** Affects: gnome-activity-journal
Importance: Undecided
Status: New
** Tags: encoding file iso-8859-1 text thumbnail utf-8
--
thumbnail creation doesn't work for non-utf8 text files
https://bugs.launchpad.net/bugs/525790
You received this bug notification because you are a member of GNOME
Zeitgeist Team, which is the registrant for GNOME Activity Journal.
Status in GNOME Activity Journal: New