gnome-zeitgeist team mailing list archive
Message #00433
[Bug 525790] Re: thumbnail creation doesn't work for non-utf8 text files
Committed a change that uses chardet, when the module exists, to detect
the encoding during thumbnail generation for non-UTF-8 files. There was
an issue with "encoding = 'guess'", so at the moment there is no fallback
if chardet is not installed on the user's system.
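For illustration only, one possible fallback for systems without chardet is to decode the file content before it reaches the lexer, trying UTF-8 first and then ISO-8859-1. This is a minimal sketch and not what was committed; the helper name decode_without_chardet is hypothetical.

```python
def decode_without_chardet(raw):
    """Decode bytes to text without any third-party module (hypothetical helper)."""
    try:
        # Most text files are UTF-8 (or plain ASCII), so try that first.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every possible byte to a character, so this
        # decode cannot raise; the text may be wrong for other encodings.
        return raw.decode("iso-8859-1")
```

The trade-off is that a file in some other 8-bit encoding would be displayed with wrong characters rather than crashing the preview.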
** Changed in: gnome-activity-journal
Status: New => Fix Committed
** Changed in: gnome-activity-journal
Importance: Undecided => Medium
--
thumbnail creation doesn't work for non-utf8 text files
https://bugs.launchpad.net/bugs/525790
You received this bug notification because you are a member of GNOME
Zeitgeist Team, which is the registrant for GNOME Activity Journal.
Status in GNOME Activity Journal: Fix Committed
Bug description:
When moving the mouse over a text file name (actually a CSV file) encoded in ISO-8859-1, the thumbnail preview window crashes because src/gio_file.py::create_text_thumb() requests a Pygments lexer, which expects UTF-8 encoded content by default.
*********************
Traceback (most recent call last):
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/widgets.py", line 562, in _handle_tooltip
    return tooltip_window.preview(self.gio_file)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/widgets.py", line 351, in preview
    pixbuf = gio_file.get_thumbnail(size=size, border=1)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/gio_file.py", line 214, in get_thumbnail
    thumb = create_text_thumb(self, size, 1)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/gio_file.py", line 127, in create_text_thumb
    content = highlight(content, lexer, formatter)
  File "/usr/lib/pymodules/python2.6/pygments/__init__.py", line 89, in highlight
    return format(lex(code, lexer), formatter, outfile)
  File "/usr/lib/pymodules/python2.6/pygments/__init__.py", line 46, in lex
    return lexer.get_tokens(code)
  File "/usr/lib/pymodules/python2.6/pygments/lexer.py", line 151, in get_tokens
    text = text.decode(self.encoding)
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 5564-5566: invalid data
**********
Quick fix:
tell the lexer to guess the text file encoding by itself (lexer.encoding = "guess" in gio_file.py, line 120).
A better one would be to use:
lexer.encoding = "chardet"
but one should first check that the Python chardet library is available on the system (I don't know how to do that).
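One common way to check whether a Python module is available is to attempt the import and catch ImportError. The sketch below is only an illustration of that idea: the helper name choose_lexer_encoding is hypothetical and not part of gnome-activity-journal, while "chardet" and "guess" are the two lexer encoding values discussed above.

```python
def choose_lexer_encoding():
    """Return "chardet" if the chardet module can be imported, else "guess"."""
    try:
        import chardet  # noqa: F401 (imported only to test availability)
    except ImportError:
        return "guess"
    return "chardet"


# Hypothetical use in create_text_thumb(), where `lexer` is a Pygments lexer:
#     lexer.encoding = choose_lexer_encoding()
```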
**********
To reproduce the behaviour, move the mouse over an ISO-8859-1 encoded text file containing non-ASCII characters, such as French accented letters (éàï or ù).
Tested on Zeitgeist and GAJ installed from the PPA on Ubuntu Jaunty (the same behaviour is expected on trunk).
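A test file of the kind described above can be generated with a few lines of Python. This is a minimal sketch: the sample content and the helper names write_latin1_sample and is_valid_utf8 are made up for illustration. It writes a small CSV encoded as ISO-8859-1 and confirms that the bytes are not valid UTF-8, which is the condition that triggers the UnicodeDecodeError in the traceback.

```python
import io

# Made-up sample content with French accented letters.
SAMPLE = u"nom;ville\nHélène;Orléans\nLoïc;Besançon\n"


def write_latin1_sample(path):
    """Write SAMPLE to `path` encoded as ISO-8859-1 and return the raw bytes."""
    raw = SAMPLE.encode("iso-8859-1")
    with io.open(path, "wb") as handle:
        handle.write(raw)
    return raw


def is_valid_utf8(raw):
    """Return True if `raw` decodes cleanly as UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Calling write_latin1_sample() with a path of your choice and then hovering over the resulting file in the journal should exercise the same code path.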