gnome-zeitgeist team mailing list archive

[Bug 525790] Re: thumbnail creation doesn't work for non-utf8 text files

To: gnome-zeitgeist@xxxxxxxxxxxxxxxxxxx
From: tehk <email.tehk@xxxxxxxxx>
Date: Mon, 22 Feb 2010 16:53:33 -0000
Reply-to: Bug 525790 <525790@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Committed encoding detection using chardet, when the module exists, for
thumbnail generation of non-UTF-8 files. There was an issue with
"encoding = 'guess'", so there is no fallback at the moment if chardet
is not on the user's system.
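
One possible fallback for systems without chardet is to decode the file contents by hand before passing them to the highlighter. The following is only a sketch and not the committed code: the helper name decode_text and the order of candidate encodings are assumptions made for illustration.

```python
def decode_text(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode raw bytes by trying each candidate encoding in order.

    Hypothetical helper, not part of gnome-activity-journal.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            # This encoding does not fit the data, try the next one.
            continue
    # Only reached when every candidate failed (ISO-8859-1 itself never
    # fails): decode as UTF-8 and substitute undecodable bytes.
    return raw.decode("utf-8", "replace")
```

Because ISO-8859-1 maps every byte to a character, the second attempt always succeeds, so the result may be wrong for files in other legacy encodings but the thumbnail code would no longer raise UnicodeDecodeError.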

** Changed in: gnome-activity-journal
       Status: New => Fix Committed

** Changed in: gnome-activity-journal
   Importance: Undecided => Medium

-- 
thumbnail creation doesn't work for non-utf8 text files
https://bugs.launchpad.net/bugs/525790
You received this bug notification because you are a member of GNOME
Zeitgeist Team, which is the registrant for GNOME Activity Journal.

Status in GNOME Activity Journal: Fix Committed

Bug description:
When moving the mouse over a text file (actually a CSV) encoded in ISO-8859-1, the thumbnail preview window crashes because src/gio_file.py::create_text_thumb() uses a Pygments lexer, which by default expects UTF-8 encoded content.

*********************
Traceback (most recent call last):
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/widgets.py", line 562, in _handle_tooltip
    return tooltip_window.preview(self.gio_file)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/widgets.py", line 351, in preview
    pixbuf = gio_file.get_thumbnail(size=size, border=1)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/gio_file.py", line 214, in get_thumbnail
    thumb = create_text_thumb(self, size, 1)
  File "/home/mtou/zeitgeist/gnome-activity-journal/src/gio_file.py", line 127, in create_text_thumb
    content = highlight(content, lexer, formatter)
  File "/usr/lib/pymodules/python2.6/pygments/__init__.py", line 89, in highlight
    return format(lex(code, lexer), formatter, outfile)
  File "/usr/lib/pymodules/python2.6/pygments/__init__.py", line 46, in lex
    return lexer.get_tokens(code)
  File "/usr/lib/pymodules/python2.6/pygments/lexer.py", line 151, in get_tokens
    text = text.decode(self.encoding)
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 5564-5566: invalid data

**********
Quick fix:
tell the lexer to try to guess the text file encoding by itself (lexer.encoding = "guess" in gio_file.py, line 120).

A better one would be to use:
lexer.encoding = "chardet"
but one should check that the Python chardet library is available on the system (but I don't know how to do that)
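
A common way to check whether an optional module is available is to attempt the import and catch ImportError. The sketch below picks the lexer encoding option that way; the function name pick_lexer_encoding is hypothetical, and falling back to "guess" is an assumption that may not suit every setup.

```python
def pick_lexer_encoding():
    """Return the Pygments lexer encoding option to use.

    Hypothetical helper: returns "chardet" when the chardet module can be
    imported, otherwise "guess".
    """
    try:
        import chardet  # imported only to test availability
    except ImportError:
        return "guess"
    return "chardet"

# In create_text_thumb() the lexer could then be configured with:
#     lexer.encoding = pick_lexer_encoding()
```

Both "guess" and "chardet" are values Pygments accepts for the lexer encoding option, so the rest of the thumbnail code would not need to change.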

**********
To reproduce the behaviour, try moving the mouse over an ISO-8859-1 encoded text file containing non-ASCII characters, such as French accented letters (éàï or ù).

Tested on Zeitgeist and GAJ installed from the PPA on Ubuntu Jaunty (the same behavior is expected on trunk).

References

[Bug 525790] [NEW] thumbnail creation doesn't work for non-utf8 text files
From: mtou, 2010-02-22