gwibber-bugs team mailing list archive
Message #02802
[Bug 646232] Re: utf-8 decode failure
** Changed in: gwibber
Status: Incomplete => Invalid
--
You received this bug notification because you are a member of Gwibber
Bug Heros, which is subscribed to Gwibber.
https://bugs.launchpad.net/bugs/646232
Title:
utf-8 decode failure
Status in Gwibber:
Invalid
Bug description:
Gwibber was failing to finish handling the messages in my Twitter
feed; specifically, it was getting caught up on this tweet:
{'account': u'c3f7c578b77711df98b500226812bae7', 'sender': {'nick':
'packrattracker', 'followers': None, 'name': 'PackratMarketTracker',
'url': 'https://twitter.com/packrattracker', 'image':
'http://a1.twimg.com/profile_images/862134109/packrat_tools_rat_normal.jpg',
'id': 26991270, 'is_me': False, 'location': 'Packrat'}, 'service':
'twitter', 'url':
'https://twitter.com/packrattracker/statuses/25285841972', 'text':
'New market sighting: Apr\xe8s-ski (Freeriders, 750 points) for 25Tx
from Beijing #packrat', 'transient': False, 'mid': '25285841972',
'id': 'ed2a8548c73f11dfb3e400226812bae7', 'content': u'New market
sighting: Après-ski (Freeriders, 750 points) for 25Tx from
Beijing #<a class="hash"
href="gwibber:/tag?acct=c3f7c578b77711df98b500226812bae7&query=packrat">packrat</a>',
'source': '<a href="http://packrattools.com" rel="nofollow">Packrat
Tools</a>', 'html': 'New market sighting: Apr\xe8s-ski (Freeriders,
750 points) for 25Tx from Beijing #<a class="hash"
href="https://twitter.com#search?q=packrat">packrat</a>',
'rtl': False, 'time': 1285239715.0, 'stream': 'messages', 'operation':
'receive', 'to_me': False}
This was causing:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gwibber/microblog/dispatcher.py", line 110, in perform_operation
    json.dumps(m)
  File "/usr/lib64/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe8 in position 24: invalid continuation byte
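The failure can be reproduced outside Gwibber with a few lines. This is a minimal sketch, not Gwibber code: it uses Python 3 syntax (the report is against Python 2.7), and the variable names `raw` and `error` are made up for illustration. It decodes the start of the tweet text as UTF-8 and captures the resulting exception, whose `start` attribute is the offset of the offending byte.

```python
# Beginning of the tweet text as raw bytes. 0xe8 is "è" in Latin-1,
# but on its own it is not a valid UTF-8 sequence.
raw = b"New market sighting: Apr\xe8s-ski (Freeriders, 750 points)"

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Offset of the byte the codec rejected, and the byte itself.
print(error.start)
print(raw[error.start:error.start + 1])
```

The offset lines up with "position 24" in the traceback above, since the 0xe8 byte follows the 24 ASCII characters of "New market sighting: Apr".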
Now, I happen to know that the \xe8 byte is actually "è" (its Latin-1
encoding, which is not valid UTF-8), so I made this change in
microblog/dispatcher.py:
- m["rtl"] = util.isRTL(re.sub(text_cleaner, "", m["text"].decode('utf-8')))
+ m["text"] = m["text"].decode('utf-8', 'ignore')
+ m["rtl"] = util.isRTL(re.sub(text_cleaner, "", m["text"]))
I'm sure there is a better way to avoid hitting the wall when m["text"]
can't be decoded as UTF-8 cleanly, notably kitchen.text
(http://packages.python.org/kitchen/api-text.html), but this
workaround resolves the issue without adding new dependencies. Note
that 'ignore' silently drops the undecodable byte, so the text ends up
as "Aprs-ski".
Note: I tried just passing 'ignore' to the decode call in the
m["rtl"] evaluation, but then it failed later when the message went
into json.dumps(m), because m["text"] itself still held the undecoded
byte string.
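One possible alternative to 'ignore' is to fall back to a second encoding, so the "è" is kept rather than dropped. The following is only a sketch under stated assumptions: it is written for Python 3 (where the undecoded values would be bytes), the helper names `to_text` and `normalize_message` are hypothetical and not part of Gwibber, the choice of Latin-1 as the fallback is a guess about the feed, and it only handles a flat dict (not the nested 'sender' entry).

```python
import json


def to_text(value, fallback="latin-1"):
    """Return text for a bytes value: try UTF-8 first, then a fallback."""
    if not isinstance(value, bytes):
        return value
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: undecodable input is Latin-1. Latin-1 maps every
        # byte to a code point, so this decode cannot fail.
        return value.decode(fallback)


def normalize_message(message):
    """Return a copy of a flat message dict with every bytes value decoded."""
    return {key: to_text(val) for key, val in message.items()}


# Cut-down stand-in for the message from the report.
m = {
    "text": b"New market sighting: Apr\xe8s-ski",
    "mid": "25285841972",
    "rtl": False,
}

clean = normalize_message(m)
encoded = json.dumps(clean)  # every value is now text, a number, or a bool
print(clean["text"])
```

Decoding every field before serializing, rather than only the one used for the RTL check, avoids leaving raw byte strings behind for json.dumps to trip over.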