gwibber-bugs team mailing list archive

[Bug 646232] Re: utf-8 decode failure

To: gwibber-bugs@xxxxxxxxxxxxxxxxxxx
From: Bilal Shahid <s9iper1@xxxxxxxxx>
Date: Mon, 09 Apr 2012 19:03:30 -0000
Reply-to: Bug 646232 <646232@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Changed in: gwibber
       Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Gwibber
Bug Heros, which is subscribed to Gwibber.
https://bugs.launchpad.net/bugs/646232

Title:
  utf-8 decode failure

Status in Gwibber:
  Invalid

Bug description:
  Gwibber was failing to finish processing the messages in my Twitter
  feed; specifically, it was getting stuck on this tweet:

  {'account': u'c3f7c578b77711df98b500226812bae7', 'sender': {'nick':
  'packrattracker', 'followers': None, 'name': 'PackratMarketTracker',
  'url': 'https://twitter.com/packrattracker', 'image':
  'http://a1.twimg.com/profile_images/862134109/packrat_tools_rat_normal.jpg',
  'id': 26991270, 'is_me': False, 'location': 'Packrat'}, 'service':
  'twitter', 'url':
  'https://twitter.com/packrattracker/statuses/25285841972', 'text':
  'New market sighting: Apr\xe8s-ski (Freeriders, 750 points) for 25Tx
  from Beijing #packrat', 'transient': False, 'mid': '25285841972',
  'id': 'ed2a8548c73f11dfb3e400226812bae7', 'content': u'New market
  sighting: Apr&egrave;s-ski (Freeriders, 750 points) for 25Tx from
  Beijing #<a class="hash"
  href="gwibber:/tag?acct=c3f7c578b77711df98b500226812bae7&query=packrat">packrat</a>',
  'source': '<a href="http://packrattools.com" rel="nofollow">Packrat
  Tools</a>', 'html': 'New market sighting: Apr\xe8s-ski (Freeriders,
  750 points) for 25Tx from Beijing #<a class="hash"
  href="https://twitter.com#search?q=packrat">packrat</a>',
  'rtl': False, 'time': 1285239715.0, 'stream': 'messages', 'operation':
  'receive', 'to_me': False}

  This caused the following traceback:

  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/gwibber/microblog/dispatcher.py", line 110, in perform_operation
      json.dumps(m)
    File "/usr/lib64/python2.7/json/__init__.py", line 231, in dumps
      return _default_encoder.encode(obj)
    File "/usr/lib64/python2.7/json/encoder.py", line 201, in encode
      chunks = self.iterencode(o, _one_shot=True)
    File "/usr/lib64/python2.7/json/encoder.py", line 264, in iterencode
      return _iterencode(o, 0)
  UnicodeDecodeError: 'utf8' codec can't decode byte 0xe8 in position 24: invalid continuation byte
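
  A minimal Python 3 reproduction of the decode failure follows. Gwibber
  itself ran on Python 2.7, and the tweet text is shortened here. The
  input is assumed to be raw bytes containing a single Latin-1 encoded
  "è" (byte 0xE8).

```python
# Python 3 sketch (gwibber ran on Python 2.7); text shortened from the tweet.
raw = b"New market sighting: Apr\xe8s-ski (Freeriders, 750 points)"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xE8 looks like the lead byte of a 3-byte UTF-8 sequence, but the next
    # byte is "s" (0x73), which is not a continuation byte (0x80-0xBF).
    print(exc.start, exc.reason)  # 24 invalid continuation byte

# Decoded as Latin-1 (ISO-8859-1), the same byte is the character "è".
print(raw[24:25].decode("latin-1"))  # è
```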

  Now, I happen to know that the \xe8 byte is actually "&egrave;" (è in
  Latin-1), so I made this change in microblog/dispatcher.py:

  -          m["rtl"] = util.isRTL(re.sub(text_cleaner, "", m["text"].decode('utf-8')))
  +          m["text"] = m["text"].decode('utf-8', 'ignore')
  +          m["rtl"] = util.isRTL(re.sub(text_cleaner, "", m["text"]))
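
  The following Python 3 sketch shows what errors='ignore' does to the
  example text. It is contrasted with errors='replace', a standard codec
  error handler that is not part of the patch: 'ignore' silently drops
  the è, while 'replace' leaves a visible U+FFFD marker in its place.

```python
raw = b"New market sighting: Apr\xe8s-ski"

# errors="ignore" (as in the workaround) drops the undecodable byte entirely.
ignored = raw.decode("utf-8", "ignore")

# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER for that byte.
replaced = raw.decode("utf-8", "replace")

print(ignored)               # New market sighting: Aprs-ski
print("\ufffd" in replaced)  # True
```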

  There is probably a better way to avoid failing when m["text"] can't be
  decoded cleanly as UTF-8, notably kitchen.text
  (http://packages.python.org/kitchen/api-text.html), but this workaround
  resolves the issue without adding new dependencies, at the cost of
  dropping the undecodable character.
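
  One dependency-free alternative is to try UTF-8 first and fall back to
  Latin-1, which would keep the è instead of dropping it. This is only a
  sketch. The helper name decode_text is hypothetical, and it is neither
  kitchen's API nor what the patch does. It also assumes that non-UTF-8
  feed text is Latin-1, which may not hold for other encodings.

```python
def decode_text(value, fallback="latin-1"):
    """Hypothetical helper: decode bytes as UTF-8, else use a fallback codec.

    With the default Latin-1 fallback this never raises, because Latin-1
    maps every byte 0x00-0xFF to a code point; text in some other encoding
    may be decoded to the wrong characters, though.
    """
    if isinstance(value, str):
        return value  # already decoded text, nothing to do
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return value.decode(fallback)


print(decode_text(b"Apr\xe8s-ski"))                    # Après-ski
print(decode_text("Apr\u00e8s-ski".encode("utf-8")))   # Après-ski
```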

  Note: I first tried adding 'ignore' only to the decode call inside the
  m["rtl"] evaluation, but it then failed later when the message went
  into json.dumps(m), because m["text"] itself was still the undecoded
  byte string.
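
  A Python 3 sketch of why the decoded text has to be stored back into
  the message follows. In Python 2.7, json.dumps decodes byte strings as
  UTF-8, which produced the UnicodeDecodeError above. In Python 3 a bytes
  value raises TypeError instead. The function is_rtl is a hypothetical,
  rough stand-in for gwibber's util.isRTL. The text_cleaner step is
  omitted.

```python
import json


def is_rtl(text):
    """Hypothetical rough stand-in for util.isRTL: True if the text contains
    any code point in the Hebrew..Arabic blocks (U+0590..U+06FF)."""
    return any("\u0590" <= ch <= "\u06ff" for ch in text)


def prepare_inline(m):
    # Decodes only for the RTL check; m["text"] is still bytes afterwards.
    m["rtl"] = is_rtl(m["text"].decode("utf-8", "ignore"))
    return m


def prepare_stored(m):
    # Decodes once and stores the str back into the message, as the patch does.
    m["text"] = m["text"].decode("utf-8", "ignore")
    m["rtl"] = is_rtl(m["text"])
    return m


try:
    json.dumps(prepare_inline({"text": b"Apr\xe8s-ski"}))
except TypeError as exc:
    # Python 3 refuses to serialize bytes values.
    print("inline decode failed:", exc)

print(json.dumps(prepare_stored({"text": b"Apr\xe8s-ski"})))
# {"text": "Aprs-ski", "rtl": false}
```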

To manage notifications about this bug go to:
https://bugs.launchpad.net/gwibber/+bug/646232/+subscriptions