
mosquitto-users team mailing list archive

Payload support Unicode/UTF-8? (mosquitto Python client)

 

hi,

I am using mosquitto.py as the server-side client to build a messaging service, on Python 2.7.3. Sorry, I am quite new to Python, and this is the most difficult issue I have run into with it in the past few months. I hope I can get some help from the Python masters here. :)

When I try to pass a UTF-8 text message in the payload, it works perfectly with English and other ASCII text, but if I add Chinese to the payload text, I get a lot of errors like this:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: unexpected end of data

1. I already saved my Python source file as UTF-8.

2. I already set Python's default encoding to 'utf8' by adding the following code to my source:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

I added the following test code to my client code, and it works perfectly:

    # testing decoding
    c = '中国人'    # some Chinese text here
    print "Chinese = ", c, "repr = ", repr(c), "type = ", type(c), len(c)
    d = c.decode('utf8')
    print "Decoded = ", d, "repr = ", repr(d), "type = ", type(d), len(d)

FYI, the print output is:

        Chinese =  中国人 repr =  '\xe4\xb8\xad\xe5\x9b\xbd\xe4\xba\xba' type =  <type 'str'> 9
        Decoded =  中国人 repr =  u'\u4e2d\u56fd\u4eba' type =  <type 'unicode'> 3

which means the decoding works fine here.
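To illustrate the two lengths printed above: the UTF-8 byte string has 9 bytes, while the decoded text has 3 characters, because each of these Chinese characters takes 3 bytes in UTF-8. The following is a minimal sketch of that round trip, written so that it should behave the same on Python 2.7 and Python 3; the variable names are illustrative only.

```python
# Unicode escapes are used so the sketch does not depend on the
# encoding of the source file.
text = u'\u4e2d\u56fd\u4eba'       # the three characters of the test string

encoded = text.encode('utf-8')      # bytes as they would travel in a payload
decoded = encoded.decode('utf-8')   # back to a unicode string

print(len(text))     # number of characters
print(len(encoded))  # number of bytes
```

Any code that measures a payload has to use the byte length, not the character length.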

I added the following code to decode the payload:

    print "Payload = ", msg.payload, "repr = ", repr(msg.payload), "type = ", type(msg.payload), len(msg.payload)
    text = msg.payload.decode('utf8')
    print "Text = ", text, "repr = ", repr(text), "type = ", type(text), len(text)

When the payload is pure English or numbers, everything works, and the print output looks like this:

Payload =  hi repr =  'hi' type =  <type 'str'> 2
Text =  hi repr =  u'hi' type =  <type 'unicode'> 2


If I use '中国人' as the payload text, the output looks like this:

Payload =  中 repr =  '\xe4\xb8\xad' type =  <type 'str'> 3
Text =  中 repr =  u'\u4e2d' type =  <type 'unicode'> 1

Only one Chinese character, 中, shows up; the remaining two characters are cut off. Why is that?
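One possible explanation, which is only a guess and is not confirmed anywhere in this post, is that somewhere on the publishing side the payload length is counted in characters rather than in UTF-8 bytes. The sketch below simulates that hypothetical bug with plain slicing (it does not use mosquitto.py at all) and reproduces the 3-byte payload shown above.

```python
text = u'\u4e2d\u56fd\u4eba'   # the three-character test string
raw = text.encode('utf-8')      # 9 bytes of UTF-8

# Hypothetical bug: the length is taken from the unicode string
# (3 characters) but applied to the encoded bytes.
truncated = raw[:len(text)]

print(repr(truncated))
print(repr(truncated.decode('utf-8')))
```

If this is what happens, the fix would be on the sending side: encode the text first and take the length of the resulting byte string.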

But if I try two different characters, '你好', in the payload, it does not go through at all, and I get the error below. The payload '你好' is printed as a question mark. So the output differs depending on which Chinese characters I choose.

Payload =  ? repr =  '\xe4\xb8' type =  <type 'str'> 2
Traceback (most recent call last):
  File "messenger.py", line 181, in <module>
    main_loop()
  File "messenger.py", line 173, in main_loop
    while mqttc.loop() == 0:
  File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 670, in loop
    rc = self.loop_read(max_packets)
  File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 840, in loop_read
    rc = self._packet_read()
  File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 1151, in _packet_read
    rc = self._packet_handle()
  File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 1531, in _packet_handle
    return self._handle_pubrel()
  File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 1682, in _handle_pubrel
    self.on_message(self, self._userdata, self._messages[i])
  File "messenger.py", line 129, in on_message
    text = msg.payload.decode('utf8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: unexpected end of data
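The exception itself can be reproduced without a broker by decoding a byte string that ends in the middle of a multi-byte character, such as the 2-byte payload shown in the output above. The sketch below also shows one defensive option, decoding with the 'replace' error handler so that on_message does not crash. safe_decode is a hypothetical helper name, and this only hides the symptom; it does not fix the truncation.

```python
def safe_decode(payload):
    """Decode UTF-8 bytes, substituting U+FFFD for undecodable input."""
    try:
        return payload.decode('utf-8')
    except UnicodeDecodeError:
        # 'replace' is a standard codec error handler.
        return payload.decode('utf-8', 'replace')


partial = b'\xe4\xb8'   # first two bytes of a three-byte character

try:
    partial.decode('utf-8')
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason   # 'unexpected end of data'

print(reason)
print(repr(safe_decode(partial)))
```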

I have already spent two days trying to fix this and digging through all kinds of solutions. I really hope I can get some help with this. Many thanks!

-Horace


