yahoo-eng-team team mailing list archive

[Bug 1152188] Re: unicode is used wrongly throughout nova

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Matthew Sherborne <1152188@xxxxxxxxxxxxxxxxxx>
Date: Thu, 14 Mar 2013 18:39:43 -0000
Reply-to: Bug 1152188 <1152188@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
Thanks Johannes. That was my original proposed solution.

I've done some experimenting and discovered this is a non-bug. I'll
close this bug and report a new one.

The reason it's not a bug is that nova/__init__.py, where gettext is
installed, passes 1 as the unicode flag, which means all translated
strings come back already as unicode objects.

And of course all output that's in unicode will automatically be printed
in the encoding given by locale.getpreferredencoding().

So the new proposed solution is to make sure that we use
unicode(exc) everywhere, and never use str(exc).
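
A minimal sketch of why that rule matters, written in Python 3 terms.
Here str(exc) plays the role of Python 2's unicode(exc), and an explicit
.encode("ascii") approximates what Python 2's str(exc) attempts on a
unicode message. TranslatedError and both helper functions are
hypothetical names used only for illustration.

```python
class TranslatedError(Exception):
    """Hypothetical exception whose message is already-translated text."""


def describe(exc):
    # Python 3 str() keeps the message as text, like unicode(exc) in Python 2.
    return str(exc)


def describe_as_ascii_bytes(exc):
    # Approximates Python 2's str(exc): an implicit encode to ASCII.
    return str(exc).encode("ascii")


exc = TranslatedError("偽のISCSI: vol1")

safe = describe(exc)

try:
    describe_as_ascii_bytes(exc)
    failed = False
except UnicodeEncodeError:
    # Non-ASCII characters cannot be represented in the ASCII codec.
    failed = True
```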

** Changed in: nova
       Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1152188

Title:
  unicode is used wrongly throughout nova

Status in OpenStack Compute (Nova):
  Invalid

Bug description:
  We use 'unicode' to turn a lot of exceptions into unicode strings
  throughout nova.

  For example:
  https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/servers.py#L882

  As the exception's message may be generated by the gettext translation
  system, it could hold encoded non-ASCII data (UTF-8 bytes, for example).

  The way we try to turn that into a unicode string assumes that the
  input is 7-bit ASCII.

  Each language's translation may use a different output encoding.

  If the translated string is not 7-bit ASCII, our code will raise:

  "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position
  0: ordinal not in range(128)"

  When the gettext system is installed, the '_' function is
  'gettext.gettext' for whatever locale is installed.

  ----

  This is the suggested fix:

  We should remove all uses of 'unicode' and replace it with 'str'.

  That way it doesn't matter what encoding the .po file was in, it'll
  just leave it in that encoding and output it in that encoding.

  So if, for example, the Chinese translator decides to use the GB18030
  encoding because it's more popular in their region/culture than UTF-8,
  then by using only 'str' the text will be output in the language and
  encoding that the translator intended, so it'll work on all machines
  that are set up with GB18030 support.

  ----

  Demonstration:

  If you run this from the nova source dir:

  mkdir -p tmp/ja/LC_MESSAGES
  msgfmt nova/locale/ja/LC_MESSAGES/nova.po -o tmp/ja/LC_MESSAGES/nova.mo
  python
  >>> import gettext
  >>> ja = gettext.translation('nova', 'tmp', ['ja'])
  >>> s = 'FAKE ISCSI: %s'
  >>> t = ja.gettext(s)  # This simulates calling _('FAKE ISCSI: %s'); in nova code
  >>> print t
  偽のISCSI: %s
  >>> t
  '\xe5\x81\xbd\xe3\x81\xaeISCSI: %s'
  >>> unicode(t)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
  >>> t.decode('utf-8')
  u'\u507d\u306eISCSI: %s'
  >>> unicode(t, 'utf-8')
  u'\u507d\u306eISCSI: %s'
  >>> print t.decode('utf-8')
  偽のISCSI: %s
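
The failing unicode(t) call in the demonstration amounts to an implicit
ASCII decode of UTF-8 bytes. A small Python 3 sketch of the same failure,
with the codec spelled out, follows. The helper name try_decode is
hypothetical.

```python
# The UTF-8 bytes that ja.gettext() returned in the demonstration.
raw = "偽のISCSI: %s".encode("utf-8")


def try_decode(data, codec):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(codec), None
    except UnicodeDecodeError as err:
        return None, err


# What Python 2's unicode(t) does implicitly: decode as ASCII.
ascii_text, ascii_error = try_decode(raw, "ascii")

# What t.decode('utf-8') does: decode with the catalog's real encoding.
utf8_text, utf8_error = try_decode(raw, "utf-8")
```

The ASCII attempt fails on the first byte (0xe5), while the UTF-8 attempt
recovers the original text.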

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1152188/+subscriptions