yahoo-eng-team team mailing list archive
Message #01303
[Bug 1152188] Re: unicode is used wrongly throughout nova
Thanks Johannes. That was my original proposed solution.
I've done some experimenting and discovered this is a non-bug. I'll
close this bug and report a new one.
The reason it's not a bug is that in nova/__init__.py, where gettext is
installed, 1 is passed as the unicode flag, which means every translated
string comes back as a unicode object, whatever encoding the .po file
used.
And of course all output that's in unicode will automatically be printed
in locale.getpreferredencoding().
So the new proposed solution is to make sure that we use
unicode(exc) everywhere, and never use str(exc).
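As a rough sketch of that rule, the conversion could be wrapped in one place so it always yields a text string. Note that exc_to_text is a hypothetical helper name, not something that exists in nova, and the text_type alias is an assumption added so the same snippet runs on both Python 2 and Python 3:

```python
# text_type is unicode on Python 2 and str on Python 3 (assumed alias).
text_type = type(u'')


def exc_to_text(exc):
    """Hypothetical helper: return the exception message as text, never bytes."""
    return text_type(exc)


# A made-up exception carrying a translated (non-ASCII) message.
err = ValueError(u'\u507d\u306eISCSI: volume missing')
message = exc_to_text(err)
```

On Python 2, calling str(err) on the same exception would try to encode the message as ASCII and fail, which is the case the rule is meant to avoid.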
** Changed in: nova
Status: Confirmed => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1152188
Title:
unicode is used wrongly throughout nova
Status in OpenStack Compute (Nova):
Invalid
Bug description:
We use 'unicode' to turn a lot of exceptions into unicode strings
throughout nova.
For example:
https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/servers.py#L882
As the exception's message may be generated by the gettext translation
system, it could hold encoded, non-ASCII byte data.
The way we try to turn that into a unicode string assumes that the
input is 7-bit ASCII.
Each language's translation may use a different output encoding.
If the translated string is not 7-bit ASCII, our code will raise a:
"UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position
0: ordinal not in range(128)"
When the gettext system is installed, the '_' function is
'gettext.gettext' for whatever locale is installed.
----
This is the suggested fix:
We should remove all uses of 'unicode' and replace them with 'str'.
That way it doesn't matter what encoding the .po file was in, it'll
just leave it in that encoding and output it in that encoding.
So if, for example, the Chinese translator decides to use the GB18030
encoding because it's more popular in their region/culture than UTF-8,
then by using only 'str' the text will be output in the language and
encoding that the translator intended, so it'll work on all machines
that are set up with GB18030 support.
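To illustrate the encoding point, here is a small sketch. The sample text is made up rather than taken from any nova .po file, and it is written with explicit byte strings so that it runs on both Python 2 and Python 3:

```python
# Hypothetical sample text: the two characters for "Chinese language".
text = u'\u4e2d\u6587'

# The same text produces different bytes in each encoding.
gb_bytes = text.encode('gb18030')
utf8_bytes = text.encode('utf-8')

# A consumer that expects UTF-8 cannot read the GB18030 bytes.
try:
    gb_bytes.decode('utf-8')
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False

# A consumer set up for GB18030 gets the original text back.
round_trip = gb_bytes.decode('gb18030')
```

The bytes only turn back into the intended text on a machine that decodes them as GB18030, which is the dependency the paragraph above describes.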
----
Demonstration:
If you run this from the nova source dir:
mkdir -p tmp/ja/LC_MESSAGES
msgfmt nova/locale/ja/LC_MESSAGES/nova.po -o tmp/ja/LC_MESSAGES/nova.mo
python
>>> import gettext
>>> ja = gettext.translation('nova', 'tmp', ['ja'])
>>> s = 'FAKE ISCSI: %s'
>>> t = ja.gettext(s) # This simulates calling _('FAKE ISCSI: %s') in nova code
>>> print t
偽のISCSI: %s
>>> t
'\xe5\x81\xbd\xe3\x81\xaeISCSI: %s'
>>> unicode(t)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
>>> t.decode('utf-8')
u'\u507d\u306eISCSI: %s'
>>> unicode(t, 'utf-8')
u'\u507d\u306eISCSI: %s'
>>> print t.decode('utf-8')
偽のISCSI: %s
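The same failure can be reproduced without compiling the .mo file. This is only a sketch: the byte string is copied from the session above, and the implicit ASCII decode that Python 2's unicode(t) performs is written out explicitly as t.decode('ascii') so the snippet also runs on Python 3:

```python
# Bytes as returned by ja.gettext() in the session above (UTF-8 encoded).
t = b'\xe5\x81\xbd\xe3\x81\xaeISCSI: %s'

# Python 2's unicode(t) decodes with the ASCII codec by default;
# here that step is spelled out explicitly.
try:
    t.decode('ascii')
    failed = False
    reason = ''
except UnicodeDecodeError as exc:
    failed = True
    reason = str(exc)

# Naming the real encoding succeeds, as with unicode(t, 'utf-8') above.
decoded = t.decode('utf-8')
```

Here failed ends up True, and decoded holds the same text as u'\u507d\u306eISCSI: %s'.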