openstack team mailing list archive
Re: Nova string encoding
On 2/13/12 7:00 PM, Joshua Harlow wrote:
> Isn't the command line interface just a setting on the "terminal" app
> you are using?
I'm sorry if I wasn't clear before. What's happening is this: I am using a
UTF-8 shell (which is, I believe, normal). nova-manage receives an
argument and stores it as an 8-bit 'string'. That is already wrong,
because we've now lost track of which encoding that 8-bit string uses. Some
parts of the code probably interpret it as UTF-8, but the code in the bug
I'm encountering interprets it as ASCII. The 'string' type in
Python 2 is known to be ambiguous in this way. Because UTF-8 and ASCII
overlap for certain values, this ambiguity is seldom encountered by
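
A small sketch of the ambiguity described above. It is written for Python 3, where `bytes` stands in for Python 2's 8-bit `str`; the sample value is made up for illustration and is not taken from the actual bug.

```python
# Bytes as a UTF-8 shell would pass them for the argument "café".
# (Hypothetical value, for illustration only.)
raw = b"caf\xc3\xa9"

# Pure-ASCII input decodes identically under both codecs,
# which is why the problem usually stays hidden.
same_for_ascii = b"cafe".decode("ascii") == b"cafe".decode("utf-8")

# Non-ASCII input decodes fine as UTF-8...
as_utf8 = raw.decode("utf-8")

# ...but the very same bytes are rejected when treated as ASCII.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

The bytes alone don't say which codec is right; only the code that received them can know, and only if it records that at the point of entry.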
One solution to this is to just declare "All strings in Nova will
henceforth be treated as UTF8." That may be the current intent, but it
is not actually the case. It's also not a great policy because it would
have to be enforced 'by hand' due to Python 2's ongoing ignorance about
A more correct design which allows for future flexibility would look
like this:
1) Adopt a standard for what encoding is used for all
implicitly-encoded IO. (I would propose UTF-8 as that standard
rather than ASCII.)
2) At all points where strings enter Python (e.g. command-line args)
immediately decode them into unicode (which can unambiguously represent
text from any 8-bit encoding).
3) At all points where 'unicodes' exit Python (being written to stdout
or a log file or a database) explicitly encode them as appropriate
(generally UTF-8, again, especially if we're ever going to read them back
That approach is the one I'm most familiar with, and the one advocated
for here: http://farmdev.com/talks/unicode/.
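
For concreteness, here is a rough sketch of steps 1-3. It is written for Python 3 (where `bytes` plays the part of Python 2's 8-bit string), and `DEFAULT_ENCODING`, `decode_input` and `encode_output` are hypothetical names of my own, not anything that exists in Nova.

```python
import io

# Step 1: one assumed project-wide standard for implicitly-encoded IO.
DEFAULT_ENCODING = "utf-8"


def decode_input(raw, encoding=DEFAULT_ENCODING):
    """Step 2: turn bytes into text as soon as they enter the program."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text, nothing to do


def encode_output(text, encoding=DEFAULT_ENCODING):
    """Step 3: turn text back into bytes only when it leaves the program."""
    return text.encode(encoding)


# Simulated command-line argument arriving as raw bytes.
arg = decode_input(b"caf\xc3\xa9")

# Internally everything is text, so len() counts characters, not bytes.
name_length = len(arg)

# Simulated log file opened in binary mode; encoding is explicit on write.
log = io.BytesIO()
log.write(encode_output("created server " + arg + "\n"))
```

The point of the helpers is only that decoding and encoding happen at the edges, in one place each, so nothing in between has to guess.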
So...back to my original question about what the policy is: Can I
assume that the answer is "There is no policy regarding string encoding
but we've been lucky so far"?
> At least on a mac there is a terminal->preferences->advanced which
> specifies which encoding to use (mine is UTF-8).
> Was that tried/verified?
On 2/13/12 3:52 PM, "Andrew Bogott" <abogott@xxxxxxxxxxxxx> wrote:
On 2/13/12 5:04 PM, Naveed Massjouni wrote:
> Very recently, a change got in that converts all tables (except 1) to
> utf8 encoding, for the mysql engine. I manually tested creating
> servers with unicode names and with unicode metadata, and it worked
> fine. Make sure you are running against the latest code. -Naveed
That's a step in the right direction, but doesn't completely address
what I'm asking, unless by 'all tables' you meant 'all tables and also
all internal variables and also all REST and command-line interfaces.'
Fixing my particular issue is straightforward, but the fact that I'm
seeing the bug in the first place suggests that there's no standard
encoding currently enforced. Which seems bad.