
openstack team mailing list archive

Re: Nova string encoding


On 2/13/12 7:00 PM, Joshua Harlow wrote:
> Isn't the command line interface just a setting on the "terminal" app you are using?
I'm sorry if I wasn't clear before. What's happening is this: I am using a utf8 shell (which is, I believe, normal). Nova-manage receives an argument and stores it as an 8-bit 'string' (a Python 2 str). That is already wrong, because we've now lost track of which encoding those 8 bits are in. Some parts of the code probably interpret the value as UTF8, but the code in the bug I'm encountering interprets it as ASCII. The 'string' type in Python 2 is known to be ambiguous in this way. Because UTF8 and ASCII agree on all 7-bit values (0-127) and differ only above that range, this ambiguity is seldom encountered by Americans.
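To make the ambiguity concrete, here is a minimal sketch. It is not taken from Nova; the argument value and the helper name try_decode are made up for illustration. The same bytes decode cleanly as UTF8 and fail as ASCII, while a pure-ASCII value never shows the problem.

```python
# Hypothetical command-line argument: the UTF8 bytes for "cafe" with an
# accented final "e" (U+00E9). In Python 2 this is what a plain str holds.
raw = b"caf\xc3\xa9"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = try_decode(raw, "utf-8")       # succeeds: four characters
as_ascii = try_decode(raw, "ascii")      # fails: byte 0xc3 is outside 0-127
plain = try_decode(b"cafe", "ascii")     # pure ASCII input hides the problem
```

Code that decodes with the ASCII codec only breaks once someone passes a byte above 127, which is why the bug can go unnoticed for a long time.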

One solution to this is to just declare "All strings in Nova will henceforth be treated as UTF8." That may be the current intent, but it is not actually the case. It's also not a great policy because it would have to be enforced 'by hand' due to Python 2's ongoing ignorance about encodings.

A more correct design which allows for future flexibility would look like this:

1) Adopt a standard for what encoding is used for all implicitly-encoded IO. (I would propose that that standard be UTF8 rather than ASCII.)

2) At all points where strings enter Python (e.g. commandline args) immediately decode them into unicode (which can unambiguously represent the characters of any 8-bit encoding).

3) At all points where 'unicodes' exit Python (being written to stdout or a log file or a database) explicitly encode them as appropriate (generally UTF8, again, especially if we're ever going to read them back in.)
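The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: ENCODING, to_text, and to_bytes are hypothetical names, not existing Nova helpers, and an in-memory buffer stands in for stdout, a log file, or a database. The code runs on Python 2.7 and Python 3, where bytes plays the role of the Python 2 'string'.

```python
import io

# Step 1: one assumed project-wide standard for implicitly-encoded IO.
ENCODING = "utf-8"


def to_text(value, encoding=ENCODING):
    """Step 2: decode bytes as they enter the program; text passes through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(text, encoding=ENCODING):
    """Step 3: encode text explicitly as it leaves the program."""
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


# Hypothetical argument as it would arrive from a utf8 shell under Python 2.
arg = b"caf\xc3\xa9"

name = to_text(arg)                  # unicode from here on
message = u"instance: " + name       # internal work uses unicode only

sink = io.BytesIO()                  # stand-in for a log file or stdout
sink.write(to_bytes(message))        # encoded once, at the boundary
```

With this shape, the 8-bit form exists only at the edges of the program, so no internal code ever has to guess which encoding a value is in.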

That approach is the one I'm most familiar with, and the one advocated for here: http://farmdev.com/talks/unicode/.

So...back to my original question about what the policy is: Can I assume that the answer is "There is no policy regarding string encoding but we've been lucky so far"?


> At least on a mac there is a terminal->preferences->advanced which specifies which encoding to use (mine is UTF-8).
>
> Was that tried/verified?

On 2/13/12 3:52 PM, "Andrew Bogott" <abogott@xxxxxxxxxxxxx> wrote:

    On 2/13/12 5:04 PM, Naveed Massjouni wrote:
    > Very recently, a change got in that converts all tables (except 1) to
    > utf8 encoding, for the mysql engine. I manually tested creating
    > servers with unicode names and with unicode metadata, and it worked
    > fine. Make sure you are running against the latest code. -Naveed

    That's a step in the right direction, but doesn't completely address
    what I'm asking, unless by 'all tables' you meant 'all tables and also
    all internal variables and also all REST and Commandline interfaces.'
    Fixing my particular issue is straightforward, but the fact that I'm
    seeing the bug in the first place suggests that there's no standard
    encoding currently enforced.  Which seems bad.

