
launchpad-dev team mailing list archive

Re: Unicode and Launchpad

 

On Oct 27, 2011, at 10:32 AM, Julian Edwards wrote:

> * Keep all strings as unicode internally (with the exception of plain ASCII
>   strings which are easily coerced to unicode automatically)
> * Convert to/from unicode only when necessary (e.g. utf8 byte string or
>   MIME) at the point the string *exits or enters* Launchpad.
> * Never use str()
> * Whenever someone is dealing with strings in a branch, please review
>   accordingly.

Two things that I've adopted in my own code have helped immensely, and will
really help you when Launchpad is ported to Python 3 <wink>.  Both work in
Python 2.6 and 2.7.

* Put this at the top of every module:

    from __future__ import absolute_import, unicode_literals

Okay, the absolute_import future isn't relevant to this discussion, but it'll
still prepare you well for the future.  The unicode_literals future is crucial
though, because it makes every unprefixed string literal a unicode by default.
No more u'' prefixes required and no more accidental 8-bit human-readable
text.  Since literals are almost always text rather than byte strings, this is
a win (once you've dealt with the fallout :).
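
To make that concrete, here is a minimal sketch of what the future import
changes.  The names greeting, text_type and is_text are made up for
illustration and are not anything in Launchpad.

```python
from __future__ import absolute_import, unicode_literals

# With unicode_literals in effect, this unprefixed literal is text, so the
# \u escape is interpreted and the string is four characters long.
greeting = 'caf\u00e9'

# type('') is unicode on Python 2 (given the future import) and str on
# Python 3, so this works as a portable "text type" for isinstance checks.
text_type = type('')


def is_text(value):
    """Return True if value is text rather than a byte string."""
    return isinstance(value, text_type)
```

Without the future import, Python 2 would treat that literal as an 8-bit
string and leave the \u escape uninterpreted, which is exactly the kind of
accident the import prevents.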

* Use b'' for byte strings.

For those occasions where you do still need to have literal byte strings,
e.g. for speaking a wire protocol or what not, use the b'' prefix.
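
As a rough sketch of how the two rules fit together at a boundary, assume a
made-up line-based protocol that frames UTF-8 text with CRLF.  The helpers
encode_line and decode_line are hypothetical, not Launchpad code.

```python
from __future__ import absolute_import, unicode_literals

# Protocol framing is bytes, so it gets an explicit b'' prefix.
CRLF = b'\r\n'


def encode_line(text):
    """Encode text to a UTF-8 wire line at the point it leaves the program."""
    return text.encode('utf-8') + CRLF


def decode_line(raw):
    """Decode one wire line back to text at the point it enters the program."""
    if not raw.endswith(CRLF):
        raise ValueError('line is not terminated by CRLF')
    return raw[:-len(CRLF)].decode('utf-8')
```

Everything between those two calls stays unicode; only the framing constant
and the socket-facing values are ever bytes.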

These changes will make 2to3 work better, and will ensure a cleaner code base.

Cheers,
-Barry

