launchpad-dev team mailing list archive
Message #08222
Re: Unicode and Launchpad
On Oct 27, 2011, at 10:32 AM, Julian Edwards wrote:
> * Keep all strings as unicode internally (with the exception of plain ASCII
>   strings which are easily coerced to unicode automatically)
> * Convert to/from unicode only when necessary (e.g. utf8 byte string or
>   MIME) at the point the string *exits or enters* Launchpad.
> * Never use str()
> * Whenever someone is dealing with strings in a branch, please review
>   accordingly.
Two things that I've adopted in my own code have helped immensely, and will
really help you when Launchpad is ported to Python 3 <wink>. Both work in
Python 2.6 and 2.7.
* Put this at the top of every module:

    from __future__ import absolute_import, unicode_literals

Okay, the absolute_import future isn't relevant to this discussion, but it'll
still prepare you well for the future. The unicode_literals future is crucial
though, because it makes every unprefixed string literal a unicode. No more u''
prefixes required, and no more accidental 8-bit strings holding human-readable
text. Since literals are almost always text rather than byte strings, this is a
win (once you've dealt with the fallout :).
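
To make that concrete, here is a minimal sketch of what such a module might
look like. The names text_type and greeting are made up for illustration and
are not anything from Launchpad; the version check is only there so the same
file runs unchanged on Python 2.6/2.7 and Python 3.

```python
from __future__ import absolute_import, unicode_literals

import sys

# Hypothetical helper name: the text type is called `unicode` on Python 2
# and `str` on Python 3.  The conditional keeps `unicode` from being looked
# up on Python 3, where it does not exist.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

# No u'' prefix, yet this literal is text (not an 8-bit string) on both
# major versions, because of the unicode_literals future above.
greeting = 'caf\xe9'

assert isinstance(greeting, text_type)
```

Without the future import, the same literal on Python 2 would be an 8-bit str,
which is exactly the accidental case described above.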
* b'' for byte strings.
For those occasions where you do still need to have literal byte strings,
e.g. for speaking a wire protocol or what not, use the b'' prefix.
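
As a rough sketch of how the two habits fit together at a boundary: the
framing constant below is a byte literal, and text is only converted to or
from bytes where it leaves or enters the program. The CRLF-terminated line
format, the choice of UTF-8, and the function names encode_line and
decode_line are all assumptions made for the example, not an existing API.

```python
from __future__ import absolute_import, unicode_literals

# Wire-level framing is bytes, so it gets an explicit b'' prefix.
CRLF = b'\r\n'


def encode_line(text):
    """Convert text to bytes at the point it exits the program."""
    return text.encode('utf-8') + CRLF


def decode_line(data):
    """Convert bytes to text at the point they enter the program."""
    if not data.endswith(CRLF):
        raise ValueError('line is not CRLF-terminated')
    return data[:-len(CRLF)].decode('utf-8')
```

For instance, decode_line(encode_line('caf\xe9')) should hand back the
original text, with everything in between being plain bytes.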
These changes will make 2to3 work better, and will ensure a cleaner code base.
Cheers,
-Barry