
maas-devel team mailing list archive

open() text and binary modes in Python

 

You might have noticed that I always use the "b" flag when doing file
IO. Contrary to popular opinion, this is not because I have a secret
kinky passion for Microsoft, Windows, and Steve Ballmer's burnished
pate. Instead, it's because I've long thought it best to treat all
file IO as binary and be explicit about it, and, more recently, for
Python 3 compatibility.

Python 2 does implicit encoding and decoding between unicode and byte
strings, using the default codec (ASCII). This is convenient, but it
means we can get away with being vague about encodings when reading
and writing.

Python 3, however, will not let us be vague:

  $ python3
  >>> open("foo", "w").write(b'bar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: must be str, not bytes
  >>> open("foo", "wb").write('bar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: 'str' does not support the buffer interface

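For contrast, here is a minimal Python 3 sketch of the pairings that do
work: str with text mode, and bytes with binary mode. The file name
"foo" comes from the session above. The temporary directory is only
there to keep the sketch self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo")

    # Text mode takes str; name the encoding instead of leaving it to the platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write("bar")

    # Binary mode takes bytes; encode the str explicitly first.
    with open(path, "wb") as f:
        f.write("bar".encode("utf-8"))

    with open(path, "rb") as f:
        on_disk = f.read()

# Python 3 never converts between str and bytes implicitly.
try:
    "bar" + b"baz"
except TypeError:
    mixing_raises = True
else:
    mixing_raises = False
```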
Python 3's open() takes a new encoding parameter - much like
codecs.open() in Python 2 - with the following behaviour:

   "In text mode, if encoding is not specified the encoding used is
    platform dependent."

In other words, don't count on it. Getting into the habit of using
binary mode now, or codecs.open(), ought to help us avoid this pitfall
when porting to Python 3.
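A rough Python 3 illustration of why the default matters. The sketch
assumes that locale.getpreferredencoding(False) reports the locale-based
default that text mode falls back to. This is how the Python 3 docs
describe it for most releases, but the exact rule varies by version and
by UTF-8 mode. The sketch then shows that, once the encoding is named
explicitly, the bytes on disk no longer depend on the platform.

```python
import locale
import os
import tempfile

# The fallback used when open() gets no encoding; varies with platform/locale.
print("platform default:", locale.getpreferredencoding(False))

text = "caf\u00e9"  # "café": contains one non-ASCII character

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo")

    # Binary mode: encode explicitly on the way out...
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

    # ...and decode explicitly on the way in.
    with open(path, "rb") as f:
        raw = f.read()
    decoded = raw.decode("utf-8")

# The same text under another encoding gives different bytes, which is
# why letting the platform pick is risky.
latin1_bytes = text.encode("latin-1")
```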

Python 3's open() function also enables universal newlines in text
mode by default, for *writing* as well as reading: on input "\r\n" and
"\r" become "\n", and on output "\n" is translated to os.linesep.
Python 2 does not support universal newlines when writing. Using
binary mode ought to help us avoid this pitfall too.
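A small Python 3 sketch of the translation described above, using the
default newline=None. The file name and contents are made up for
illustration. Text mode rewrites line endings in both directions. Binary
mode leaves bytes exactly as given.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")

    # Text mode write: each "\n" is written as os.linesep ("\r\n" on Windows).
    with open(path, "w", encoding="ascii") as f:
        f.write("a\nb\n")
    with open(path, "rb") as f:
        written = f.read()

    # Binary mode write: mixed line endings reach the disk untouched.
    with open(path, "wb") as f:
        f.write(b"a\r\nb\rc\n")
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode read: universal newlines folds "\r\n" and "\r" into "\n".
    with open(path, "r", encoding="ascii") as f:
        decoded = f.read()

expected_written = "a\nb\n".replace("\n", os.linesep).encode("ascii")
```

On Linux, written and "a\nb\n" are the same bytes. On Windows they
differ, which is the kind of surprise that binary mode avoids.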

My suggestion, while we're still on Python 2, is to always use the "b"
flag when doing file IO, and to always explicitly encode and decode
when writing and reading, *or* to use codecs.open().
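One way the "binary plus explicit encode/decode" habit could be wrapped
up is sketched below. The helper names read_text_file and
write_text_file, and the UTF-8 default, are hypothetical and not
existing MAAS code. The two helpers also run on Python 2, provided the
text passed in is a unicode object. The demo around them uses
tempfile.TemporaryDirectory, which is Python 3 only.

```python
import os
import tempfile


def write_text_file(path, text, encoding="utf-8"):
    """Hypothetical helper: encode explicitly, then write bytes in binary mode."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


def read_text_file(path, encoding="utf-8"):
    """Hypothetical helper: read bytes in binary mode, then decode explicitly."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config")
    write_text_file(path, "name = caf\u00e9\r\n")
    # Binary mode means the "\r\n" survives the round trip unchanged.
    round_tripped = read_text_file(path)
```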

I haven't been following this advice fully in much of the code I've
written, but I intend to do so from here on.

I'm interested in feedback and/or better suggestions.