
dulwich-users team mailing list archive

Python 3 porting

 

Hi all

I'm having a go at porting dulwich to Python 3. My goal is to try to
achieve a common code base that works for both Python 2.7 and Python
3. I think that a common code base is important, as Python 3 ports
done as separate code bases tend to bitrot and go unmaintained.

First of all, there were a lot of easy changes to make (`except`
syntax, print statement to function, etc.). Most of these can be made
automatically using tools like python-modernize[1], and I've already
got most of these changes merged into master.

Next is the strings issue. I suspect this will involve the most
effort. I've started on this, and completed dulwich/objects.py and
dulwich/tests/test_objects.py [2].

This change:
* marks all string literals that should be bytes and not unicode in
Python 3 as bytes.
* removes all `%` string formatting for bytes. (This is not supported
in 3.0-3.4, but will be supported in 3.5 [3].)
* where necessary encodes strings to bytes using the ascii codec, e.g.
`str(hexsha).encode('ascii')`
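
To illustrate the last point, here is a minimal sketch of normalising a
hex SHA to bytes. The helper name `to_hexsha_bytes` is hypothetical and
not part of dulwich. Note that on Python 3, `str()` of a value that is
already bytes returns its repr (e.g. "b'ab'"), so the sketch checks the
type first instead of calling `str()` unconditionally.

```python
def to_hexsha_bytes(hexsha):
    """Return a hex SHA as bytes (hypothetical helper, not dulwich API)."""
    if isinstance(hexsha, bytes):
        # Already bytes. On Python 2 this branch also catches native str.
        return hexsha
    # Text: hex digits are plain ASCII, so the ascii codec is sufficient.
    return hexsha.encode('ascii')


# A literal that must stay bytes on Python 3 gets an explicit b prefix.
TREE_TYPE = b'tree'

# Made-up 40 character SHA, given once as text and once as bytes.
from_text = to_hexsha_bytes(u'ab' * 20)
from_bytes = to_hexsha_bytes(b'ab' * 20)
```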

When used together with my python3-six branch, this passes the full
test suite in Python 2.7, and `dulwich.tests.test_objects` also passes
in Python 3.4 \o/

The next set of changes I'm going to group together because they are
more difficult to solve with a common code base without extra
dependencies. These include:
* the changes to the dict functions (items, iteritems, values,
itervalues, keys, iterkeys)
* moves in the standard library.
* Some differences when dealing with bytes instead of strings, e.g.
getting the ordinal value of a chr from a bytes/str.
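
To make these three differences concrete, here is a rough stdlib-only
sketch of the kind of shims involved. The helper names are made up for
illustration and are not dulwich code. six ships equivalents of these
(`six.iteritems`, `six.indexbytes`, `six.moves`), which is what makes
it convenient.

```python
import sys

PY3 = sys.version_info[0] >= 3


def iteritems(d):
    """Iterate over (key, value) pairs without building a list on Python 2."""
    return iter(d.items()) if PY3 else d.iteritems()


def first_byte_value(data):
    """Return the integer value of the first byte of a bytes object."""
    first = data[0]
    # Python 3: indexing bytes yields an int.
    # Python 2: indexing a str yields a 1-character str, so ord() is needed.
    return first if isinstance(first, int) else ord(first)


# A module that moved in the standard library.
try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse  # Python 2 location
```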

These changes can be dealt with in two ways.

The first option is using the six package[4]. At this point in time, I
feel this is the best option. It is certainly the easiest for me to
work with while I get the bytes/string literals marked correctly. The
disadvantage is that it adds a dependency to dulwich.

The other option is to use 2to3, run at package time. There is good
support in distribute for doing this.[5] I've yet to look into this in
detail.

As mentioned, I'm using six while I work on the other issues. I am
however carefully separating any changes that depend on six into a
separate branch [6] so that if we choose not to use six, I still have
mergeable work.

The last changes are to the C extensions. I have no clue how to do
this in a single code base. I'll cross that bridge when I get there.
If you know how to do this and would like to help out, that would be
greatly appreciated.

I would like to get feedback on the approach I'm taking. In
particular, I would like to get a review of the string changes that I
have done so far [2]. Are we happy with replacing `%` string
formatting with bytes concatenation, e.g. changing
    "%s %d\0" % (type_name, length)
to
    type_name + b' ' + str(length).encode('ascii') + b'\0'
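
Wrapped in a function, the concatenated form might look like the sketch
below. The function name `object_header` is only for illustration and
is not taken from the commit; it assumes `type_name` is already bytes
and `length` is an int.

```python
def object_header(type_name, length):
    """Build a git object header of the form b'<type> <length>\\0'.

    Uses concatenation only, so it runs on Python 2.7 and on
    Python 3.0-3.4, where % formatting of bytes is not available.
    """
    # str(length) is text on Python 3, so encode it before joining.
    return type_name + b' ' + str(length).encode('ascii') + b'\0'


header = object_header(b'blob', 12)
```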


Regards,

Gary



1. https://pypi.python.org/pypi/modernize/
2. https://github.com/garyvdm/dulwich/commit/0914186
3. http://legacy.python.org/dev/peps/pep-0461/
4. https://pythonhosted.org/six/
5. http://python3porting.com/2to3.html#usingdistribute
6. https://github.com/garyvdm/dulwich/tree/python3-six

