
dulwich-users team mailing list archive

Re: Python 3 porting

 

Hi Gary,

On Thu, Apr 03, 2014 at 05:41:06PM +0200, Gary van der Merwe wrote:
> I'm having a go at porting dulwich to Python 3. My goal is to try to
> achieve a common code base that works for both Python 2.7 and Python
> 3. I think that a common code base is important, as Python 3 ports
> done as separate code bases tend to bitrot and not get maintained.
> 
> First of all, there were a lot of easy changes to make (except
> syntax, print statement to function etc.). Most of these can be made
> automatically using tools like python-modernize[1], and I've already
> got most of these changes merged in to master.
> 
> Next is the strings issue. I suspect this will involve the most
> effort. I've started on this, and completed dulwich/objects.py and
> dulwich/tests/test_objects.py [2]
> 
> This change:
> * marks all string literals that should be bytes and not unicode in
> python3 as bytes.
> * removes all `%` string formatting for bytes. (This is not supported
> for 3.0-3.4, but will be supported in 3.5 [3])
> * where necessary encodes strings to bytes using the ascii codec, e.g.
> `str(hexsha).encode('ascii')`
I think we should try to have these as bytes in the first place where
possible, rather than converting them. Converting might still be a
good intermediate step, though.
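
To illustrate the difference, here is a minimal sketch. The two function
names are made up for this example and are not dulwich API; the only
library calls used are `hashlib.sha1` and `binascii.hexlify` from the
standard library.

```python
import binascii
import hashlib


def hexsha_via_str(data):
    # Intermediate step: hexdigest() returns text on Python 3, so the
    # result has to be encoded to bytes afterwards.
    return hashlib.sha1(data).hexdigest().encode('ascii')


def hexsha_as_bytes(data):
    # Bytes in the first place: hexlify() returns a byte string on both
    # Python 2.7 and Python 3, so no conversion is needed.
    return binascii.hexlify(hashlib.sha1(data).digest())


# Both approaches give the same byte string for the same input.
assert hexsha_via_str(b'blob 0\0') == hexsha_as_bytes(b'blob 0\0')
```

The second form never passes through a unicode string, so there is no
codec to choose and nothing to get wrong.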

> When used together with my python3-six branch, this passes the full
> test suite in python 2.7, and `dulwich.tests.test_objects` also passes
> in python 3.4 \o/
\o/

> The next set of changes I'm going to group together because they are
> more difficult to solve with a common code base without extra
> dependencies. These include:
> * the changes to the dict functions (items, iteritems, values,
> itervalues, keys, iterkeys)
> * moves in the standard library.
> * Some differences when dealing with bytes instead of strings, e.g.
> getting the ordinal value of a chr from a bytes/str.
> 
> These changes can be dealt with in two ways.
> 
> The first option is using the six package[4]. At this point in time, I
> feel this is the best option. It is certainly the easiest for me to
> work with while I get the bytes/string literals marked correctly. The
> disadvantage is that it adds a dependency to dulwich.
> 
> The other option is to use 2to3, run at package time. There is good
> support in distribute for doing this.[5] I've yet to look into this in
> detail.
> 
> As mentioned, I'm using six while I work on the other issues. I am
> however carefully separating any changes that depend on six into a
> separate branch [6] so that if we choose not to use six, I still have
> mergeable work.

I'm not a fan of six, and don't want to use it in the main dulwich
branch. The whole point of Python 3 is to make some
backwards-incompatible changes to clean up the language and code
written in it. Using six accomplishes the opposite of that, and adds
a mandatory dependency - where dulwich currently just depends on python
itself.

2to3 doesn't sound too bad, though I don't have any experience with it
myself.
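
For reference, the three groups of differences listed above can often be
handled with a few lines of local code and no extra dependency. This is
only a rough sketch: `PY3`, `iteritems`, `byte_values` and `first_byte`
are hypothetical helpers, not existing dulwich code, and `urlparse` is
just one example of a moved standard library module.

```python
import sys

PY3 = sys.version_info[0] >= 3

# Moves in the standard library: try the Python 3 location first.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def iteritems(d):
    # Dict functions: Python 3 dropped iteritems(); items() returns a view.
    return iter(d.items()) if PY3 else d.iteritems()


def byte_values(data):
    # Iterating a bytearray yields ints on both Python 2.7 and Python 3.
    return list(bytearray(data))


def first_byte(data):
    # Slicing keeps a length-1 byte string on both versions, so ord() works.
    return ord(data[:1])


assert list(iteritems({'a': 1})) == [('a', 1)]
assert byte_values(b'ab') == [97, 98]
assert first_byte(b'a') == 97
```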

> The last changes are to the c extensions. I have no clue on how to do
> this in a single code base. I'll cross that bridge when I get there.
> If you know how to do this, and would like to help out, this would be
> greatly appreciated.
I suspect this will involve a lot of #ifdefs :(

> I would like to get feedback on the approach I'm taking. In
> particular, I would like to get a review of the string changes that I
> have done so far [2]. Are we happy with replacing the string formatting
> with bytes concatenation, e.g. changing
>     "%s %d\0" % (type_name, length)
> to
>     type_name + b' ' + str(length).encode('ascii') + b'\0'
That's not great, but as long as this is limited to a couple of places
it should be fine.
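
One way to keep it limited is to put the concatenation behind a small
function so call sites stay readable. A sketch only; `object_header` is
a name invented for this example:

```python
def object_header(type_name, length):
    # type_name is expected to be a byte string such as b'blob'.
    # The length is formatted as text and encoded, since bytes do not
    # support % formatting on Python 3.0-3.4.
    return type_name + b' ' + str(length).encode('ascii') + b'\0'


assert object_header(b'blob', 12) == b'blob 12\0'
```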

Cheers,

Jelmer
