
dulwich-users team mailing list archive

Re: Python 3 porting

 

Hi Jelmer.

Thank you for your reply.

> Gary van der Merwe wrote:
>> * where necessary encodes strings to bytes using the ascii codec, e.g.
>> `str(hexsha).encode('ascii')`

Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> I think we should try to have these as bytes in the first place where
> possible, rather than converting them. Converting might still be a
> good intermediate step, though.

Sorry, I did not explain this clearly.

In a few places, I'm encoding strings returned from functions, e.g.:

  # sha.hexdigest() returns a unicode string in python 3
  self.sha().hexdigest().encode('ascii')

  str(length).encode('ascii')
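
To make the two cases above concrete, here is a minimal sketch. The helper
names hex_sha_bytes and length_bytes are hypothetical and only for
illustration; they are not part of dulwich. The sketch assumes the hash object
comes from hashlib, whose hexdigest() returns a str on Python 3.

```python
import hashlib


def hex_sha_bytes(data):
    """Return the SHA-1 hex digest of `data` as bytes (hypothetical helper)."""
    # hexdigest() returns a str on Python 3. Hex digits are always ASCII,
    # so encoding with the ascii codec cannot fail here.
    return hashlib.sha1(data).hexdigest().encode('ascii')


def length_bytes(length):
    """Return the decimal representation of an int as bytes (hypothetical helper)."""
    # str() of an int contains only ASCII digits (and possibly a minus sign).
    return str(length).encode('ascii')


if __name__ == '__main__':
    print(hex_sha_bytes(b'hello'))
    print(length_bytes(1234))
```

On Python 2 the same code still works, since str.encode('ascii') on an ASCII
byte string gives back an equal byte string.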


Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> I'm not a fan of six, and don't want to use it in the main dulwich
> branch. The whole point of Python 3 is to make some
> backwards-incompatible changes to clean up the language and code
> written in it. Using six accomplishes the opposite of that, and adds
> a mandatory dependency - where dulwich currently just depends on python
> itself.

How do you feel about including six in dulwich as dulwich.contrib.six? It would
add one Python file and the LICENSE file.

Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> 2to3 doesn't sound too bad, though I don't have any experience with it
> myself.

I've looked at it, and it does seem feasible. For this branch, I've got the test
suite running:

https://github.com/garyvdm/dulwich/tree/python3-2to3

$ python3 setup.py test
....
Ran 862 tests in 4.278s

FAILED (failures=12, errors=553, skipped=16)

This is not too far off my python-six branch:

$ python3 setup.py test
....
Ran 862 tests in 2.048s

FAILED (failures=10, errors=453, skipped=22)


> Gary van der Merwe wrote:
>> I would like to get feedback on this approach I'm taking. In
>> particular, I would like to get a review of the string changes that I
>> have done so far [2]. Are we happy with removing the string formating
>> for bytes appending, e.g. changing
>>     "%s %d\0" % (type_name, length)
>> to
>>     type_name + b' ' + str(length).encode('ascii') + b'\0'


Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> That's not great, but as long as this is limited to a couple of places
> it should be fine.

Unfortunately this is in a LOT of places. In objects.py, I even ended up
writing a small helper function for the serialize methods:

https://github.com/garyvdm/dulwich/blob/python3/dulwich/objects.py#L175

Usage examples:

https://github.com/garyvdm/dulwich/blob/python3/dulwich/objects.py#L667

The other options are:

 * Use string formatting, and encode the result (which also means we have to
   decode whenever we need to include bytes).

 * Use PEP 461. This unfortunately means waiting for, and supporting only,
   Python 3.5 and later. It does not seem possible to monkeypatch this in.
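
For comparison, here is a rough sketch of the three approaches side by side,
building the same "<type> <length>\0" header. The function names are
hypothetical and are not taken from dulwich; the third variant assumes
Python 3.5 or later, where PEP 461 adds % formatting for bytes.

```python
def header_concat(type_name, length):
    """Bytes concatenation, as in my branch."""
    return type_name + b' ' + str(length).encode('ascii') + b'\0'


def header_format_encode(type_name, length):
    """Format as text, then encode; bytes inputs must be decoded first."""
    text = '%s %d\0' % (type_name.decode('ascii'), length)
    return text.encode('ascii')


def header_pep461(type_name, length):
    """PEP 461 bytes formatting; assumes Python 3.5 or later."""
    return b'%s %d\0' % (type_name, length)


if __name__ == '__main__':
    for build in (header_concat, header_format_encode, header_pep461):
        print(build.__name__, build(b'blob', 12))
```

All three should produce identical bytes; the difference is only in how much
encoding and decoding noise ends up in the calling code.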



Just a side note: I am doing a lot of history rewriting in my branches (commit
squashing, splitting, rebasing) and doing push -f. If anyone wants to base their
own work off my branches, just let me know, and I will stop doing this.

Regards,

Gary

