dulwich-users team mailing list archive
Message #00814
Re: Python 3 porting
Hi Jelmer.
Thank you for your reply.
> Gary van der Merwe wrote:
>> * where necessary encodes strings to bytes using the ascii codec, e.g.
>> `str(hexsha).encode('ascii')`
Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> I think we should try to have these as bytes in the first place where
> possible, rather than converting them. Converting might still be a
> good intermediate step, though.
Sorry, I did not explain this clearly.
In a few places, I'm encoding strings that are returned from functions, e.g.:
# sha.hexdigest() returns a unicode string in python 3
self.sha().hexdigest().encode('ascii')
str(length).encode('ascii')
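A minimal sketch of the two conversions above, using hashlib directly rather than dulwich's own objects. The function names are hypothetical and only illustrate the pattern:

```python
import hashlib


def hex_sha_bytes(data):
    """Return the SHA-1 hex digest of `data` as ASCII bytes.

    Hypothetical helper, not part of dulwich.
    """
    # hexdigest() returns str on Python 3, so encode it explicitly.
    return hashlib.sha1(data).hexdigest().encode('ascii')


def length_bytes(length):
    """Return the decimal representation of an int as ASCII bytes."""
    return str(length).encode('ascii')
```

Both results are bytes, so they can be concatenated with other bytes without any implicit conversion.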
Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> I'm not a fan of six, and don't want to use it in the main dulwich
> branch. The whole point of Python 3 is to make some
> backwards-incompatible changes to clean up the language and code
> written in it. Using six accomplishes the opposite of that, and adds
> a mandatory dependency - where dulwich currently just depends on python
> itself.
How do you feel about including six in dulwich as dulwich.contrib.six? It would
add one Python file and the LICENSE file.
Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> 2to3 doesn't sound too bad, though I don't have any experience with it
> myself.
I've looked at it, and it does seem feasible. For this branch, I've got the test
suite running:
https://github.com/garyvdm/dulwich/tree/python3-2to3
$ python3 setup.py test
....
Ran 862 tests in 4.278s
FAILED (failures=12, errors=553, skipped=16)
This is not too far off my python-six branch:
$ python3 setup.py test
....
Ran 862 tests in 2.048s
FAILED (failures=10, errors=453, skipped=22)
> Gary van der Merwe wrote:
>> I would like to get feedback on this approach I'm taking. In
>> particular, I would like to get a review of the string changes that I
>> have done so far [2]. Are we happy with removing the string formatting
>> for bytes appending, e.g. changing
>> "%s %d\0" % (type_name, length)
>> to
>> type_name + b' ' + str(length).encode('ascii') + b'\0'
Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> That's not great, but as long as this is limited to a couple of places
> it should be fine.
Unfortunately this is in a LOT of places. In objects.py, I even ended up
writing a small helper function for the serialize methods:
https://github.com/garyvdm/dulwich/blob/python3/dulwich/objects.py#L175
Usage examples:
https://github.com/garyvdm/dulwich/blob/python3/dulwich/objects.py#L667
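A helper of this kind might look roughly like the sketch below. The name `join_fields` and its behaviour are illustrative assumptions, not the code at the links above:

```python
def join_fields(*fields):
    """Join fields with single spaces and return bytes.

    Hypothetical helper: bytes are passed through unchanged, anything
    else (e.g. an int) is converted with str() and encoded as ASCII.
    """
    parts = []
    for field in fields:
        if isinstance(field, bytes):
            parts.append(field)
        else:
            parts.append(str(field).encode('ascii'))
    return b' '.join(parts)


# Same result as: type_name + b' ' + str(length).encode('ascii') + b'\0'
header = join_fields(b'blob', 12) + b'\0'
```

This keeps the encoding in one place instead of repeating it in every serialize method.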
The other options are:
* Use string formatting, and encode (which also means we have to decode
whenever we need to include bytes).
* Use PEP 461. This unfortunately means waiting for, and only supporting,
Python 3.5 and later. It seems like it is not possible to monkeypatch this in.
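The two alternatives listed above could be sketched as follows. The function names are hypothetical, and the second form relies on PEP 461, so it only runs on Python 3.5 or later:

```python
def header_via_str(type_name, length):
    """Option 1: format as text, then encode.

    Bytes arguments have to be decoded first, otherwise Python 3
    would format them as "b'blob'".
    """
    text = "%s %d\0" % (type_name.decode('ascii'), length)
    return text.encode('ascii')


def header_via_pep461(type_name, length):
    """Option 2: %-formatting directly on bytes (PEP 461, Python 3.5+)."""
    return b"%s %d\0" % (type_name, length)
```

Both produce the same bytes as the concatenation form, e.g. `b'blob 12\x00'` for `(b'blob', 12)`.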
Just a side note: I am doing a lot of history rewriting in my branches (commit
squashing, splitting, rebasing) and doing push -f. If anyone wants to base their
own work off my branches, just let me know, and I will stop doing this.
Regards,
Gary