dulwich-users team mailing list archive
Message #00815
Re: Python 3 porting
On Wed, Apr 09, 2014 at 03:05:32PM +0200, Gary van der Merwe wrote:
> > Gary van der Merwe wrote:
> >> * where necessary encodes strings to bytes using the ascii codec, e.g.
> >> `str(hexsha).encode('ascii')`
>
> Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> > I think we should try to have these as bytes in the first place where
> > possible, rather than converting them. Converting might still be a
> > good intermediate step, though.
>
> Sorry, I did not explain this clearly.
>
> In a few places, I'm encoding strings returned from functions. e.g.:
>
> # sha.hexdigest() returns a unicode string in python 3
> self.sha().hexdigest().encode('ascii')
>
> str(length).encode('ascii')
Ah, I see. Perhaps we can keep hexdigest() as a unicode string and .digest() as
bytes? That is most consistent with the meaning of unicode and byte strings.
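
As a minimal sketch of that distinction (assuming Python 3's standard hashlib; the variable names are purely illustrative and not taken from dulwich):

```python
import binascii
import hashlib

# Hash the header of an empty git blob; the input must already be bytes.
sha = hashlib.sha1(b"blob 0\0")

hex_text = sha.hexdigest()   # str (unicode) in Python 3
raw = sha.digest()           # bytes, 20 bytes long for SHA-1

# When the hex form is needed as bytes (e.g. to write into an object),
# encoding with the ascii codec is safe, since hex digits are pure ASCII.
hex_bytes = hex_text.encode("ascii")

# The same bytes can be derived directly from the raw digest.
assert hex_bytes == binascii.hexlify(raw)
```

Keeping hexdigest() as text and digest() as bytes means the encode step only happens at the boundary where the hex sha is written out as bytes.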
> Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> > I'm not a fan of six, and don't want to use it in the main dulwich
> > branch. The whole point of Python 3 is to make some
> > backwards-incompatible changes to clean up the language and code
> > written in it. Using six accomplishes the opposite of that, and adds
> > a mandatory dependency - where dulwich currently just depends on python
> > itself.
> How do you feel about including six in dulwich as dulwich.contrib.six? It would
> add one Python file and the LICENSE file.
That would make the dependency issue go away, but still leaves us with a lot of
calls to six all over the place. The latter is what I am mainly concerned about.
> Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> > 2to3 doesn't sound too bad, though I don't have any experience with it
> > myself.
>
> I've looked at it, and it does seem feasible. For this branch, I've got the test
> suite running:
>
> https://github.com/garyvdm/dulwich/tree/python3-2to3
>
> $ python3 setup.py test
> ....
> Ran 862 tests in 4.278s
>
> FAILED (failures=12, errors=553, skipped=16)
>
> This is not too far off my python-six branch:
>
> $ python3 setup.py test
> ....
> Ran 862 tests in 2.048s
>
> FAILED (failures=10, errors=453, skipped=22)
Thanks.
I think I'd prefer that option, keeping the compatibility code out of
Dulwich itself as much as possible.
> > Gary van der Merwe wrote:
> >> I would like to get feedback on this approach I'm taking. In
> >> particular, I would like to get a review of the string changes that I
> >> have done so far [2]. Are we happy with removing the string formatting
> >> for bytes appending, e.g. changing
> >> "%s %d\0" % (type_name, length)
> >> to
> >> type_name + b' ' + str(length).encode('ascii') + b'\0'
>
>
> Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> > That's not great, but as long as this is limited to a couple of places
> > it should be fine.
>
> Unfortunately this is in a LOT of places. In objects.py, I even ended up
> writing a small helper function for the serialize methods:
>
> https://github.com/garyvdm/dulwich/blob/python3/dulwich/objects.py#L175
>
> Usage examples:
>
> https://github.com/garyvdm/dulwich/blob/python3/dulwich/objects.py#L667
>
> The other options are:
>
> * Use string formatting, and encode (which also means we have to decode
> whenever we need to include bytes.)
That's definitely not an option, as it would hurt performance.
> * Use PEP 461. This unfortunately means waiting for, and only supporting,
> Python 3.5. It seems like it is not possible to monkeypatch this in.
That would be an ideal solution. How far along is 3.5?
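
For comparison, here is a rough sketch of the three options discussed above, applied to the object header example. The function names are hypothetical and not part of dulwich, and the last variant assumes an interpreter that implements PEP 461 (Python 3.5 or later):

```python
def header_concat(type_name, length):
    # Option in the python3 branch: plain bytes concatenation.
    return type_name + b" " + str(length).encode("ascii") + b"\0"


def header_format_encode(type_name, length):
    # Text formatting followed by encode; the bytes argument has to be
    # decoded first, which is the extra round trip mentioned above.
    text = "%s %d\0" % (type_name.decode("ascii"), length)
    return text.encode("ascii")


def header_pep461(type_name, length):
    # PEP 461: %-formatting directly on bytes (Python 3.5+ only).
    return b"%s %d\0" % (type_name, length)


# All three produce the same header for the same input.
assert header_concat(b"blob", 12) == b"blob 12\x00"
assert header_format_encode(b"blob", 12) == b"blob 12\x00"
assert header_pep461(b"blob", 12) == b"blob 12\x00"
```

The PEP 461 form stays closest to the original Python 2 code, which is why it would be the least invasive change once 3.5 is available.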
Cheers,
Jelmer