dulwich-users team mailing list archive


Re: Python 3 porting

To: Gary van der Merwe <garyvdm@xxxxxxxxx>
From: Jelmer Vernooij <jelmer@xxxxxxxxx>
Date: Wed, 9 Apr 2014 15:17:49 +0200
Cc: dulwich-users@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CAJixRzoJQsf8sjcK9_GEArYT2fP6K6CzmYgai=WMR5Lzp3Vh7A@mail.gmail.com>
User-agent: Mutt/1.5.23 (2014-03-12)

On Wed, Apr 09, 2014 at 03:05:32PM +0200, Gary van der Merwe wrote:
> > Gary van der Merwe wrote:
> >> * where necessary encodes strings to bytes using the ascii codec, e.g.
> >> `str(hexsha).encode('ascii')`
> 
> Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> > I think we should try to have these as bytes in the first place where
> > possible, rather than converting them. Converting might still be a
> > good intermediate step, though.
> 
> Sorry, I did not explain this clearly.
> 
> In a few places, I'm encoding strings returned from functions. e.g.:
> 
>   # sha.hexdigest() returns a unicode string in python 3
>   self.sha().hexdigest().encode('ascii')
> 
>   str(length).encode('ascii')

Ah, I see. Perhaps we can keep hexdigest() as a unicode string and .digest() as
bytes? That is most consistent with the meaning of unicode and byte strings.
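A minimal illustration of that split, using the standard library's hashlib on a git-style blob (the content and variable names are only for illustration). The hex form is text and the raw digest is bytes, so an explicit ASCII encode is needed only where a hex id has to be embedded in a byte string:

```python
import hashlib

# Hash a git-style blob: header "blob <length>\0" followed by the content.
content = b"hello\n"
sha = hashlib.sha1(b"blob " + str(len(content)).encode("ascii") + b"\x00" + content)

hex_id = sha.hexdigest()  # str on Python 3: 40 hex characters
raw_id = sha.digest()     # bytes: 20 raw bytes

# Encode the hex form only at the point where bytes are actually required.
hex_id_bytes = hex_id.encode("ascii")
```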

> Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> > I'm not a fan of six, and don't want to use it in the main dulwich
> > branch. The whole point of Python 3 is to make some
> > backwards-incompatible changes to clean up the language and code
> > written in it. Using six accomplishes the opposite of that, and adds
> > a mandatory dependency - where dulwich currently just depends on python
> > itself.
>
> How do you feel about including six in dulwich as dulwich.contrib.six? It will
> include one Python file and the LICENSE file.

That would make the dependency issue go away, but still leaves us with a lot of
calls to six all over the place. The latter is what I am mainly concerned about.

> Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> > 2to3 doesn't sound too bad, though I don't have any experience with it
> > myself.
> 
> I've looked at it, and it does seem feasible. For this branch, I've got the test
> suite running:
> 
> https://github.com/garyvdm/dulwich/tree/python3-2to3
> 
> $ python3 setup.py test
> ....
> Ran 862 tests in 4.278s
> 
> FAILED (failures=12, errors=553, skipped=16)
> 
> This is not too far off my python-six branch:
> 
> $ python3 setup.py test
> ....
> Ran 862 tests in 2.048s
> 
> FAILED (failures=10, errors=453, skipped=22)

Thanks.

I think I'd prefer that option, keeping the compatibility code out of
Dulwich itself as much as possible.

> > Gary van der Merwe wrote:
> >> I would like to get feedback on this approach I'm taking. In
> >> particular, I would like to get a review of the string changes that I
> >> have done so far [2]. Are we happy with removing the string formatting
> >> for bytes appending, e.g. changing
> >>     "%s %d\0" % (type_name, length)
> >> to
> >>     type_name + b' ' + str(length).encode('ascii') + b'\0'
> 
> 
> Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> > That's not great, but as long as this is limited to a couple of places
> > it should be fine.
> 
> Unfortunately this is in a LOT of places. In objects.py, I even ended up
> writing a small helper function for the serialize methods:
> 
> https://github.com/garyvdm/dulwich/blob/python3/dulwich/objects.py#L175
> 
> Usage examples:
> 
> https://github.com/garyvdm/dulwich/blob/python3/dulwich/objects.py#L667
> 
> The other options are:
> 
>  * Use string formatting, and encode (which also means we have to decode
>    whenever we need to include bytes).
That's definitely not an option, as it would have a performance impact.
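A rough sketch contrasting the two ways of building an object header from a bytes type name (both function names are hypothetical, not Dulwich APIs). The format-then-encode variant has to decode the bytes argument to text and then encode the whole result back, an extra round trip on every call, whereas concatenation stays in bytes and only converts the integer:

```python
def header_by_concat(type_name: bytes, length: int) -> bytes:
    # Stay in bytes; only the integer length needs converting.
    return type_name + b" " + str(length).encode("ascii") + b"\x00"


def header_by_format(type_name: bytes, length: int) -> bytes:
    # Decode the bytes argument to text, format, then encode everything back.
    return ("%s %d\x00" % (type_name.decode("ascii"), length)).encode("ascii")
```

Both produce `b"blob 6\x00"` for `(b"blob", 6)`; the difference is the decode/encode work hidden in the second one.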

>  * Use PEP 461. This unfortunately means waiting for, and only supporting,
>    Python 3.5. It seems like it is not possible to monkeypatch this in.
That would be an ideal solution. How far along is 3.5?
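For comparison, a minimal sketch of the original line under PEP 461, which adds %-interpolation to bytes objects in Python 3.5+ (the function name is hypothetical). Note that `%s` on bytes accepts bytes-like values (or objects implementing `__bytes__`), not `str`, so the type name must already be bytes:

```python
def object_header(type_name: bytes, length: int) -> bytes:
    # Requires Python 3.5+ (PEP 461).
    # %s interpolates a bytes-like value; %d formats the integer as ASCII digits.
    return b"%s %d\0" % (type_name, length)
```

Passing a `str` type name, e.g. `object_header("blob", 6)`, raises `TypeError` rather than silently mixing text and bytes.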

Cheers,

Jelmer

References

Python 3 porting
From: Gary van der Merwe, 2014-04-03
Re: Python 3 porting
From: Jelmer Vernooij, 2014-04-06
Re: Python 3 porting
From: Gary van der Merwe, 2014-04-09