duplicity-team team mailing list archive

Re: Help with unicode branch (for Python3 support)

 

Further reading suggests:
globals.fsencoding = fse if fse not in ['ascii', None] else 'utf-8'
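One caveat with comparing against the literal 'ascii': the original report further down shows the test child reporting "ANSI_X3.4-1968", which is an alias for ASCII and would slip past that check. A minimal sketch of a more tolerant fallback follows, assuming it is acceptable to canonicalise the name with codecs.lookup(). The function name pick_fs_encoding and its override parameter are hypothetical, standing in for the command-line override suggested in the earlier message; they are not existing duplicity code.

```python
import codecs
import sys


def pick_fs_encoding(detected, override=None, fallback="utf-8"):
    """Choose a filesystem encoding (hypothetical helper, not duplicity API).

    An explicit override wins. Otherwise the detected name is kept unless
    it is missing, unknown, or any alias of ASCII, in which case the
    fallback is used.
    """
    if override:
        return override
    if detected is None:
        return fallback
    try:
        # codecs.lookup() maps aliases such as "ANSI_X3.4-1968" or
        # "US-ASCII" to the canonical codec name "ascii".
        canonical = codecs.lookup(detected).name
    except LookupError:
        # Unknown codec name: assume the fallback is the safer choice.
        return fallback
    return fallback if canonical == "ascii" else detected


fsencoding = pick_fs_encoding(sys.getfilesystemencoding())
```

With this shape, the value parsed from a command-line option (if one is added) could be passed as override, and everything else would read the single fsencoding value instead of calling sys.getfilesystemencoding() directly.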

Gack!  This is turning into a real fubar!

...Ken


On Sat, Nov 18, 2017 at 3:55 PM, Kenneth Loafman <kenneth@xxxxxxxxxxx>
wrote:

> It seems we are fighting an old Python bug still extant in Python 3.
> Namely, sys.getfilesystemencoding() will return ascii if the LC_* variables
> are not set (cron or other detached processes).  At one time in the early 3
> series they defaulted to utf-8 if ascii was returned.  Then, as I
> understand it, the purists won and ascii is returned now.  So, I think that
> was a good enough idea, except we should allow an override.  I suggest we
> allow an option if the FS is really something other than utf-8, but do
> something like this in globals.py.
>
> fse = sys.getfilesystemencoding()
> globals.fsencoding = fse if fse != 'ascii' else 'utf-8'
>
> Then allow it to be overridden in command line processing if needed.
> Replace the two sys.getfilesystemencoding() with globals.fsencoding and we
> should be 99% there.
>
> ...Ken
>
>
>
>
>
> On Wed, Nov 15, 2017 at 9:40 AM, Kenneth Loafman <kenneth@xxxxxxxxxxx>
> wrote:
>
>> Google for 'tox getfilesystemencoding' and 'setup.py test
>> getfilesystemencoding'.  You'll see a bunch of discussion.  It may be that
>> we need to move from 'setup.py test' to something else.
>>
>> ...Ken
>>
>>
>> On Tue, Nov 14, 2017 at 3:40 PM, Aaron <lists@xxxxxxxxxxxxxxxxxx> wrote:
>>
>>> Hello all,
>>>
>>> I am hoping for some help to iron out a small testing bug with:
>>> https://code.launchpad.net/~aaron-whitehouse/duplicity/08-unicode
>>> so that I can get the code committed. I believe that the code is working
>>> correctly, but our test setup (tox, pexpect etc) is creating an environment
>>> identified as ASCII rather than UTF-8 and that makes the tests fail.
>>>
>>> *The branch aims to ease Python 2/3 compatibility*
>>>
>>> For context, this branch aims to ease the conversion of duplicity to be
>>> Python 2/3 compatible. It looks to me as though the key stumbling block in
>>> previous efforts has been the string unicode/bytes distinction in Python 3.
>>> My plan with this branch was therefore to take manageable sections of
>>> duplicity and convert the strings to Python 2 unicode/bytes strings, making
>>> it much easier to then convert that code to Python 3 in the future, but in
>>> a way that can be committed straight away to the existing code base.
>>>
>>> *Using sys.getfilesystemencoding() misdetects 'ascii' in tests*
>>>
>>> As the branch currently stands, all tests pass. If, however (on my UTF-8
>>> system) you change (util.py, line 66):
>>>
>>> return bytes_filename.decode("UTF-8", "ignore")
>>>
>>> to:
>>>
>>> return bytes_filename.decode(sys.getfilesystemencoding(), "ignore")
>>>
>>> then tests (mainly in testing.functional.test_selection.TestUnicode)
>>> fail. Changing "ignore" in the above line to "strict" gives errors that
>>> point to an encoding problem:
>>>
>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 25: ordinal not in range(128)
>>>
>>> This is the case even though sys.getfilesystemencoding() returns "UTF-8"
>>> on my setup. Putting a print statement showing the result of
>>> sys.getfilesystemencoding() shows this changing from "UTF-8" to
>>> "ANSI_X3.4-1968" once the code is within the:
>>>
>>> child = pexpect.spawn(b'/bin/sh', [b'-c', cmdline.encode(sys.getfilesystemencoding(),
>>>                                                          'replace')], timeout=None)
>>>
>>> *This looks to just be a problem with the test suite*
>>>
>>> The test suite prints a copy of the failing command, for example (from
>>> testing.functional.test_selection.TestUnicode.test_unicode_paths_non_globbing):
>>>
>>> ...command: "setsid" "-w" "duplicity" "full" "testfiles/select-unicode" "file://testfiles/output" "--volsize" "1" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/उदाहरण.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/דוגמא.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/მაგალითი/" "--include" "testfiles/select-unicode/прыклад/пример/例/" "--exclude" "testfiles/select-unicode/прыклад/пример/" "--include" "testfiles/select-unicode/прыклад/" "--include" "testfiles/select-unicode/օրինակ.txt" "--exclude" "testfiles/select-unicode/**" "-v0" "--no-print-statistics" "--allow-source-mismatch" "--archive-dir=testfiles/cache" < /dev/null
>>>
>>> If this (with PYTHONPATH added and the duplicity path adjusted, executed
>>> from the "testing" folder with "testfiles.tar.gz" extracted) is run
>>> directly on the command line:
>>>
>>> $ PYTHONPATH=../ "../bin/duplicity" "full" "testfiles/select-unicode" "file://testfiles/output" "--volsize" "1" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/उदाहरण.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/דוגמא.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/მაგალითი/" "--include" "testfiles/select-unicode/прыклад/пример/例/" "--exclude" "testfiles/select-unicode/прыклад/пример/" "--include" "testfiles/select-unicode/прыклад/" "--include" "testfiles/select-unicode/օրինակ.txt" "--exclude" "testfiles/select-unicode/**" "-v0" "--no-print-statistics" "--allow-source-mismatch" "--archive-dir=testfiles/cache"
>>>
>>> everything works correctly: restoring the files (using this branch) and
>>> checking them manually confirms the backup worked (even with "strict"). The
>>> print statement also shows that the system encoding is "UTF-8" throughout.
>>>
>>> *Help requested*
>>>
>>> Can anybody suggest what I can do to force the testing environment to be
>>> UTF-8, or at least be detected as such by sys.getfilesystemencoding()?
>>> Alternatively, what is the least awful way to make the tests work enough to
>>> get the (apparently working) code committed?
>>>
>>> Many thanks,
>>>
>>> Aaron
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~duplicity-team
>>> Post to     : duplicity-team@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~duplicity-team
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>
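
On the question at the bottom of the thread (forcing the spawned test process to be detected as UTF-8), one possible approach is to hand the child an explicit environment rather than relying on whatever tox or setup.py passes through. The sketch below is only an illustration under stated assumptions: the helper names utf8_env and child_fs_encoding are hypothetical, and the locale name "C.UTF-8" is an assumption that must match a locale actually installed on the test machine (for example "en_US.UTF-8" on some systems).

```python
import os
import subprocess
import sys


def utf8_env(base_env=None, locale_name="C.UTF-8"):
    """Return a copy of an environment with a UTF-8 locale selected.

    locale_name is an assumption; it has to exist on the host system.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = locale_name  # LC_ALL overrides the other LC_* variables and LANG
    env["LANG"] = locale_name
    return env


def child_fs_encoding(env):
    """Ask a child interpreter what sys.getfilesystemencoding() reports."""
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
    )
    return out.decode("ascii").strip()


if __name__ == "__main__":
    # Compare what a child sees with the inherited and the forced environment.
    print("inherited:", child_fs_encoding(dict(os.environ)))
    print("forced   :", child_fs_encoding(utf8_env()))
```

pexpect.spawn() accepts an env argument, so the same dictionary could be passed to the spawn call quoted above (env=utf8_env()). Whether that is enough for the test suite has not been verified here.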
