duplicity-team team mailing list archive

Help with unicode branch (for Python3 support)

To: "duplicity-team@xxxxxxxxxxxxxxxxxxx" <duplicity-team@xxxxxxxxxxxxxxxxxxx>
From: Aaron <lists@xxxxxxxxxxxxxxxxxx>
Date: Tue, 14 Nov 2017 21:40:19 +0000

Hello all,

I am hoping for some help to iron out a small testing bug with:
https://code.launchpad.net/~aaron-whitehouse/duplicity/08-unicode

so that I can get the code committed. I believe that the code is working correctly, but our test setup (tox, pexpect etc.) is creating an environment identified as ASCII rather than UTF-8, and that makes the tests fail.



         *The branch aims to ease Python 2/3 compatibility*

For context, this branch aims to ease the conversion of duplicity to be Python 2/3 compatible. It looks to me as though the key stumbling block in previous efforts has been the string unicode/bytes distinction in Python 3. My plan with this branch was therefore to take manageable sections of duplicity and convert the strings to Python 2 unicode/bytes strings, making it much easier to then convert that code to Python 3 in the future, but in a way that can be committed straight away to the existing code base.



         *Using sys.getfilesystemencoding() misdetects 'ascii' in tests*

As the branch currently stands, all tests pass. If, however (on my UTF-8 system), you change (util.py, line 66):


return bytes_filename.decode("UTF-8", "ignore")

to:

return bytes_filename.decode(sys.getfilesystemencoding(), "ignore")

then tests (mainly in testing.functional.test_selection.TestUnicode) fail. Changing "ignore" in the above line to "strict" gives errors that point to an encoding problem:


UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 25: ordinal not in range(128)
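The error can be reproduced in isolation. The following is only a minimal sketch, assuming the file name bytes are UTF-8 encoded as in the test data; the variable names are illustrative and this is not the code from util.py. It shows why "ignore" hides the problem while "strict" exposes it:

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for a file name as stored on a UTF-8 file system.
bytes_filename = u"testfiles/select-unicode/прыклад".encode("utf-8")

# Decoding with the matching codec round-trips the name.
as_utf8 = bytes_filename.decode("utf-8", "strict")

# Decoding as ASCII with "ignore" silently drops every non-ASCII byte,
# so the resulting name no longer matches the file on disk.
as_ascii_ignored = bytes_filename.decode("ascii", "ignore")

# Decoding as ASCII with "strict" raises instead of hiding the problem.
try:
    bytes_filename.decode("ascii", "strict")
    strict_error = None
except UnicodeDecodeError as error:
    strict_error = error

print(repr(as_utf8))
print(repr(as_ascii_ignored))  # only the ASCII prefix of the path survives
print(strict_error)            # reports byte 0xd0 at position 25
```

The first non-ASCII byte in this path is the lead byte of the Cyrillic "п", which is consistent with the position reported in the traceback.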

This is the case even though sys.getfilesystemencoding() returns "UTF-8" on my setup. Adding a print statement for sys.getfilesystemencoding() shows the value changing from "UTF-8" to "ANSI_X3.4-1968" (another name for ASCII) once the code is running inside the process spawned by:


child = pexpect.spawn(b'/bin/sh',
                      [b'-c', cmdline.encode(sys.getfilesystemencoding(), 'replace')],
                      timeout=None)
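To check whether the child's environment is responsible, the codeset a child process sees can be probed directly. This is a rough sketch using subprocess rather than pexpect; the names PROBE, LOCALE_VARIABLES and child_codeset are made up for illustration. It asks the C library for the locale codeset, which is what sys.getfilesystemencoding() follows on Unix under Python 2 (newer Python 3 releases may override it to UTF-8, so the probe does not rely on that function):

```python
import os
import subprocess
import sys

# Code run in the child: report the codeset its locale advertises.
PROBE = (
    "import locale; "
    "locale.setlocale(locale.LC_ALL, ''); "
    "print(locale.nl_langinfo(locale.CODESET))"
)

# Locale variables that influence the character encoding (Unix only).
LOCALE_VARIABLES = ("LC_ALL", "LC_CTYPE", "LANG")


def child_codeset(env):
    """Start a Python child with the given environment and return its codeset."""
    output = subprocess.check_output([sys.executable, "-c", PROBE], env=env)
    return output.decode("ascii").strip()


# Simulate a test runner that does not pass locale settings on to the child.
stripped_env = dict(os.environ)
for name in LOCALE_VARIABLES:
    stripped_env.pop(name, None)

print("codeset without locale variables:", child_codeset(stripped_env))
```

On glibc systems the value printed for the stripped environment is typically the same "ANSI_X3.4-1968" name seen in the tests, which would suggest the locale variables are not reaching the spawned process.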


         *This looks to just be a problem with the test suite*

The test suite prints a copy of the failing command, for example (from testing.functional.test_selection.TestUnicode.test_unicode_paths_non_globbing):


...command: "setsid" "-w" "duplicity" "full" "testfiles/select-unicode" "file://testfiles/output" "--volsize" "1" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/उदाहरण.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/דוגמא.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/მაგალითი/" "--include" "testfiles/select-unicode/прыклад/пример/例/" "--exclude" "testfiles/select-unicode/прыклад/пример/" "--include" "testfiles/select-unicode/прыклад/" "--include" "testfiles/select-unicode/օրինակ.txt" "--exclude" "testfiles/select-unicode/**" "-v0" "--no-print-statistics" "--allow-source-mismatch" "--archive-dir=testfiles/cache" < /dev/null

If this (with PYTHONPATH added and the duplicity path adjusted, executed from the "testing" folder with "testfiles.tar.gz" extracted) is run directly on the command line:


$ PYTHONPATH=../ "../bin/duplicity" "full" "testfiles/select-unicode" "file://testfiles/output" "--volsize" "1" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/उदाहरण.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/דוגמא.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/მაგალითი/" "--include" "testfiles/select-unicode/прыклад/пример/例/" "--exclude" "testfiles/select-unicode/прыклад/пример/" "--include" "testfiles/select-unicode/прыклад/" "--include" "testfiles/select-unicode/օրինակ.txt" "--exclude" "testfiles/select-unicode/**" "-v0" "--no-print-statistics" "--allow-source-mismatch" "--archive-dir=testfiles/cache"

everything works correctly. Restoring the files (using this branch) and checking them manually shows that the backup worked, even with "strict". The print statement also shows that the system encoding is "UTF-8" throughout.



         *Help requested*

Can anybody suggest what I can do to force the testing environment to be UTF-8, or at least be detected as such by sys.getfilesystemencoding()? Alternatively, what is the least awful way to make the tests work enough to get the (apparently working) code committed?
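One direction that might be worth trying is to give the spawned process an explicit environment. The sketch below is untested against the duplicity test suite. The helper names and the "C.UTF-8" locale name are assumptions (that locale is not installed on every system, and "en_US.UTF-8" or similar may be needed instead). It uses subprocess so that it runs on its own; pexpect.spawn also accepts an env argument, and the same variables could presumably be set through tox's setenv option instead.

```python
import os
import subprocess
import sys


def utf8_environment(base_env=None, locale_name="C.UTF-8"):
    """Return a copy of base_env with the locale variables forcing UTF-8.

    locale_name is an assumption: it must name a locale that is actually
    installed on the machine running the tests.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = locale_name
    env["LANG"] = locale_name
    return env


def child_filesystem_encoding(env):
    """Report sys.getfilesystemencoding() as seen by a child started with env."""
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
    )
    return output.decode("ascii").strip()


print(child_filesystem_encoding(utf8_environment()))
```

If this reports a UTF-8 encoding, the equivalent change in the test helper would be to pass env=utf8_environment() to the pexpect.spawn call.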


Many thanks,

Aaron

Follow ups

Re: Help with unicode branch (for Python3 support)
From: Kenneth Loafman, 2017-11-15