duplicity-team team mailing list archive
Message #04616
Help with unicode branch (for Python3 support)
Hello all,
I am hoping for some help to iron out a small testing bug with:
https://code.launchpad.net/~aaron-whitehouse/duplicity/08-unicode
so that I can get the code committed. I believe that the code is working
correctly, but our test setup (tox, pexpect, etc.) creates an environment
that is identified as ASCII rather than UTF-8, and that makes the tests
fail.
*The branch aims to ease Python 2/3 compatibility*
For context, this branch aims to ease the conversion of duplicity to be
Python 2/3 compatible. It looks to me as though the key stumbling block
in previous efforts has been the string unicode/bytes distinction in
Python 3. My plan with this branch was therefore to take manageable
sections of duplicity and convert the strings to Python 2 unicode/bytes
strings, making it much easier to then convert that code to Python 3 in
the future, but in a way that can be committed straight away to the
existing code base.
*Using sys.getfilesystemencoding() misdetects 'ascii' in tests*
As the branch currently stands, all tests pass. If, however (on my UTF-8
system), you change this line (util.py, line 66):
return bytes_filename.decode("UTF-8", "ignore")
to:
return bytes_filename.decode(sys.getfilesystemencoding(), "ignore")
then tests (mainly in testing.functional.test_selection.TestUnicode)
fail. Changing "ignore" in the above line to "strict" produces errors
that point to an encoding problem:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 25: ordinal not in range(128)
This is the case even though sys.getfilesystemencoding() returns "UTF-8"
on my setup. Adding a print statement for sys.getfilesystemencoding()
shows the value changing from "UTF-8" to "ANSI_X3.4-1968" (an alias for
ASCII) once the code is running under the following pexpect call:
child = pexpect.spawn(b'/bin/sh', [b'-c', cmdline.encode(sys.getfilesystemencoding(), 'replace')], timeout=None)
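To make the failure mode concrete, here is a minimal sketch (written for Python 3, not taken from the branch) of what decoding UTF-8 path bytes looks like once the detected encoding is "ANSI_X3.4-1968". The sample path is a hypothetical stand-in built from one of the test directory names; the real code in util.py may behave differently in detail.

```python
import codecs

# "ANSI_X3.4-1968" is the charset name typically reported for the
# C/POSIX locale; Python resolves it to its plain ASCII codec.
codec_name = codecs.lookup("ANSI_X3.4-1968").name

# Hypothetical sample: UTF-8 bytes for a path with one Cyrillic component.
raw = "testfiles/select-unicode/прыклад".encode("utf-8")

# "strict" fails on the first non-ASCII byte in the path.
try:
    raw.decode("ANSI_X3.4-1968", "strict")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# "ignore" hides the problem by silently dropping every non-ASCII byte,
# so the decoded name no longer matches the file on disk.
lossy = raw.decode("ANSI_X3.4-1968", "ignore")

print(codec_name)
print(error)
print(repr(lossy))
```

This would explain why the tests fail quietly with "ignore": the decoded paths are truncated rather than rejected, so the selection rules no longer match the files on disk.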
*This looks to just be a problem with the test suite*
The test suite prints a copy of the failing command, for example (from
testing.functional.test_selection.TestUnicode.test_unicode_paths_non_globbing):
...command: "setsid" "-w" "duplicity" "full" "testfiles/select-unicode" "file://testfiles/output" "--volsize" "1" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/उदाहरण.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/דוגמא.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/მაგალითი/" "--include" "testfiles/select-unicode/прыклад/пример/例/" "--exclude" "testfiles/select-unicode/прыклад/пример/" "--include" "testfiles/select-unicode/прыклад/" "--include" "testfiles/select-unicode/օրինակ.txt" "--exclude" "testfiles/select-unicode/**" "-v0" "--no-print-statistics" "--allow-source-mismatch" "--archive-dir=testfiles/cache" < /dev/null
If this command (with PYTHONPATH added and the duplicity path adjusted)
is run directly on the command line, from the "testing" folder with
"testfiles.tar.gz" extracted:
$ PYTHONPATH=../ "../bin/duplicity" "full" "testfiles/select-unicode" "file://testfiles/output" "--volsize" "1" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/उदाहरण.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/דוגמא.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/მაგალითი/" "--include" "testfiles/select-unicode/прыклад/пример/例/" "--exclude" "testfiles/select-unicode/прыклад/пример/" "--include" "testfiles/select-unicode/прыклад/" "--include" "testfiles/select-unicode/օրինակ.txt" "--exclude" "testfiles/select-unicode/**" "-v0" "--no-print-statistics" "--allow-source-mismatch" "--archive-dir=testfiles/cache"
then everything works: restoring the files (using this branch) and
checking them manually shows that the backup was correct, even with
"strict". The print statement also shows that the encoding is reported
as "UTF-8" throughout.
*Help requested*
Can anybody suggest how I can force the testing environment to be UTF-8,
or at least to be detected as such by sys.getfilesystemencoding()?
Alternatively, what is the least awful way to get the tests passing so
that the (apparently working) code can be committed?
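One possible direction, offered as an untested sketch rather than a known fix: build an environment with the locale variables forced to a UTF-8 locale and hand it to the spawned process. The helper name utf8_env and the locale "C.UTF-8" are assumptions for illustration; that locale may not exist on every system, in which case something like "en_US.UTF-8" would be needed instead.

```python
import os


def utf8_env(base_env=None, locale_name="C.UTF-8"):
    """Return a copy of base_env with the locale forced to locale_name.

    Hypothetical helper; locale_name is assumed to be installed.
    """
    env = dict(os.environ if base_env is None else base_env)
    # LC_ALL takes precedence over LC_CTYPE and LANG, so setting it
    # is enough to change the detected encoding; LANG is set as well
    # for tools that only look at LANG.
    env["LC_ALL"] = locale_name
    env["LANG"] = locale_name
    return env


# Example with an explicit base environment, which is left unmodified.
base = {"PATH": "/usr/bin:/bin", "LANG": "C"}
child_env = utf8_env(base)
print(child_env)

# In the test helper this might then be passed through pexpect's
# env argument (assumption: untested against the duplicity suite):
#   child = pexpect.spawn(b'/bin/sh', [b'-c', cmdline_bytes],
#                         timeout=None, env=utf8_env())
```

Whether this is sufficient depends on where the ASCII locale comes from. If tox is stripping the locale variables before the tests start, they may also need to be passed through or set in the tox configuration.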
Many thanks,
Aaron