Further reading suggests:
globals.fsencoding = fse if fse not in ['ascii', None] else 'utf-8'
Gack! This is turning into a real fubar!
...Ken
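One caveat with comparing against the literal 'ascii': the test environment described further down this thread reports "ANSI_X3.4-1968", which is another name for ASCII and would slip past a plain string comparison. The following is a minimal sketch, assuming it is acceptable to normalise the name through codecs.lookup(). The function name pick_fsencoding and the policy of falling back on unknown codec names are illustrative assumptions, not existing duplicity code.

```python
import codecs
import sys


def pick_fsencoding(detected, fallback="utf-8"):
    """Return `detected`, or `fallback` if it is missing or any spelling of ASCII."""
    if detected is None:
        return fallback
    try:
        # codecs.lookup() resolves aliases such as "ANSI_X3.4-1968" to "ascii".
        canonical = codecs.lookup(detected).name
    except LookupError:
        # Unknown codec name: assumed policy is to fall back rather than fail.
        return fallback
    return fallback if canonical == "ascii" else detected


fsencoding = pick_fsencoding(sys.getfilesystemencoding())
```

Comparing the canonical codec name rather than the raw string keeps the check independent of how the platform happens to spell ASCII.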
On Sat, Nov 18, 2017 at 3:55 PM, Kenneth Loafman <kenneth@xxxxxxxxxxx> wrote:
It seems we are fighting an old Python bug still extant in Python
3. Namely, sys.getfilesystemencoding() will return ascii if the
LC_* variables are not set (cron or other detached processes). At
one time in the early 3 series it defaulted to utf-8 if ascii
was detected. Then, as I understand it, the purists won and ascii
is returned now. I think defaulting to utf-8 was a good enough
idea, except we should allow an override. I suggest we add an
option for cases where the FS is really something other than
utf-8, and do something like this in globals.py:
fse = sys.getfilesystemencoding()
globals.fsencoding = fse if fse != 'ascii' else 'utf-8'
Then allow it to be overridden in command line processing if
needed. Replace the two sys.getfilesystemencoding() calls with
globals.fsencoding and we should be 99% there.
...Ken
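The override described above could look roughly like the sketch below. It assumes the override value arrives from some command-line option that does not exist yet; the names choose_fsencoding and override are hypothetical. The only library call used is codecs.lookup(), which raises LookupError for a codec name Python does not know.

```python
import codecs


def choose_fsencoding(detected, override=None):
    """Prefer an explicit override; otherwise treat 'ascii' as 'utf-8'."""
    if override is not None:
        # Validate early so that a typo fails at startup, not mid-backup.
        codecs.lookup(override)
        return override
    return detected if detected != "ascii" else "utf-8"
```

Validating the override up front is a design choice: an unknown encoding name is reported once, before any filename has been decoded.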
On Wed, Nov 15, 2017 at 9:40 AM, Kenneth Loafman <kenneth@xxxxxxxxxxx> wrote:
Google for 'tox getfilesystemencoding' and 'setup.py test
getfilesystemencoding'. You'll see a bunch of discussion. It
may be that we need to move from 'setup.py test' to something
else.
...Ken
On Tue, Nov 14, 2017 at 3:40 PM, Aaron <lists@xxxxxxxxxxxxxxxxxx> wrote:
Hello all,
I am hoping for some help to iron out a small testing bug with
https://code.launchpad.net/~aaron-whitehouse/duplicity/08-unicode
so that I can get the code committed. I believe that the code
is working correctly, but our test setup (tox, pexpect
etc) is creating an environment identified as ASCII rather
than UTF-8 and that makes the tests fail.
*The branch aims to ease Python 2/3 compatibility*
For context, this branch aims to ease the conversion of
duplicity to be Python 2/3 compatible. It looks to me as
though the key stumbling block in previous efforts has
been the string unicode/bytes distinction in Python 3. My
plan with this branch was therefore to take manageable
sections of duplicity and convert the strings to Python 2
unicode/bytes strings, making it much easier to then
convert that code to Python 3 in the future, but in a way
that can be committed straight away to the existing code base.
*Using sys.getfilesystemencoding() detects 'ascii' in tests*
As the branch currently stands, all tests pass. If,
however (on my UTF-8 system) you change (util.py, line 66):
return bytes_filename.decode("UTF-8", "ignore")
to:
return bytes_filename.decode(sys.getfilesystemencoding(), "ignore")
then tests (mainly in
testing.functional.test_selection.TestUnicode) fail.
Changing "ignore" in the above line to "strict" gives
errors that point to an encoding problem:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 25: ordinal not in range(128)
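To make the difference between the two error handlers concrete, here is a small self-contained sketch. The path is the first Unicode directory from the test data, and its first non-ASCII byte sits at offset 25, which matches the position in the error above. With "ignore" the ASCII codec silently drops every non-ASCII byte, so the test fails later on a mangled filename instead of at the decode call.

```python
path = u"testfiles/select-unicode/прыклад/"
raw = path.encode("utf-8")  # byte 25 is 0xd0, the lead byte of the first Cyrillic letter

as_utf8 = raw.decode("utf-8", "strict")          # round-trips cleanly
as_ascii_ignore = raw.decode("ascii", "ignore")  # non-ASCII bytes silently dropped

try:
    raw.decode("ascii", "strict")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc  # exc.start is the offset of the offending byte
```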
This is the case even though sys.getfilesystemencoding()
returns "UTF-8" on my setup. A print statement showing the
result of sys.getfilesystemencoding() shows it changing
from "UTF-8" to "ANSI_X3.4-1968" once the code runs inside
the child process spawned by:
child = pexpect.spawn(b'/bin/sh', [b'-c', cmdline.encode(sys.getfilesystemencoding(),
'replace')],timeout=None)
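One way to check whether the environment handed to the child is responsible is to ask a fresh interpreter what it sees when the locale variables are missing. This is a diagnostic sketch only: it uses subprocess rather than pexpect, and the helper name child_fsencoding is hypothetical. The result depends on the interpreter: Python 2 typically reports an ASCII alias in this situation, while Python 3.7 and later may still report utf-8 because of their C-locale handling.

```python
import os
import subprocess
import sys

PROBE = "import sys; print(sys.getfilesystemencoding())"


def child_fsencoding(env):
    """Run a fresh interpreter with `env` and return the encoding it reports."""
    out = subprocess.check_output([sys.executable, "-c", PROBE], env=env)
    return out.decode("ascii").strip()


# Drop the locale variables, roughly what cron or a stripped-down test
# runner does; everything else in the environment is kept.
bare_env = {k: v for k, v in os.environ.items()
            if k not in ("LANG", "LANGUAGE") and not k.startswith("LC_")}

in_parent = sys.getfilesystemencoding()
in_child = child_fsencoding(bare_env)
```

If in_parent and in_child differ, the encoding change comes from the environment passed to the child rather than from the code under test.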
*This looks to just be a problem with the test suite*
The test suite prints a copy of the failing command, for
example (from
testing.functional.test_selection.TestUnicode.test_unicode_paths_non_globbing):
...command: "setsid" "-w" "duplicity" "full" "testfiles/select-unicode" "file://testfiles/output" "--volsize" "1" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/उदाहरण.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/דוגמא.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/მაგალითი/" "--include" "testfiles/select-unicode/прыклад/пример/例/" "--exclude" "testfiles/select-unicode/прыклад/пример/" "--include" "testfiles/select-unicode/прыклад/" "--include" "testfiles/select-unicode/օրինակ.txt" "--exclude" "testfiles/select-unicode/**" "-v0" "--no-print-statistics" "--allow-source-mismatch" "--archive-dir=testfiles/cache" < /dev/null
If this is run directly on the command line (with PYTHONPATH
added and the duplicity path adjusted, executed from the
"testing" folder with "testfiles.tar.gz" extracted):
$ PYTHONPATH=../ "../bin/duplicity" "full" "testfiles/select-unicode" "file://testfiles/output" "--volsize" "1" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/उदाहरण.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/Παράδειγμα/דוגמא.txt" "--exclude" "testfiles/select-unicode/прыклад/пример/例/მაგალითი/" "--include" "testfiles/select-unicode/прыклад/пример/例/" "--exclude" "testfiles/select-unicode/прыклад/пример/" "--include" "testfiles/select-unicode/прыклад/" "--include" "testfiles/select-unicode/օրինակ.txt" "--exclude" "testfiles/select-unicode/**" "-v0" "--no-print-statistics" "--allow-source-mismatch" "--archive-dir=testfiles/cache"
Everything works correctly: restoring the files (using
this branch) and checking them manually confirms that the
backup succeeded (even with "strict"). The print statement also
shows that the filesystem encoding is "UTF-8" throughout.
*Help requested*
Can anybody suggest what I can do to force the testing
environment to be UTF-8, or at least be detected as such
by sys.getfilesystemencoding()? Alternatively, what is the
least awful way to make the tests work enough to get the
(apparently working) code committed?
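One possible approach is to hand the spawned process an environment with a UTF-8 locale forced. pexpect.spawn accepts an env argument, so the dictionary built below could be passed there. This is an untested sketch against duplicity's own test suite: the helper name utf8_env is hypothetical, and "C.UTF-8" is an assumption that will not hold on systems lacking that locale (a name such as "en_US.UTF-8" may be needed instead).

```python
import os


def utf8_env(base=None, locale_name="C.UTF-8"):
    """Return a copy of `base` (default: os.environ) with a UTF-8 locale forced."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = locale_name  # LC_ALL takes precedence over LC_CTYPE and LANG
    env["LANG"] = locale_name
    return env
```

The copy leaves the caller's environment untouched, so only the child process sees the forced locale.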
Many thanks,
Aaron
_______________________________________________
Mailing list: https://launchpad.net/~duplicity-team
Post to     : duplicity-team@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~duplicity-team
More help   : https://help.launchpad.net/ListHelp