testtools-dev team mailing list archive

[Merge] lp:~gz/testtools/unicode_doctestmatches_764170 into lp:testtools

 

Martin [gz] has proposed merging lp:~gz/testtools/unicode_doctestmatches_764170 into lp:testtools with lp:~gz/testtools/wrap_utf8_terminals_804122 as a prerequisite.

Requested reviews:
  Jonathan Lange (jml)
Related bugs:
  Bug #764170 in testtools: "DocTestMatches error when 'actual' is unicode"
  https://bugs.launchpad.net/testtools/+bug/764170

For more details, see:
https://code.launchpad.net/~gz/testtools/unicode_doctestmatches_764170/+merge/66674

Work towards making DocTestMatches correct with non-ascii text. Bug 764170 is not actually fixed yet, because the fallout differs depending on the Python version, as described below.


Python 2.4, 2.5, and some 2.6.x versions
========================================

TestDocTestMatchesInterfaceUnicode.test_describe_difference fails because the AssertionError raised can't be stringified, as bug 804127 describes. The options here are mangling down to ascii, or using an AssertionError subclass that explicitly supports unicode.


Python 2.6 and 2.7
==================

TestDocTestMatchesInterfaceUnicode.test_describe_difference fails because doctest.OutputChecker.output_difference mangles any unicode down to str. It also does so in the most unhelpful way possible: it encodes using sys.stdout.encoding, and does it from a module-global function innocently named '_indent', so it's very hard to revert the change and get back the behaviour we want.

Blame: <http://hg.python.org/cpython/rev/d0c2b9c2babb>

Short of monkeypatching, I don't have any good ideas here.
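
For reference, the indenting step itself needs no encoding at all. Below is a minimal sketch of a text-preserving helper, modelled on the replacement `_indent` in the diff further down. The name `indent_text` is hypothetical and is not part of doctest or testtools.

```python
import re

# Matches the start of every non-empty line.
_NONEMPTY_LINE_START = re.compile(r"^(?!$)", re.MULTILINE)


def indent_text(s, indent=4):
    """Prepend `indent` spaces to each non-empty line of `s` without encoding it."""
    return _NONEMPTY_LINE_START.sub(indent * " ", s)


# The non-ascii character passes through untouched and empty lines stay empty.
indented = indent_text("Expected:\n\xa7\n\nGot:\na\n")
```

The hard part on 2.6 and 2.7 is not writing this function but getting output_difference to call it, which is why the branch rebuilds that method with a patched globals dict.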


Python 3
========

TestDocTestMatchesInterfaceUnicode.test_matches_match fails (unhelpfully) because the expression `DocTestMatches("\xa7").match("\\xa7")` returns None; in other words, an escaped string matches its unescaped form. This basically makes doctest matchers useless for checking non-ascii text, as they won't assert that you've got the escaping correct.

Blame: <http://hg.python.org/cpython/diff/da47e7e135ae/Lib/doctest.py>

This is fixable by overriding the offending `OutputChecker._toAscii` method with a no-op.
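
A minimal sketch of the problem and the override, using doctest directly rather than DocTestMatches. The class name `NonEscapingOutputChecker` is hypothetical, and the sketch assumes that `check_output` passes both inputs through the private `_toAscii` hook, which is a CPython implementation detail and could change.

```python
import doctest


class NonEscapingOutputChecker(doctest.OutputChecker):
    """Output checker that compares text as-is (hypothetical name)."""

    def _toAscii(self, s):
        # Assumption: check_output calls this private hook on both `want`
        # and `got`. Returning `s` unchanged disables the escaping.
        return s


want = "\xa7\n"   # the literal section sign
got = "\\xa7\n"   # four characters: backslash, 'x', 'a', '7'

# The stock checker folds both strings to escaped ascii before comparing.
default_result = doctest.OutputChecker().check_output(want, got, 0)
# With the hook overridden, the two different strings no longer compare equal.
patched_result = NonEscapingOutputChecker().check_output(want, got, 0)
```

On Python 3 the stock checker should report a match here while the subclass reports a mismatch, which is the behaviour test_matches_match expects.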

TestDocTestMatchesInterfaceUnicode.test_describe_difference fails because of the flip side: repr returns non-ascii strings by default. Adding another str_is_unicode conditional will fix this; the behaviour is somewhat reasonable.
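
The repr difference can be seen in isolation on Python 3. This is only an illustration of why an expectation built with `%r` changes between versions, not code from the branch.

```python
text = "\xa7"

# On Python 3, repr() keeps printable non-ascii characters as they are.
repr_text = repr(text)
# ascii() escapes them instead, giving a pure-ascii result.
ascii_text = ascii(text)
```

So any expected string that embeds the repr of the example has to be computed per Python version rather than hard-coded.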


In summary, there's a lot of needless pain in trying to reuse doctest.
-- 
https://code.launchpad.net/~gz/testtools/unicode_doctestmatches_764170/+merge/66674
Your team testtools developers is subscribed to branch lp:testtools.
=== modified file 'testtools/matchers.py'
--- testtools/matchers.py	2011-07-29 16:16:43 +0000
+++ testtools/matchers.py	2011-08-05 15:35:03 +0000
@@ -154,6 +154,44 @@
         return self.original.get_details()
 
 
+class _NonManglingOutputChecker(doctest.OutputChecker):
+    """Doctest checker that works with unicode rather than mangling strings
+
+    This is needed because current Python versions have tried to fix string
+    encoding related problems, but regressed the default behaviour with unicode
+    inputs in the process.
+
+    In Python 2.6 and 2.7 `OutputChecker.output_difference` was changed to
+    return a bytestring encoded as per `sys.stdout.encoding`, or utf-8 if that
+    can't be determined. Worse, that encoding process happens in the innocent
+    looking `_indent` global function. Because the `DocTestMismatch.describe`
+    result may well not be destined for printing to stdout, this is no good
+    for us. To get a unicode return as before, the method is monkey patched if
+    `doctest._encoding` exists.
+   
+    Python 3 has a different problem. For some reason both inputs are encoded
+    to ascii with 'backslashreplace', so that an escaped string matches its
+    unescaped form. Overriding the offending `OutputChecker._toAscii` method
+    is sufficient to revert this.
+    """
+    
+    def _toAscii(self, s):
+        """Return `s` unchanged rather than mangling it to ascii"""
+        return s
+    
+    # Only do this overriding hackery if doctest has a broken _indent function
+    if getattr(doctest, "_encoding", None) is not None:
+        from types import FunctionType as __F
+        __f = doctest.OutputChecker.output_difference.im_func
+        __g = dict(__f.func_globals)
+        def _indent(s, indent=4, _pattern=re.compile("^(?!$)", re.MULTILINE)):
+            """Prepend non-empty lines in `s` with `indent` number of spaces"""
+            return _pattern.sub(indent*" ", s)
+        __g["_indent"] = _indent
+        output_difference = __F(__f.func_code, __g, "output_difference")
+        del __F, __f, __g, _indent
+
+
 class DocTestMatches(object):
     """See if a string matches a doctest example."""
 
@@ -168,7 +206,7 @@
             example += '\n'
         self.want = example # required variable name by doctest.
         self.flags = flags
-        self._checker = doctest.OutputChecker()
+        self._checker = _NonManglingOutputChecker()
 
     def __str__(self):
         if self.flags:

=== modified file 'testtools/tests/test_matchers.py'
--- testtools/tests/test_matchers.py	2011-07-29 16:16:43 +0000
+++ testtools/tests/test_matchers.py	2011-08-05 15:35:03 +0000
@@ -113,6 +113,25 @@
         DocTestMatches("Ran 1 tests in ...s", doctest.ELLIPSIS))]
 
 
+<<<<<<< TREE
+=======
+class TestDocTestMatchesInterfaceUnicode(TestCase, TestMatchersInterface):
+
+    matches_matcher = DocTestMatches(_u("\xa7..."), doctest.ELLIPSIS)
+    matches_matches = [_u("\xa7"), _u("\xa7 more\n")]
+    matches_mismatches = ["\\xa7", _u("more \xa7"), _u("\n\xa7")]
+
+    str_examples = [("DocTestMatches(%r)" % (_u("\xa7\n"),),
+        DocTestMatches(_u("\xa7"))),
+        ]
+
+    describe_examples = [(
+        _u("Expected:\n    \xa7\nGot:\n    a\n"),
+        "a",
+        DocTestMatches(_u("\xa7"), doctest.ELLIPSIS))]
+
+
+>>>>>>> MERGE-SOURCE
 class TestDocTestMatchesSpecific(TestCase):
 
     def test___init__simple(self):

