testtools-dev team mailing list archive

Thread
Date
Re: [Merge] lp:~gz/testtools/unprintable-assertThat-80412 into lp:testtools

To: "Martin \[gz\]" <gzlist@xxxxxxxxxxxxxx>
From: "Martin \[gz\]" <gzlist@xxxxxxxxxxxxxx>
Date: Thu, 25 Aug 2011 23:34:22 -0000
In-reply-to: <20110824015724.19747.15764.launchpad@ackee.canonical.com>
Reply-to: mp+72641@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx
Thanks, this is is a useful review that covers all the stuff I needed to mention but didn't get in the proposal.

> To summarize, although the bug was originally filed against assertThat, it
> turns out that it's actually an issue in the matchers themselves as well as
> assertThat. Right?

Yes. The initial thoughts were about adding more escaping to assertThat, but by then most of the formatting has been done. By moving the onus onto Matchers to escape more carefully, it's possible to preserve more information.

> If so, we have to be thinking about:
>  * what are our failure modes with this code and existing broken matchers
>  * how to make it easy to define matchers and mismatches correctly

These are exactly the two primary concerns here. We don't want to punish test case authors with spurious errors if they specify an assert incorrectly, but give a clear enough failure for them to fix the problem. We also don't want learning about Python string quirks to be a prerequisite for writing a Matcher.

>  * Can you elaborate on your comment on istext()?

The result of the function is generally not enough information to actually use your string safely.

Two of the current callers in texttools are just trying to do special handling on a string, and otherwise assume the object is already a specific correct type. In both cases, the call is (harmlessly) subtly wrong, __import__ doesn't take non-ascii strings on Python 2, and Python 3 bytes can be used for regular expressions.

Code wanting to do actual string handling has it worse, as it will always risk getting a string that can't be safely interpolated, or written stream, or otherwise given as output.

In short, it's something of an attractive nuisance. In using it, I realised the text_repr interface wasn't very good.

>  * Is there a way we can make it so matchers never have to use _isbytes?
>    Perhaps by adding an "is_multiline" function to compat.

Something like this would be much better, possibly moving the multiline logic into the repr function (so True means triple quote if there's a newline, or by adding a third state). This was me lowering my standards and keeping coding, in order to get something that would work. :)

>  * text_repr is a bit hard to follow, maybe add some internal comments?

Yup, and some general cleanup would help, I left the first version that passed the tests alone so we could see if it actually worked as an interface before buffing it.

>  * Some more documentation on the intent & API of text_repr would help,
>    saying that it attempts to mimic repr() except returning unicode,
>    giving some examples, explaining why you might want to use it and also what
>    multiline does.

Will write this. As is probably clear from the tests, it aims to use the Python 3 logic for displaying literal unicode characters, and to maintain the `eval(repr(s)) == s` expectation.

>  * I don't get why multiline needs three different ways of escaping lines
>    whereas the non-multiline case only has two branches

So, there are a cube of relevant combinations:

               Multiline?
                 F   T
               +---+---+ Python 3
             / |   | R | bytes
           /   +---+---+
 Python 2 +---+---+| R | str
      str |   | S |+---+
          +---+---+   / 
  unicode | e | E | /   
          +---+---+

With non-multiline output Python 3 repr does what we want, as does str repr on Python 2. Everything else need some extra handling.

 [e] Uses _slow_escape on the whole string then quotes appropriately
 [R] Uses repr on each line (with small complications due to differing bytes and str interfaces)
 [S] Uses string-escape on each line
 [E] Uses _slow_escape on each line

Ideally the [R] and [S] cases would be shared as per non-multiline equivalents. However, there is no codec in Python 3 that has the repr handing we want, so we have to call the function then post-process the string. It would be possible to use repr here on Python 2 as well, but entails extra work over the string-escape codec and the complications with split("\n") aren't relevent.

Some rearrangements should make this less hairy.

>  * I think having a repr()-like function that takes any object and
>    special-cases text would be good

I'm thinking about renaming `text_repr` to `rich_repr` or similar, and adding code that calls repr on non-stringy objects, then escapes unprintable characters per the existing code but without the quoting. That would then be generally useful for avoiding control sequences leaking out.

>  * Just to be clear, we are expecting Mismatch.describe() to return unicode,
>    right?

On Python 2, ascii-only str is also currently acceptable. Being fussier about the type would make it harder to write code that would appear to work correctly until it was used on a system with different locale and other similar problems. Another option is guarding describe in MismatchError more carefully to check for non-ascii bytestrings, which has the downside of doing an extra decode()/encode() all the time.

This is where having a robust type system would make life easier. Being able to define a string with a limited range of valid bytes would save the need to redundant checking at every level of the interface.

>  * What are we expecting Matcher.__str__ to return? Text or bytes?

The native str type, but must be ascii-only on Python 2 so upcasting to unicode is safe. This isn't a tough requirement, if the direction of movement is towards fixing bug 686807, as __repr__ tends to be written in terms that satisfy this.

> On the original points:
> 
>  * We should add something to the docs explaining best practice for
>    Mismatch.describe() methods. i.e., do "foo %s" % (text_repr(x),)

I think "use repr, see our fancy repr alternative if dealing with large or non-ascii output" might be the right generally idea. Which makes me wonder, should pformat move into the fancy repr call, making _BinaryMismatch._format redundant?

>  * These docs need to bubble up, to AnnotatedMismatch, to assertThat etc.
>    e.g. The 'message' parameter of assertThat needs to have its type
>    documented.

Yes, and a few extra checks around things like message would help robustness.
-- 
https://code.launchpad.net/~gz/testtools/unprintable-assertThat-80412/+merge/72641
Your team testtools developers is subscribed to branch lp:testtools.
References

[Merge] lp:~gz/testtools/unprintable-assertThat-80412 into lp:testtools
From: Martin [gz], 2011-08-24