
testtools-dev team mailing list archive

Re: Producing Diffs in Assertion Output

 

On Mon, Dec 2, 2013 at 12:36 PM, Daniel Watkins <daniel@xxxxxxxxxxxxxxxxxxxx> wrote:

> Hello all,
>
> As you are probably aware, the assertions in unittest.TestCase will
> output a diff for certain types of comparison.  As I understand it,
> testtools currently doesn't do this under any circumstances.
>
> This is a Bad Thing (TM) for two reasons: firstly, it makes it harder to
> identify minor differences between expected and actual output; this is a
> usability issue.  Secondly, as the Python stdlib already supports it, it
> is probably something that testtools will need to support to be a
> serious candidate for inclusion in the stdlib.
>
> The actual output of the diff is, I think, relatively straightforward;
> the Mismatches concerned would need some changes in their describe
> method (and some new, more specific Mismatches might be needed).  I
> assume that pull requests to implement this would be welcome (please
> correct me if not!).
>
> The difficult part would (I think) come when trying to support the
> existing maxDiff property that unittest.TestCase supports[0].  This
> defines the largest diff that should be output (with a "this diff is too
> long" message being output if it is exceeded), and part of the public
> interface of the TestCase class.  Personally, I don't think that the
> TestCase should be responsible for the presentation of assertion output
> (and therefore I dislike maxDiff being an instance attribute).  It seems
> that there's a level of agreement from the testtools developers, as
> assertThat takes a verbose argument (rather than relying on something on
> the TestCase instance)[1].
>
>
To summarize to this point:
 1. let's add diff support
 2. Mismatch.describe is the place to do it
 3. Supporting maxDiff is hard
 4. TestCase should not be responsible for presentation

Is that right?

If so, I'm cool with 1, meh about 2, unopinionated about 3, and strongly
agreeing with 4.



> I would be loath, however, to just add a max diff keyword argument to
> assertThat; that way madness lies[2].  I therefore propose changing the
> public API of assertThat to take a format_options argument which would
> describe all of the formatting details that the user desires.  I would
> suggest that this be a new class (called, perhaps, AssertionFormat)
> which can handle defaults sensibly (making changes/additions to the
> formatting interface less painful).
>
>
So this would be something controlled at the point of assertion. I guess
that's the point where there's the most knowledge about the assertion, but
it's not great for wanting a universal formatting behaviour. (Although you
address that below, suggesting a universal base class for such cases).

I agree that a more general approach to formatting is preferred, rather
than specific APIs for diff, and that if we had such an API we would do
well to use it for verbosity.
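
To make that concrete, here is a rough sketch of what such an options object might look like. Only the name AssertionFormat comes from the proposal; the attributes, the render method and the truncation message are made up for illustration and are not an existing testtools API. The default of 640 mirrors unittest's default maxDiff.

```python
class AssertionFormat:
    """Hypothetical bundle of formatting options (not part of testtools)."""

    def __init__(self, verbose=False, max_diff=640):
        self.verbose = verbose
        # None means "never truncate", as with unittest's maxDiff.
        self.max_diff = max_diff

    def render(self, description):
        """Return the description, or a short notice if it is too long."""
        if self.max_diff is None or len(description) <= self.max_diff:
            return description
        return "Output is %d characters, too long to show (max_diff=%d)." % (
            len(description), self.max_diff)


fmt = AssertionFormat(max_diff=10)
print(fmt.render("short"))
print(fmt.render("a much longer description"))
```

Because the defaults live in one class, adding a new option later would not change the signature of assertThat, which is the attraction of the approach.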


> Obviously there are some rough edges to work out here; this is just to
> give you an idea of the rough drift of my proposal.
>
>
Thanks for taking the time to do so.  Here are some random thoughts:

   - One way of framing this problem is, how to render a mismatch?
   - Although I doubt it's ever going to happen, I carry a torch in my
   heart for a decent GUI runner (I guess a web runner is much the same)
   - We probably can't just add all possible formatting options as details
   on the mismatch, because some of them (esp. diff) can be quite expensive to
   compute, and won't always be needed
   - Controlling the output of tests is something that you probably want to
   be able to do from the runner (e.g. with command line options)
   - Different mismatches have different relevant diffing algorithms
   (although the default unified diff on a pprint is a pretty good start)
   - Your proposal doesn't suggest an interface for AssertionFormat. It
   probably should.
   - Your proposal doesn't discuss recursive matchers. I don't know if this
   is an actual problem.
   - Pretty printing is actually a surprisingly deep but fortunately well
   understood problem; maybe that's what we actually want here
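
As an illustration of the "unified diff on a pprint" default and of deferring the expensive part, here is a minimal sketch. DiffMismatch is a hypothetical class written for this mail; it only mimics the describe() method of a testtools Mismatch and does not subclass it, so that the example stands alone.

```python
import difflib
import pprint


class DiffMismatch:
    """Hypothetical mismatch that builds its diff only when described."""

    def __init__(self, expected, actual):
        # Store the raw values; no diffing happens here.
        self.expected = expected
        self.actual = actual

    def describe(self):
        # The diff is computed on demand, so a runner that never renders
        # this mismatch never pays for it.
        expected_lines = pprint.pformat(self.expected).splitlines()
        actual_lines = pprint.pformat(self.actual).splitlines()
        diff = difflib.unified_diff(
            expected_lines, actual_lines,
            fromfile="expected", tofile="actual", lineterm="")
        return "\n".join(diff)


mismatch = DiffMismatch({"a": 1, "b": 2}, {"a": 1, "b": 3})
print(mismatch.describe())
```

A different mismatch type could swap in a different diffing algorithm inside describe() without the caller needing to know.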

This also prompts me to deeper, half-formed thoughts about matchers, which
I'll dump here:

   - currently, they are just a unary predicate and a way of describing
   what "false" means
   - alternatively expressed: type Matcher a = a -> Maybe String
   - any computable unary predicate defines a set
   - almost all of the time, we actually want a binary predicate. The word
   'match' implies this, one matches a thing with something else
   - i.e. let ~ be a relation, then we want to know if a ~ b. (e.g. replace
   ~ with ==, is, subset). if not, we want something describing the
   differences in terms of ~.
      - normally a & b are both members of S, and the relation ~ is defined
      as a subset of S x S.
   - two very common variants of this:
      - f(a) ~ g(b)
         - where either f or g is the identity function
         - or f == g
         - when viewing mismatches, we mostly care about the unmapped
         versions (i.e. a & b) rather than the mapped (i.e. f(a), g(b)), which
         explains some of the profusion of matchers
      - ~ is defined in terms of other relations which are applied to
      substructures within a & b.
         - I don't have mathematical tools to hand to talk about this
         clearly
   - interesting groups of relations
      - equivalence relations: ==, is
      - orderings: <, >, subset (only a relation if there is some
      well-defined universal set), etc.
      - membership: regex, in
         - this isn't at all a relation mathematically speaking, although I
         guess technically you can wrap the LHS in a set and turn it into a
         subset operation
      - anything else?
   - "subrelations" look most like membership, since matchers are just
   unary predicates, and any computable unary predicate defines a set
   - prompted by reflections that
      - matcher syntax isn't great in Python
      - it's mostly partial application
      - composing matchers isn't easy
      - we have a proliferation of matchers, perhaps due to Python's rich
      set of data structures, perhaps because we haven't found/applied a
      coherent model
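
To pin down the "unary predicate by partial application of a binary relation" idea, here is a small sketch in plain Python. The names Matcher, relates and identity are invented for this mail and are not testtools APIs; the point is only the shape: relation(f(a), g(b)) is checked, while the mismatch is described in terms of the unmapped a and b.

```python
import operator
from typing import Any, Callable, Optional

# A matcher takes a value and returns None on a match, or a description
# of the mismatch otherwise (the "a -> Maybe String" shape).
Matcher = Callable[[Any], Optional[str]]


def identity(x):
    return x


def relates(relation, symbol, expected, f=identity, g=identity) -> Matcher:
    """Partially apply a binary relation to get a unary matcher.

    Checks relation(f(actual), g(expected)), but reports the unmapped
    values when the relation does not hold.
    """
    def match(actual):
        if relation(f(actual), g(expected)):
            return None
        return "%r %s %r does not hold" % (actual, symbol, expected)
    return match


less_than_three = relates(operator.lt, "<", 3)
same_ignoring_case = relates(operator.eq, "==", "Hello",
                             f=str.lower, g=str.lower)

print(less_than_three(5))
print(same_ignoring_case("HELLO"))
```

Under this framing, many of the existing matchers would collapse into a choice of relation plus a pair of mapping functions, though whether that covers the structural (recursive) cases is exactly the part I can't express clearly yet.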


That was way more time than I was planning to spend.

jml
