larry-discuss team mailing list archive
-
larry-discuss team
-
Mailing list archive
-
Message #00030
Re: [Blueprint simplify-unit-testing] Create a larry specific assert function to simplify unit testing
On Thu, Feb 4, 2010 at 1:59 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> On Thu, Feb 4, 2010 at 10:54 AM, <josef.pktd@xxxxxxxxx> wrote:
>> On Thu, Feb 4, 2010 at 1:38 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Thu, Feb 4, 2010 at 10:35 AM, <josef.pktd@xxxxxxxxx> wrote:
>>>> On Thu, Feb 4, 2010 at 1:25 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>> On Thu, Feb 4, 2010 at 10:08 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> On Thu, Feb 4, 2010 at 9:11 AM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>> On Thu, Feb 4, 2010 at 11:17 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>> On Thu, Feb 4, 2010 at 8:04 AM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>> On Thu, Feb 4, 2010 at 10:33 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>> On Wed, Feb 3, 2010 at 7:04 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>>> On Wed, Feb 3, 2010 at 9:54 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>>>> On Wed, Feb 3, 2010 at 9:16 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>>>>> On Wed, Feb 3, 2010 at 6:06 PM, joep <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>>>>>> Blueprint changed by joep:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Whiteboard set to:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> assert_larry
>>>>>>>>>>>>>> http://bazaar.launchpad.net/~kwgoodman/larry/trunk/annotate/head%3A/la/tests/deflarry_nose_test.py#L49
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and the method check_function in test class
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://bazaar.launchpad.net/~kwgoodman/larry/trunk/annotate/head%3A/la/tests/deflarry_nose_test.py#L128
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> already contain the extracted boiler plate assert function to compare a
>>>>>>>>>>>>>> larry with desired data matrix x and labels
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the later has a view keyword to choose whether to verify nocopy or
>>>>>>>>>>>>>> noreference
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> used in nosetests for larry methods
>>>>>>>>>>>>>
>>>>>>>>>>>>> I moved this to the larry-discuss list since it is hard to discuss it
>>>>>>>>>>>>> through the whiteboard on the blueprint.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, let's start with those functions and then prettify them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What should the signature be?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current signature is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> assert_larry(opname, la, t, lab, msgstr)
>>>>>>>>>>>>>
>>>>>>>>>>>>> How about changing that to the same signature as np.testing.assert_equal? So:
>>>>>>>>>>>>>
>>>>>>>>>>>>> assert_equal(actual, desired, err_msg='', verbose=True)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then we don't have to separate the data and the label. And instead of
>>>>>>>>>>>>> nancode we can use the numpy nan aware assert in numpy 1.4.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Oops...dinner time!
>>>>>>>>>>>>
>>>>>>>>>>>> I'm just browsing the code and adding some notes.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, matching the numpy signature for assert_equal and
>>>>>>>>>>>> assert_almost_equal is a good idea.
>>>>>>>>>>>> If you require numpy 1.4 for the test suite, then most of the boiler
>>>>>>>>>>>> plate is gone (nan handling)
>>>>>>>>>>>>
>>>>>>>>>>>> I think that assert_larry_equal is equivalent to
>>>>>>>>>>>> assert_equal(la1.x, la2.x)
>>>>>>>>>>>> assert_equal(la1.label, la2.label)
>>>>>>>>>>>>
>>>>>>>>>>>> slicing_test.py/test_slicing and test_morph use directly an assert_
>>>>>>>>>>>> which could be replaced by np.testing.assert_equal
>>>>>>>>>>>
>>>>>>>>>>> just another comment:
>>>>>>>>>>>
>>>>>>>>>>> there are 3 patterns in the test suite corresponding to the previous comments
>>>>>>>>>>>
>>>>>>>>>>> * python unittest with boilerplate
>>>>>>>>>>> * numpy 1.3 nosetests with explicit nan handling
>>>>>>>>>>> * numpy 1.4 nosetests where numpy.testing assert do the nan handling
>>>>>>>>>>>
>>>>>>>>>>> In the 4th option an explicit function assert_larry_xxx is not really
>>>>>>>>>>> necessary, and the test for x and labels and nocopy/noreference could
>>>>>>>>>>> also be "yielded" directly from the test function/method.
>>>>>>>>>>
>>>>>>>>>> Instead of making many unit tests out of one call to larry's
>>>>>>>>>> assert_equal, which would occur if we used yield, I think it is better
>>>>>>>>>> for the whole thing be one unit test. That would mean that we'd have
>>>>>>>>>> to wrap calls to np.testing.assert_equal in try...except blocks,
>>>>>>>>>> collect any error messages, and raise an AssertionError at the end of
>>>>>>>>>> the function if needed.
>>>>>>>>>
>>>>>>>>> several asserts don't have to be yielded, if you want them to be only
>>>>>>>>> one unittest that fails at the first assertion error, e.g.
>>>>>>>>> def test_movingsum32(self) in deflarry_nose_test.py
>>>>>>>>
>>>>>>>> I think it is better for debugging if all the info is printed out when
>>>>>>>> the test fails. For example, if a test fails on the label, Id like to
>>>>>>>> know if it passed on the array.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> As for signature, how about
>>>>>>>>>>
>>>>>>>>>> assert_equal(actual, desired, msg='', dtype=True, noreference=True,
>>>>>>>>>> nocopy=False, verbose=True)
>>>>>>>>>
>>>>>>>>> this doesn't work, since noreference and nocopy also need the original larry,
>>>>>>>>> the signature of check function is
>>>>>>>>
>>>>>>>> Good point.
>>>>>>>>
>>>>>>>> An alternative to passing in the original and the actual is to pass in
>>>>>>>> the original and the function that modifies the original to produce
>>>>>>>> the actual. Then two larrys are always passed in. But that sounds
>>>>>>>> messy.
>>>>>>>>
>>>>>>>>> check_function(self, t, label, p, orig, view=False)
>>>>>>>>>
>>>>>>>>> the signature could be
>>>>>>>>> assert_larry_equal(actual, desired, msg='', dtype=True, original=None,
>>>>>>>>> noreference=True,
>>>>>>>>> nocopy=False, verbose=True)
>>>>>>>>
>>>>>>>> Should we go with the signature above?
>>>>>>>>
>>>>>>>>> but for noreference=True check an original has to be included
>>>>>>>>>
>>>>>>>>> with
>>>>>>>>> assert_larry_equal(actual, desired, original, msg='', dtype=True,
>>>>>>>>> noreference=True,
>>>>>>>>> nocopy=False, verbose=True)
>>>>>>>>>
>>>>>>>>> the original would have to be passed in even if both noreference and
>>>>>>>>> nocopy are False
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> So by default the dtype would be compared. Sometimes you expect the
>>>>>>>>>> dtype to change so maybe an option would be to pass in the dtype for
>>>>>>>>>> the "desired" larry.
>>>>>>>>>>
>>>>>>>>>> I think that would cover the most common use cases.
>>>>>>>>>
>>>>>>>>> Yes, I think so for the comparison of two larrys
>>>>>>>>>
>>>>>>>>> slicing tests e.g. test_slicing, would need a new function for
>>>>>>>>> noreference, nocopy that verifies e.g. that a slice is really a view.
>>>>>>>>> (I don't know what the numpy tests for view versus copy are for fancy
>>>>>>>>> slicing/indexing)
>>>>>>>>>
>>>>>>>>> Josef
>>>>>>>
>>>>>>> I attached a draft of the assert_larry_equal function. it imports some
>>>>>>> helper functions from test.py in the la/tests folder, which
>>>>>>> could/should also be rewritten into assert form.
>>>>>>> It's a draft, I haven't checked if it is working correctly yet for all cases.
>>>>>>>
>>>>>>> Also, I think it would be better to add the testing helper functions
>>>>>>> to la.utils so that they can be imported and don't need to have a copy
>>>>>>> in the test folder, as in the case of test.py.
>>>>>>
>>>>>> I made a few tweaks. To get all tests to run even if the first one
>>>>>> (check labels) fails, I wrapped the asserts in try...except (just the
>>>>>> first two for now).
>>>>>>
>>>>>> Also added a check that original is not None when doing a noreference
>>>>>> check ro nocopy check. And changed an assert to assert_equal.
>>>>>>
>>>>>> def assert_larry_equal(actual, desired, msg='', dtype=True, original=None,
>>>>>> noreference=True, nocopy=False, verbose=True):
>>>>>> #assert equality of attributes of two larries
>>>>>>
>>>>>> fail = []
>>>>>>
>>>>>> try:
>>>>>> assert_equal(actual.x, desired.x, 'x')
>>>>>> except AssertionError, err:
>>>>>> fail.append(str(err))
>>>>>> try:
>>>>>> assert_equal(actual.label, desired.label, 'label')
>>>>>> except AssertionError, errmsg:
>>>>>> fail.append(str(err))
>>>>>>
>>>>>> if dtype:
>>>>>> msg = printfail(actual.x.dtype, desired.x.dtype, 'x.dtype')
>>>>>> assert_equal(actual.x.dtype, desired.x.dtype, msg)
>>>>>>
>>>>>> if noreference:
>>>>>> if original is None:
>>>>>> raise ValueError, 'original must be a larry to run
>>>>>> noreference check.'
>>>>>> assert_(assert_noreference(actual, original), 'Reference found')
>>>>>> elif nocopy:
>>>>>> if original is None:
>>>>>> raise ValueError, 'original must be a larry to run nocopy
>>>>>> check.'
>>>>>> assert_(assert_nocopy(actual, original), 'copy instead of
>>>>>> reference found')
>>>>>> else: #FIXME check view for different dimensional larries
>>>>>> pass
>>>>>>
>>>>>> if len(fail) > 0:
>>>>>> msg = ''.join(fail)
>>>>>> raise AssertionError, msg
>>>>>
>>>>> I get this:
>>>>>
>>>>>>> x = larry([1,2,3])
>>>>>>> y = larry([2,2,3], [['a', 'b', 'c']])
>>>>>>>
>>>>>>> assert_larry_equal(x, y, 'cumsum test', noreference=False)
>>>>> ---------------------------------------------------------------------------
>>>>> AssertionError:
>>>>> Items are not equal:
>>>>> item=0
>>>>> item=0
>>>>> label
>>>>> ACTUAL: 0
>>>>> DESIRED: 'a'
>>>>> Arrays are not equal
>>>>> x
>>>>> (mismatch 33.3333333333%)
>>>>> x: array([1, 2, 3])
>>>>> y: array([2, 2, 3])
>>>>>
>>>>> with the code below. Hmm, the AssertionError message needs to be cleaned up.
>>>>>
>>>>> def assert_larry_equal(actual, desired, msg='', dtype=True, original=None,
>>>>> noreference=True, nocopy=False, verbose=True):
>>>>> #assert equality of attributes of two larries
>>>>>
>>>>> fail = []
>>>>>
>>>>> # label
>>>>> try:
>>>>> assert_equal(actual.label, desired.label, 'label')
>>>>> except AssertionError, err:
>>>>> fail.append(str(err))
>>>>>
>>>>> # Data array
>>>>> try:
>>>>> assert_equal(actual.x, desired.x, 'x')
>>>>> except AssertionError, err:
>>>>> fail.append(str(err))
>>>>>
>>>>> # dtype
>>>>> if dtype:
>>>>> try:
>>>>> assert_equal(actual.x.dtype, desired.x.dtype, 'dtype')
>>>>> except AssertionError, err:
>>>>> fail.append(str(err))
>>>>>
>>>>> # Check for references or copies
>>>>> if noreference:
>>>>> if original is None:
>>>>> raise ValueError, 'original must be a larry to run
>>>>> noreference check.'
>>>>> try:
>>>>> assert_(assert_noreference(actual, original), 'Reference found')
>>>>> except AssertionError, err:
>>>>> fail.append(str(err))
>>>>> elif nocopy:
>>>>> if original is None:
>>>>> raise ValueError, 'original must be a larry to run nocopy check.'
>>>>> try:
>>>>> assert_(assert_nocopy(actual, original), 'copy instead of
>>>>> reference found')
>>>>> except AssertionError, err:
>>>>> fail.append(str(err))
>>>>> else: #FIXME check view for different dimensional larries
>>>>> pass
>>>>>
>>>>> # Did the test pass?
>>>>> if len(fail) > 0:
>>>>> # No
>>>>> msg = ''.join(fail)
>>>>> raise AssertionError, msg
>>>>>
>>>>
>>>>
>>>> Another possibility to look at, might be to use a yield inside the
>>>> assert_larry instead of the try .. except, which might result in
>>>> pretty much the same outcome, except for raising the test count. But
>>>> if it's just in one function in doesn't increase boiler plate so it's
>>>> fine.
>>>>
>>>> mabye '\n'.join(fail) would make it easier to read.
>>>>
>>>> Can you send it as an attachment or add it to the blueprint? gmail
>>>> destroys the correct intend when I copy from the mail.
>>>
>>> https://blueprints.launchpad.net/larry/+spec/simplify-unit-testing
>>>
>>
>> looks good, we can make minor adjustments after we see actual test output
>> (One thought I tried is to remove the extra newline after
>> ------------------, but it doesn't make much difference.)
>>
>> the second function would be almost a copy of this with only one
>> assert_equal replace by assert_almost_equal for x
>>
>> def assert_larry_almost_equal(actual, desired, decimal=7, msg='',
>> dtype=True, original=None,
>> noreference=True, nocopy=False, verbose=True):
>
> Hmm, could we do almost equal whenever one of the dtypes is float;
> equal otherwise?
Yes,
In general I never use assert_equal with floats. In this case, I would
add the decimal keyword to assert_equal.
Also, to be less confusing, I would set decimal=14 (or similar) so the
function corresponds better to its name
Josef
Follow ups
References