← Back to team overview

larry-discuss team mailing list archive

Re: stream lining signature of assert_larry_equal

 

On Fri, Feb 5, 2010 at 12:56 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> On Fri, Feb 5, 2010 at 9:48 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>> On Fri, Feb 5, 2010 at 9:37 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Fri, Feb 5, 2010 at 9:19 AM,  <josef.pktd@xxxxxxxxx> wrote:
>>>> On Fri, Feb 5, 2010 at 12:01 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>> On Fri, Feb 5, 2010 at 8:52 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> On Fri, Feb 5, 2010 at 7:28 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>> On Fri, Feb 5, 2010 at 7:24 AM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>> The current signature
>>>>>>>>
>>>>>>>> assert_larry_equal(actual, desired, msg='', dtype=True, original=None,
>>>>>>>>                       noreference=True, nocopy=False)
>>>>>>>>
>>>>>>>> has 3 keywords, original, noreference, nocopy for the option to check
>>>>>>>> whether a larry is a view or a copy.
>>>>>>>>
>>>>>>>> This makes 8 binary options, only 3 are relevant cases, the other 5
>>>>>>>> combinations raise errors or are redundant. This was quite confusing
>>>>>>>> when I used the tests, since I often forgot to use, reset a keyword,
>>>>>>>> e.g. nocopy=True also requires setting noreference = False
>>>>>>>>
>>>>>>>> I think a more compact signature would be
>>>>>>>> assert_larry_equal(actual, desired, msg='', dtype=True, original=None,
>>>>>>>> noref=True)
>>>>>>>>
>>>>>>>> if not original is None:
>>>>>>>>    if noref:
>>>>>>>>       check noreference
>>>>>>>>    else:
>>>>>>>>       check nocopy
>>>>>>>>
>>>>>>>> otherwise don't do any reference/copy check
>>>>>>>>
>>>>>>>> the presence of original would indicate that we want the view check,
>>>>>>>> `noref` would tell which one.
>>>>>>>
>>>>>>> This is how larry gets improved---with cleaning like that.
>>>>>>>
>>>>>>> I'll make the change in the trunk.
>>>>>>
>>>>>> I changed the function signature as you suggested. I also changed
>>>>>> noref to iscopy. The change in the kw name will mess up your unit
>>>>>> tests :(
>>>>
>>>> I will change them
>>>>
>>>>>>
>>>>>> And I change the name of assert_noreference to assert_iscopy and
>>>>>> assert_nocopy to assert_isview.
>>>>>
>>>>> Here's a potential problem if the label contains objects:
>>>>>
>>>>>>> import datetime
>>>>>>> x = larry([1, 2] , [[9, datetime.date(2009,1,1)]])
>>>>>>> y = larry([1, 2] , [[7, x.label[0][1]]])
>>>>>>> x.label[0][1] is y.label[0][1]
>>>>>   True
>>>>>>> la.util.testing.assert_iscopy(x, y)
>>>>>>>
>>>>>
>>>>> assert_iscopy and assert_isview only compare the entire axis at once:
>>>>>
>>>>>    for i in xrange(larry1.ndim):
>>>>>        if larry1.label[i] is larry2.label[i]:
>>>>>            msg.append('The labels along axis %d share a reference.' % i)
>>>>>
>>>>> So I should change that to loop over each label element, right?
>>>>
>>>> Yes. Because this is for the tests, we want to catch all cases, even
>>>> if they don't show up in common usage.
>>>>
>>>> Do we have the same problem with nested labels, e.g. tuples after
>>>> flatten?  Maybe then a recursive helper function is useful.
>>>
>>> Now I know why I didn't check each label element:
>>>
>>>>>> 9 is 9
>>> True
>>>
>>> So I need to check if the object is immutable. Do you know how to do
>>> that? A tuple is immutable, but not the elements inside. Seems
>>> complicated.
>>
>> I'm only comparing, or trying to compare, label elements that are in
>> the same position in both larrys. I guess label element i along axis j
>> in larry1 should instead be compared to all label elements in larrys2
>> to search for references. That might actually be easier to code. We'd
>> need a function that collapses tuples and lists into a single list and
>> then do all cross comparisons. The unit test larrys are small so it
>> should not take long. And after comparing i with j we wouldn't need to
>> compare j to i.
>
> Ack! This is too complicated. And I don't see much gain from it. Plus
> the copylabel method in larry looks pretty solid:
>
>    def copylabel(self):
>        """
>        Return a copy of a larry's label.
>
>        Examples
>        --------
>        >>> y = larry([1, 2], [['a', 'b']])
>        >>> label = y.copylabel()
>        >>> label
>        [['a', 'b']]
>
>        """
>        return deepcopy(self.label)
>
> So I'm thinking of leaving assert_iscopy and assert_isview as is. So
> only check if an entire axis is a label copy.
>
> If we were to keep going with this we'd realize that the exact same
> issue applies to the data array in larry:
>
>>> x = np.array([1, datetime.date(2009,1,1)])
>>> y = np.array([2, x[1]])
>>> x[1] is y[1]
>   True
>>> x is y
>   False
>
> So a full comparison is needed there too.

No, for numpy arrays we have the array methods, and there are stronger
memory restrictions. When we compare larrys that are produced by
slicing then we would need to check the base attribute of the array,
we wouldn't check individual array elements.  Although, I'm not
completely sure this holds for object arrays.

Josef



Follow ups

References