larry-discuss team mailing list archive

Re: Label indexing

To: Keith Goodman <kwgoodman@xxxxxxxxx>
From: josef.pktd@xxxxxxxxx
Date: Mon, 8 Feb 2010 14:08:24 -0500
Cc: larry-discuss@xxxxxxxxxxxxxxxxxxx
In-reply-to: <f4f93d421002081049m5b98c36by6e8dde5a89ac742c@mail.gmail.com>

On Mon, Feb 8, 2010 at 1:49 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> On Mon, Feb 8, 2010 at 10:17 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>> On Mon, Feb 8, 2010 at 10:13 AM,  <josef.pktd@xxxxxxxxx> wrote:
>>> On Mon, Feb 8, 2010 at 12:39 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>> Is there a better design? Would it be better to always reorder on
>>>> binary operations even if the labels are aligned?
>>>
>>> I didn't have any good ideas when I briefly looked at the problem. I
>>> think my example was when I tried to do a diff() (when I didn't know
>>> much about larry yet), but I don't remember any details.
>>>
>>> For the time axis, ordering is also important for the moving functions
>>> and, e.g., fill_forward.
>>> For labels that are names, it would be nice to be able to work with
>>> arbitrary ordering.
>>>
>>> Does _align sort every axis, even if only one axis has disagreement?
>>
>> No. Only the axes that are not aligned:
>>
>> >> y1 = la.larry([[1,2],[3,4]], [[1,0],[0,1]])
>> >> y2 = la.larry([[1,2],[3,4]], [[1,0],[1,0]])
>> >> y1 + y2
>>
>> label_0
>>    1
>>    0
>> label_1
>>    0
>>    1
>> x
>> array([[3, 3],
>>       [7, 7]])
>
> I already ran into a problem with this design:
>
>        lar.lix['a'] # row 'a'
>        lar.lix['a':] # row 'a' and everything to the right (slicing)
>        lar.lix[:, 'a'] # column 'a'
>        lar.lix['a', 'b', 'c'] # single element from 3d larry
>        lar.lix[['a', 'b', 'c']] # rows 'a', 'b', and 'c'
>        lar.lix['a':'b'] # slice
>        lar.lix['a':'b':2] # slice with step
>
> Consider the first example above but instead of 'a', let's make the
> label ('a', 2). Inside getitem I won't know what that means. Because:
>
> >> class Index(object):
>   ....:     def __init__(self):
>   ....:         pass
>   ....:     def __getitem__(self, index):
>   ....:         print index
>   ....:
>   ....:
> >> idx = Index()
> >> idx[('a', 1)]
> ('a', 1)
> >> idx['a', 1]
> ('a', 1)
>
> So to allow indexing by labels that are tuples, I'll need to change
> the design to:
>
>        lar.lix[['a']] # row 'a'
>        lar.lix[['a']:] # row 'a' and everything to the right (slicing)
>        lar.lix[:, ['a']] # column 'a'
>        lar.lix[['a'], ['b'], ['c']] # single element from 3d larry
>        lar.lix[['a', 'b', 'c']] # rows 'a', 'b', and 'c'
>        lar.lix[['a']:['b']] # slice
>        lar.lix[['a']:['b']:2] # slice with step
>
> Not as pretty. I'm starting to wonder if I should push this new
> feature to a later release.
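
The ambiguity described above can be reproduced in a few lines. This is a minimal sketch for Python 3 (the quoted session uses the Python 2 print statement). KeyEcho is a hypothetical stand-in, not part of larry; it only returns the key that Python passes to __getitem__, so the effect of list-wrapping can be inspected directly.

```python
class KeyEcho:
    """Hypothetical helper: returns the key exactly as Python delivers it."""

    def __getitem__(self, key):
        return key


idx = KeyEcho()

# A tuple label and two comma-separated indices arrive as the same object.
tuple_label = idx[('a', 1)]
two_indices = idx['a', 1]

# Wrapping the label in a list keeps it distinct from a multi-axis index.
wrapped_label = idx[[('a', 1)]]      # one axis, label ('a', 1)
wrapped_axes = idx[['a'], [1]]       # two axes, labels 'a' and 1

# Slice bounds may be arbitrary objects, so list-wrapped bounds are accepted.
wrapped_slice = idx[['a']:['b']:2]

print(tuple_label == two_indices)    # the two spellings cannot be told apart
print(wrapped_label, wrapped_axes, wrapped_slice)
```

The list-wrapped forms work because Python only builds a tuple from the commas at the top level of the subscript; anything inside a list is passed through untouched.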

I'm still thinking a lot in terms of other matrix languages (MATLAB, GAUSS):

>>> idx[('a', 1),:,:]
(('a', 1), slice(None, None, None), slice(None, None, None))
>>> idx[('a', 1),...]
(('a', 1), Ellipsis)

It would be possible to disallow the numpy shortcut for dropping the
remaining slices.
Does using lists remove all ambiguities between labels that are
lists and a list of labels?
>>> idx[['a', 1]]
['a', 1]
>>> idx[['a', 1],:]
(['a', 1], slice(None, None, None))
>>> idx[['a', 'b', 'c'],:]
(['a', 'b', 'c'], slice(None, None, None))

I would go for an unambiguous but restricted implementation for the
most common use case, and not try to match everything numpy can do.
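
As a rough sketch of what such a restricted rule could look like (parse_lix_key and Lix are hypothetical names, not larry's API): every axis must be given explicitly, and each entry must be either a list of labels or a slice. Under that assumption a bare tuple is never a label, so the ambiguity goes away at the cost of some typing.

```python
def parse_lix_key(key, ndim):
    """Hypothetical parser: one entry per axis, each a list of labels or a slice."""
    if not isinstance(key, tuple):
        key = (key,)
    # No numpy-style shortcut: trailing axes may not be dropped.
    if len(key) != ndim:
        raise IndexError(f"expected {ndim} index entries, got {len(key)}")
    parsed = []
    for axis, entry in enumerate(key):
        if isinstance(entry, list):
            parsed.append(("labels", entry))
        elif isinstance(entry, slice):
            parsed.append(("slice", entry))
        else:
            raise TypeError(f"axis {axis}: wrap labels in a list, got {entry!r}")
    return parsed


class Lix:
    """Hypothetical indexer that applies the restricted rule for a given ndim."""

    def __init__(self, ndim):
        self.ndim = ndim

    def __getitem__(self, key):
        return parse_lix_key(key, self.ndim)


lix = Lix(ndim=2)
print(lix[[('a', 1)], :])         # tuple label on axis 0, full slice on axis 1
print(lix[['a', 'b', 'c'], :])    # three labels on axis 0
```

With this rule a list is always read as a list of labels, so a label that is itself a list would have to be wrapped once more, e.g. [['a', 1]].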

Josef
