larry-discuss team mailing list archive

Re: Label indexing

To: Keith Goodman <kwgoodman@xxxxxxxxx>
From: josef.pktd@xxxxxxxxx
Date: Mon, 8 Feb 2010 14:40:26 -0500
Cc: larry-discuss@xxxxxxxxxxxxxxxxxxx
In-reply-to: <f4f93d421002081124m66f64c71g8cbd9e83b0090d51@mail.gmail.com>

On Mon, Feb 8, 2010 at 2:24 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> On Mon, Feb 8, 2010 at 11:08 AM,  <josef.pktd@xxxxxxxxx> wrote:
>> On Mon, Feb 8, 2010 at 1:49 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Mon, Feb 8, 2010 at 10:17 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>> On Mon, Feb 8, 2010 at 10:13 AM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>> On Mon, Feb 8, 2010 at 12:39 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> Is there a better design? Would it be better to always reorder on
>>>>>> binary operations even if the labels are aligned?
>>>>>
>>>>> I didn't have any good ideas when I briefly looked at the problem. I
>>>>> think my example was when I tried to do a diff() (when I didn't know
>>>>> much about larry yet), but I don't remember any details.
>>>>>
>>>>> For the time axis, ordering is also important for the moving functions
>>>>> and, e.g., fill_forward.
>>>>> For labels that are names, it would be nice to be able to work with
>>>>> arbitrary ordering.
>>>>>
>>>>> Does _align sort every axis, even if only one axis has a disagreement?
>>>>
>>>> No. Only the axes that are not aligned:
>>>>
>>>>>> y1 = la.larry([[1,2],[3,4]], [[1,0],[0,1]])
>>>>>> y2 = la.larry([[1,2],[3,4]], [[1,0],[1,0]])
>>>>>> y1 + y2
>>>>
>>>> label_0
>>>>    1
>>>>    0
>>>> label_1
>>>>    0
>>>>    1
>>>> x
>>>> array([[3, 3],
>>>>       [7, 7]])
>>>
>>> I already ran into a problem with this design:
>>>
>>>        lar.lix['a'] # row 'a'
>>>        lar.lix['a':] # row 'a' and everything to the right (slicing)
>>>        lar.lix[:, 'a'] # column 'a'
>>>        lar.lix['a', 'b', 'c'] # single element from 3d larry
>>>        lar.lix[['a', 'b', 'c']] # rows 'a', 'b', and 'c'
>>>        lar.lix['a':'b'] # slice
>>>        lar.lix['a':'b':2] # slice with step
>>>
>>> Consider the first example above but instead of 'a', let's make the
>>> label ('a', 2). Inside getitem I won't know what that means. Because:
>>>
>>>>> class Index(object):
>>>   ....:     def __init__(self):
>>>   ....:         pass
>>>   ....:     def __getitem__(self, index):
>>>   ....:         print(index)
>>>   ....:
>>>   ....:
>>>>> idx = Index()
>>>>> idx[('a', 1)]
>>> ('a', 1)
>>>>> idx['a', 1]
>>> ('a', 1)
>>>
>>> So to allow indexing by labels that are tuples, I'll need to change
>>> the design to:
>>>
>>>        lar.lix[['a']] # row 'a'
>>>        lar.lix[['a']:] # row 'a' and everything to the right (slicing)
>>>        lar.lix[:, ['a']] # column 'a'
>>>        lar.lix[['a'], ['b'], ['c']] # single element from 3d larry
>>>        lar.lix[['a', 'b', 'c']] # rows 'a', 'b', and 'c'
>>>        lar.lix[['a']:['b']] # slice
>>>        lar.lix[['a']:['b']:2] # slice with step
>>>
>>> Not as pretty. I'm starting to wonder if I should push this new
>>> feature to a later release.
>>
>> I'm still thinking a lot in terms of other matrix languages (MATLAB, GAUSS).
>>
>>>>> idx[('a', 1),:,:]
>> (('a', 1), slice(None, None, None), slice(None, None, None))
>>>>> idx[('a', 1),...]
>> (('a', 1), Ellipsis)
>>
>> It would be possible to disallow the numpy shortcut for dropping the
>> remaining slices.
>
> Interesting idea.
>
>> Does using lists remove all ambiguities between labels that are
>> lists and a list of labels?
>
> Labels cannot be lists since lists are used to mark which axis each
> label belongs to. If we want lists to be labels then init would have
> to become

Good, that keeps the meaning of lists unambiguous. I still don't know
what is and isn't restricted in some of the details of larry.
pandas uses labels as dictionary keys, so that should also rule out
lists, since they are not hashable.
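
A quick illustration of the hashability point, as a sketch only: this
uses a plain dict rather than larry or pandas, and the name
label_to_position is made up for the example. A tuple label works as a
dictionary key, while a list label raises TypeError.

```python
# Plain-dict sketch (not larry or pandas code): map labels to positions.
label_to_position = {}

# A tuple is hashable, so it can serve as a label/key.
label_to_position[('a', 1)] = 0

# A list is not hashable, so using it as a key raises TypeError.
try:
    label_to_position[['a', 1]] = 1
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(list_is_hashable)
print(label_to_position)
```

So any design that stores labels as dictionary keys gets the "labels
cannot be lists" rule for free.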

Josef


>
> larry(np.ones((2,3)), label1, label2)
>
> instead of the current
>
> larry(np.ones((2,3)), [label1, label2])
>
> where label1 and label2 are lists.
>
> For example, these
>
> y = larry([1,2,3], [['a', 'b', 'c']])
> y = larry([[1,2],[3,4]], [['a', 'b'], ['c', 'd']])
>
> would become
>
> y = larry([1,2,3], ['a', 'b', 'c'])
> y = larry([[1,2],[3,4]], ['a', 'b'], ['c', 'd'])
>
> And
>
> y = larry(x, label)
>
> would become
>
> y = larry(x, *label)
>
> if ndim > 1.
>
>>>>> idx[['a', 1]]
>> ['a', 1]
>>>>> idx[['a', 1],:]
>> (['a', 1], slice(None, None, None))
>>>>> idx[['a', 'b', 'c'],:]
>> (['a', 'b', 'c'], slice(None, None, None))
>>
>> I would go for an unambiguous, but restricted implementation for the
>> most common use case, not try to match everything numpy can do.
>>
>> Josef
>>
>
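
For reference, the tuple ambiguity and the list-wrapping workaround
discussed in the quoted text can be sketched as below. This is only an
illustration under my own assumptions: Index and describe are
hypothetical names, not part of larry, and describe merely classifies
each part of the key instead of doing any real lookup.

```python
class Index(object):
    """Hypothetical indexer that returns the key it receives."""

    def __getitem__(self, index):
        return index


idx = Index()

# Python hands both spellings to __getitem__ as the same tuple, so a
# tuple label cannot be told apart from a multi-axis index.
same = idx['a', 1] == idx[('a', 1)]


def describe(key):
    """Classify each axis entry, assuming labels are always wrapped in lists."""
    if not isinstance(key, tuple):
        key = (key,)
    parts = []
    for k in key:
        if isinstance(k, list):
            parts.append(('labels', k))
        elif isinstance(k, slice):
            parts.append(('slice', k))
        else:
            raise TypeError("wrap labels in a list, e.g. [label]")
    return parts


# A tuple label on axis 0, wrapped in a list, is now unambiguous.
tuple_label = describe(idx[[('a', 2)]])

# Column 'a': full slice on axis 0, one label on axis 1.
column_a = describe(idx[:, ['a']])
```

With this convention a bare idx['a', 1] is rejected rather than guessed
at, which is the "unambiguous but restricted" trade-off.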
