larry-discuss team mailing list archive

Re: A new proposal for indexing with labels

To: Keith Goodman <kwgoodman@xxxxxxxxx>
From: josef.pktd@xxxxxxxxx
Date: Mon, 8 Feb 2010 09:39:14 -0500
Cc: larry-discuss@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1cd32cbb1002072215u40a9950by120d44748e136dd@mail.gmail.com>

On Mon, Feb 8, 2010 at 1:15 AM,  <josef.pktd@xxxxxxxxx> wrote:
> On Mon, Feb 8, 2010 at 12:51 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>> On Sun, Feb 7, 2010 at 9:03 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>> On Sun, Feb 7, 2010 at 11:32 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>> On Sun, Feb 7, 2010 at 11:17 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>> On Sun, Feb 7, 2010 at 11:15 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>> On Sun, Feb 7, 2010 at 11:08 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>> On Sun, Feb 7, 2010 at 8:06 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>> On Sun, Feb 7, 2010 at 7:53 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>> On Sun, Feb 7, 2010 at 10:43 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>> On Sun, Feb 7, 2010 at 7:40 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>>> On Sun, Feb 7, 2010 at 10:23 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>>>> On Sun, Feb 7, 2010 at 7:11 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> If you want to go this way, you could have lixs :  lix with strings,
>>>>>>>>>>>>> and lixr : lix with repr , ... ?
>>>>>>>>>>>>> lix would be safe  because it uses the labels directly, the other ones
>>>>>>>>>>>>> are only recommended in restricted cases where the string (or other)
>>>>>>>>>>>>> representation makes sense.
>>>>>>>>>>>>
>>>>>>>>>>>> Too confusing with two I think.
>>>>>>>>>>>>
>>>>>>>>>>>>> if you want to allow slicing i.e. `:`  then I think lix would need to
>>>>>>>>>>>>> use its __getitem__ instead of __call__ or __init__
>>>>>>>>>>>>> lar1[1:4, lix['msft', 'google'], lixs['2010-02-01':]]   or
>>>>>>>>>>>>> lar1[1:4, lix[['msft', 'google']], lixs['2010-02-01':]] ?
>>>>>>>>>>>>
>>>>>>>>>>>> lar1[lix(date)]: will work. So will lar[lix(date1):lix(date2)]
>>>>>>>>>>>>
>>>>>>>>>>>> Allowing slices slows things down because now we have to look inside
>>>>>>>>>>>> each slice object (slice.start, slice.stop) for lix objects and, if
>>>>>>>>>>>> found, convert them.
>>>>>>>>>>>
>>>>>>>>>>> I wouldn't mix lix and regular array slices or indices for the same
>>>>>>>>>>> axis. I would restrict it so that if either one of (slice.start,
>>>>>>>>>>> slice.stop) is a lix, then both are interpreted as labels.
>>>>>>>>>>>
>>>>>>>>>>> either lar[lix[date1:date2],:]   or   lar[lix(date1):lix(date2), :]
>>>>>>>>>>> but not lar[lix(date1):-3]
>>>>>>>>>>
>>>>>>>>>> How come? Seems handy to me.
>>>>>>>>>
>>>>>>>>> Just a feeling, mixing oranges and apples, (If I have one label, I
>>>>>>>>> expect also to have the other, or I have neither in the other case)
>>>>>>>>>
>>>>>>>>> It would be useful to write a babyclass with the different versions of
>>>>>>>>> getitem. It's easier to see what's going on and to experiment than
>>>>>>>>> using the already more complicated getitem of larry.
>>>>>>>>>
>>>>>>>>> Josef
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't know if lix should hold some code for conversion from label
>>>>>>>>>>>>> representation to indices, or be just an identifier for use in
>>>>>>>>>>>>> larry.__getitem__
>>>>>>>>>>>>
>>>>>>>>>>>> Yeah, a method getindex that takes label as input, perhaps. larry's
>>>>>>>>>>>> getitem is already getting hairy, so keeping the code in lix instead
>>>>>>>>>>>> of adding yet more code to getitem might be a good idea. Or make a
>>>>>>>>>>>> function in util.misc like I did for string indexing.
>>>>>>>>
>>>>>>>> I'm thinking of allowing only one label element in lix. But allow
>>>>>>>> mixing label and integers for slicing.
>>>>>>>>
>>>>>>>> I tried allowing multiple labels per axis in the first indexing by
>>>>>>>> labels blueprint. It's hard. So as a first step, only allow one. For
>>>>>>>> example:
>>>>>>>>
>>>>>>>> lar[lix(date1):lix(date2)]
>>>>>>>> lar[:lix(date2)]
>>>>>>>> lar[lix(date1):-1]
>>>>>>>
>>>>>>> BTW, all this can already be done of course:
>>>>>>>
>>>>>>> lar[lar.labelindex(date1,0):lar.labelindex(date2,0)]
>>>>>>> lar[:lar.labelindex(date2,0)]
>>>>>>> lar[lar.labelindex(date1,0):-1]
>>>>>>
>>>>>> the use case I initially had in mind was a list of labels:
>>>>>> lar1[:, ['msft', 'google', 'f2', 'f3'], ['open', 'close']].diff(3).log().diff(0)
>>>>>
>>>>> although it doesn't make economic sense
>>>>>
>>>>>>
>>>>>> labelindex allows only single label
>>>>>>
>>>>>> Josef
>>>>>>
>>>>
>>>> something like this would do what I had in mind:
>>>>
>>>> import numpy as np
>>>>
>>>> class lix2(object):
>>>>     def __init__(self, label):
>>>>         if not isinstance(label, list):
>>>>             raise TypeError('label must be a list')
>>>>         self.label = label
>>>>
>>>> class A(object):
>>>>
>>>>     def __init__(self, data, label):
>>>>         if not isinstance(label, list):
>>>>             raise TypeError('label must be a list')
>>>>         self.label = label
>>>>         self.data = data
>>>>
>>>>     def __getitem__(self, ind):
>>>>         if isinstance(ind, lix2):
>>>>             # translate each label to its integer position
>>>>             idx = [self.label.index(lab) for lab in ind.label]
>>>>             return self.data[idx]
>>>>         # anything else is treated as a plain positional index
>>>>         return self.data[ind]
>>>>
>>>> aa = A(np.arange(10), 'a b c d e f g h i j'.split())
>>>> print(aa[lix2(['a', 'b', 'i'])])
>>>>
>>>>>>> aa.label
>>>> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
>>>>>>> aa.data
>>>> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>>> aa[lix2(['a', 'b', 'i'])]
>>>> array([0, 1, 8])
>>>>
>>>
>>>
>>> some examples with my latest version with lix3, see attachment
>>>
>>>>>> aa = B(np.arange(10), 'a b c d e f g h i j'.split())
>>>>>> print aa[lix2('a', 'b', 'i')]
>>> [0 1 8]
>>>>>> print aa[lix3('a', 'b', 'i')]
>>> [0 1 8]
>>>>>> print aa[lix3['a':'d']]
>>> [0 1 2]
>>>>>> print aa[lix3['a':'h']]
>>> [0 1 2 3 4 5 6]
>>>>>> print aa[lix3['c':'h']]
>>> [2 3 4 5 6]
>>>>>> print aa[lix3['c':-2]]
>>> [2 3 4 5 6 7]
>>>>>> print aa[lix3[2:'e']]
>>> [2 3]
>>>>>> print aa[lix3['c':'h':2]]
>>> [2 4 6]
>>>>>> print aa[lix3['c':]]
>>> [2 3 4 5 6 7 8 9]
>>>>>> print aa[lix3[:'c']]
>>> [0 1]
>>>>>>
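
The attachment with lix3 is not shown here, so the following is only a
guess at how the behaviour in the session above could be obtained. It is
a minimal sketch: the names `_Lix`, `lix` and `Labeled` are made up for
illustration, and integer slice bounds are assumed to be positions, which
means integer labels would not work inside a slice.

```python
import numpy as np

class _Lix(object):
    """Hypothetical label indexer: holds a list of labels, a single label or a slice."""
    def __init__(self, key=None):
        self.key = key

    def __call__(self, *labels):
        # lix('a', 'b', 'i') -> list of labels
        return _Lix(list(labels))

    def __getitem__(self, key):
        # lix['a':'d'] -> slice whose bounds may be labels or integers
        return _Lix(key)

lix = _Lix()

class Labeled(object):
    """Hypothetical 1-d container with one label per element."""
    def __init__(self, data, label):
        self.data = np.asarray(data)
        self.label = list(label)

    def _pos(self, bound):
        # None and plain integers pass through; anything else is looked up as a label
        if bound is None or isinstance(bound, int):
            return bound
        return self.label.index(bound)

    def __getitem__(self, ind):
        if isinstance(ind, _Lix):
            key = ind.key
            if isinstance(key, slice):
                # label bounds become positions; the stop label is excluded
                return self.data[slice(self._pos(key.start),
                                       self._pos(key.stop), key.step)]
            if isinstance(key, list):
                return self.data[[self.label.index(lab) for lab in key]]
            return self.data[self.label.index(key)]
        return self.data[ind]

aa = Labeled(np.arange(10), 'a b c d e f g h i j'.split())
print(aa[lix('a', 'b', 'i')])
print(aa[lix['c':'h':2]])
print(aa[lix['c':-2]])
```

Because the label bounds are converted to an ordinary integer slice, the
result is a view and no fancy indexing is involved; only the list form
goes through fancy indexing.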
>>
>> The problem is that these lists, when used to index into lar.x, will
>> do fancy indexing. Is there a way to convert the list to something
>> that doesn't do fancy indexing? Either that, or I'd have to add
>> support for fancy indexing to larry.__getitem__. Something I'd like to
>> do but a big project.
>
> I forgot, I needed to check a few examples. Using arrays with correctly
> broadcast indices seems to work. I don't remember where/when we
> discussed it, but *rectangular* indexing shouldn't be very difficult.
>
> If this kind of example, lar[np.array([0,1,2])[:,None],[1,3]], works
> correctly, the main work would be to add the None for the additional
> axes, which looks doable with enough tests.
>

similar to this
>>> lar.x[np.ix_([0,1,2],[1,3])]
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

>>> help(np.ix_)
Help on function ix_ in module numpy.lib.index_tricks:

ix_(*args)
    Construct an open mesh from multiple sequences.

    This function takes N 1-D sequences and returns N outputs with N
    dimensions each, such that the shape is 1 in all but one dimension
    and the dimension with the non-unit shape value cycles through all
    N dimensions.

    Using `ix_` one can quickly construct index arrays that will index
    the cross product. ``a[np.ix_([1,3],[2,5])]`` returns the array
    ``[[a[1,2] a[1,5]], [a[3,2] a[3,5]]]``.
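
As a small check (a sketch with a made-up array `a`, not taken from
larry), np.ix_ builds the same broadcastable index arrays as adding the
None axis by hand:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # made-up data for the example

rows, cols = np.ix_([0, 1, 2], [1, 3])
# np.ix_ returns one index array per axis, each with a single non-unit dimension
print(rows.shape, cols.shape)     # (3, 1) (1, 2)

# Adding the None axis by hand gives the same rectangular selection
by_hand = a[np.array([0, 1, 2])[:, None], [1, 3]]
via_ix = a[rows, cols]
print(np.array_equal(by_hand, via_ix))
print(via_ix)
```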


The difference from np.ix_ would be that lix works along only one axis,
and so doesn't directly resolve the issue of fancy versus rectangular
indexing when mixing lix and array slices.

I think it's feasible, and the basic rectangular slicing doesn't look
too difficult, but mixing lix and fancy indexing might require more
checking of what the desired outcome should be.
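
One possible way to get rectangular behaviour when slices and index
lists are mixed is to expand each slice into explicit positions and hand
everything to np.ix_. This is only a sketch: `rect_index` is a
hypothetical helper, not part of larry or numpy, and it assumes exactly
one key per axis.

```python
import numpy as np

def rect_index(shape, *keys):
    """Turn one key per axis (slice or list of ints) into an open-mesh index.

    Hypothetical helper for illustration only.
    """
    seqs = []
    for n, key in zip(shape, keys):
        if isinstance(key, slice):
            # expand the slice into explicit positions along this axis
            seqs.append(np.arange(*key.indices(n)))
        else:
            seqs.append(np.asarray(key))
    return np.ix_(*seqs)

x = np.arange(12).reshape(3, 4)

# all rows, columns 1 and 3
sub = x[rect_index(x.shape, slice(None), [1, 3])]
print(sub.shape)

# rows 0 and 2, columns 1, 3, 0: lists of different lengths are fine here
sub2 = x[rect_index(x.shape, [0, 2], [1, 3, 0])]
print(sub2)
```

The price is that every axis goes through fancy indexing, so the result
is always a copy, even where a plain slice would have given a view.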

(a comment: some of the results with fancy indexing are very confusing
and I avoid these cases, and replicating the full fancy indexing
feature of numpy looks very difficult as the recent thread on the
numpy mailing list shows)

Josef




>>>> lar = la.larry(np.ones((3,4)))
>>>> lar[[0,1,2],[1,3]]
> Traceback (most recent call last):
>  File "C:\Josef\eclipsegworkspace\larry-josef\larry-josef\la\deflarry.py",
> line 1384, in __getitem__
>    x = self.x[index]
> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>>> lar.x[[0,1,2],[1,3]]
> Traceback (most recent call last):
> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>>> lar[np.array([0,1,2])[:,None],[1,3]]
> label_0
>    0
>    1
>    2
> label_1
>    1
>    3
> x
> array([[ 1.,  1.],
>       [ 1.,  1.],
>       [ 1.,  1.]])
>>>> lar[[0,1,2],[1,3,0]]
> Traceback (most recent call last):
>  File "C:\Josef\eclipsegworkspace\larry-josef\larry-josef\la\deflarry.py",
> line 1411, in __getitem__
>    return larry(x, label)
>  File "C:\Josef\eclipsegworkspace\larry-josef\larry-josef\la\deflarry.py",
> line 85, in __init__
>    if x.shape[i] != nlabel:
> IndexError: tuple index out of range
>
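
The three cases in the session above can be reproduced on a plain array
(a sketch using numpy only, not larry). In the last case the index lists
are paired element by element, so the result is 1-d; that would be
consistent with the IndexError in larry's constructor, which presumably
still receives two label lists, though this is an assumption about the
code in deflarry.py.

```python
import numpy as np

x = np.ones((3, 4))

# Index lists of length 3 and 2 cannot be broadcast against each other
mismatch = None
try:
    x[[0, 1, 2], [1, 3]]
except (IndexError, ValueError) as err:   # exception type depends on the numpy version
    mismatch = err
print("shape mismatch raised:", mismatch is not None)

# Equal-length lists are paired up: x[0, 1], x[1, 3], x[2, 0]
paired = x[[0, 1, 2], [1, 3, 0]]
print(paired.shape)                       # (3,)

# A broadcastable pair selects the full 3x2 rectangle instead
rect = x[np.array([0, 1, 2])[:, None], [1, 3]]
print(rect.shape)                         # (3, 2)
```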
