larry-discuss team mailing list archive

Re: A new proposal for indexing with labels

To: Keith Goodman <kwgoodman@xxxxxxxxx>
From: josef.pktd@xxxxxxxxx
Date: Sun, 7 Feb 2010 23:32:27 -0500
Cc: larry-discuss@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1cd32cbb1002072017y6322df68r6a41ae08d0f8a27a@mail.gmail.com>

On Sun, Feb 7, 2010 at 11:17 PM,  <josef.pktd@xxxxxxxxx> wrote:
> On Sun, Feb 7, 2010 at 11:15 PM,  <josef.pktd@xxxxxxxxx> wrote:
>> On Sun, Feb 7, 2010 at 11:08 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Sun, Feb 7, 2010 at 8:06 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>> On Sun, Feb 7, 2010 at 7:53 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>> On Sun, Feb 7, 2010 at 10:43 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> On Sun, Feb 7, 2010 at 7:40 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>> On Sun, Feb 7, 2010 at 10:23 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>> On Sun, Feb 7, 2010 at 7:11 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> If you want to go this way, you could have lixs :  lix with strings,
>>>>>>>>> and lixr : lix with repr , ... ?
>>>>>>>>> lix would be safe  because it uses the labels directly, the other ones
>>>>>>>>> are only recommended in restricted cases where the string (or other)
>>>>>>>>> representation makes sense.
>>>>>>>>
>>>>>>>> Too confusing with two I think.
>>>>>>>>
>>>>>>>>> if you want to allow slicing i.e. `:`  then I think lix would need to
>>>>>>>>> use its __getitem__ instead of __call__ or __init__
>>>>>>>>> lar1[1:4, lix['msft', 'google'], lixs['2010-02-01':]]   or   lar1[1:4,
>>>>>>>>> lix[['msft', 'google']], lixs['2010-02-01':]] ?
>>>>>>>>
>>>>>>>> lar1[lix(date)]: will work. So will lar[lix(date1):lix(date2)]
>>>>>>>>
>>>>>>>> Allowing slices slows things down because now we have to look inside
>>>>>>>> each slice object (slice.start, slice.stop) for lix objects and, if
>>>>>>>> found, convert them.
>>>>>>>
>>>>>>> I wouldn't mix lix and regular array slices or indices for the same
>>>>>>> axis. I would restrict it so that if either one of (slice.start,
>>>>>>> slice.stop) is a lix, then both are interpreted as labels.
>>>>>>>
>>>>>>> either lar[lix[date1:date2],:]   or   lar[lix(date1):lix(date2), :]
>>>>>>> but not lar[lix(date1):-3]
>>>>>>
>>>>>> How come? Seems handy to me.
>>>>>
>>>>> Just a feeling, mixing oranges and apples. (If I have one label, I
>>>>> expect also to have the other, or I have neither in the other case.)
>>>>>
>>>>> It would be useful to write a babyclass with the different versions of
>>>>> getitem. It's easier to see what's going on and to experiment than
>>>>> using the already more complicated getitem of larry.
>>>>>
>>>>> Josef
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> I don't know if lix should hold some code for conversion from label
>>>>>>>>> representation to indices, or be just an identifier for use in
>>>>>>>>> larry.__getitem__
>>>>>>>>
>>>>>>>> Yeah, a method getindex that takes a label as input, perhaps. larry's
>>>>>>>> getitem is already getting hairy, so keeping the code in lix instead
>>>>>>>> of adding yet more code to getitem might be a good idea. Or make a
>>>>>>>> function in util.misc like I did for string indexing.
>>>>
>>>> I'm thinking of allowing only one label element in lix. But allow
>>>> mixing label and integers for slicing.
>>>>
>>>> I tried allowing multiple labels per axis in the first indexing by
>>>> labels blueprint. It's hard. So as a first step, only allow one. For
>>>> example:
>>>>
>>>> lar[lix(date1):lix(date2)]
>>>> lar[:lix(date2)]
>>>> lar[lix(date1):-1]
>>>
>>> BTW, all this can already be done of course:
>>>
>>> lar[lar.labelindex(date1,0):lar.labelindex(date2,0)]
>>> lar[:lar.labelindex(date2,0)]
>>> lar[lar.labelindex(date1,0):-1]
>>
>> the use case I had initially in mind was a list of labels:
>> lar1[:, ['msft', 'google', 'f2', 'f3'], ['open', 'close']].diff(3).log().diff(0)
>
> although it doesn't make economic sense
>
>>
>> labelindex allows only a single label
>>
>> Josef
>>

something like this would do what I had in mind:

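import numpy as np

class lix2(object):
    """Marker telling __getitem__ that the list holds labels, not positions."""
    def __init__(self, label):
        if not isinstance(label, list):
            raise TypeError('label must be a list')
        self.label = label

class A(object):

    def __init__(self, data, label):
        if not isinstance(label, list):
            raise TypeError('label must be a list')
        self.label = label
        self.data = data

    def __getitem__(self, ind):
        if isinstance(ind, lix2):
            # translate each label to its integer position
            idx = [self.label.index(lab) for lab in ind.label]
            return self.data[idx]
        # anything else is passed through as a regular index
        return self.data[ind]

aa = A(np.arange(10), 'a b c d e f g h i j'.split())
print(aa[lix2(['a', 'b', 'i'])])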

>>> aa.label
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
>>> aa.data
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> aa[lix2(['a', 'b', 'i'])]
array([0, 1, 8])
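
For comparison, the slice variant discussed earlier in the thread (a single label per lix, labels and integers mixed freely in slice.start and slice.stop) could be tried out in the same kind of baby class. This is only a rough sketch: the names Lix, Baby and _resolve are made up for illustration and are not part of larry, the data is a plain Python list to keep it dependency-free, and treating a label stop as exclusive (like an integer stop) is an assumption, since the thread does not settle that.

```python
class Lix(object):
    """Hypothetical marker wrapping a single label (not larry's API)."""

    def __init__(self, label):
        self.label = label


class Baby(object):
    """Toy 1-d labeled container for experimenting with __getitem__."""

    def __init__(self, data, label):
        if not isinstance(label, list):
            raise TypeError('label must be a list')
        self.data = list(data)
        self.label = label

    def _resolve(self, x):
        # Convert a Lix endpoint to its integer position; leave
        # integers and None untouched.
        if isinstance(x, Lix):
            return self.label.index(x.label)
        return x

    def __getitem__(self, ind):
        if isinstance(ind, Lix):
            # a single label selects a single element
            return self.data[self.label.index(ind.label)]
        if isinstance(ind, slice):
            # inspect slice.start and slice.stop for Lix objects
            ind = slice(self._resolve(ind.start),
                        self._resolve(ind.stop),
                        ind.step)
        return self.data[ind]


bb = Baby(range(10), 'a b c d e f g h i j'.split())
print(bb[Lix('b'):Lix('e')])   # label:label
print(bb[:Lix('c')])           # open start, label stop
print(bb[Lix('h'):-1])         # label start, integer stop
```

The extra cost mentioned in the thread shows up in the slice branch: every slice has to be unpacked and rebuilt, even when it contains no labels at all.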
