
larry-discuss team mailing list archive

Re: A new proposal for indexing with labels

 

On Sun, Feb 7, 2010 at 9:03 PM,  <josef.pktd@xxxxxxxxx> wrote:
> On Sun, Feb 7, 2010 at 11:32 PM,  <josef.pktd@xxxxxxxxx> wrote:
>> On Sun, Feb 7, 2010 at 11:17 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>> On Sun, Feb 7, 2010 at 11:15 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>> On Sun, Feb 7, 2010 at 11:08 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>> On Sun, Feb 7, 2010 at 8:06 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> On Sun, Feb 7, 2010 at 7:53 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>> On Sun, Feb 7, 2010 at 10:43 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>> On Sun, Feb 7, 2010 at 7:40 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>> On Sun, Feb 7, 2010 at 10:23 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>> On Sun, Feb 7, 2010 at 7:11 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>> If you want to go this way, you could have lixs: lix with strings,
>>>>>>>>>>> and lixr: lix with repr, ...?
>>>>>>>>>>> lix would be safe because it uses the labels directly; the other ones
>>>>>>>>>>> are only recommended in restricted cases where the string (or other)
>>>>>>>>>>> representation makes sense.
>>>>>>>>>>
>>>>>>>>>> Too confusing with two I think.
>>>>>>>>>>
>>>>>>>>>>> if you want to allow slicing, i.e. `:`, then I think lix would need to
>>>>>>>>>>> use its __getitem__ instead of __call__ or __init__:
>>>>>>>>>>> lar1[1:4, lix['msft', 'google'], lixs['2010-02-01':]]   or   lar1[1:4,
>>>>>>>>>>> lix[['msft', 'google']], lixs['2010-02-01':]] ?
>>>>>>>>>>
>>>>>>>>>> lar1[lix(date)] will work. So will lar[lix(date1):lix(date2)].
>>>>>>>>>>
>>>>>>>>>> Allowing slices slows things down, because now we have to look inside
>>>>>>>>>> each slice object (slice.start, slice.stop) for lix objects and, if
>>>>>>>>>> found, convert them.
>>>>>>>>>
>>>>>>>>> I wouldn't mix lix and regular array slices or indices for the same
>>>>>>>>> axis. I would restrict it so that if either one of (slice.start,
>>>>>>>>> slice.stop) is a lix, then both are interpreted as labels.
>>>>>>>>>
>>>>>>>>> either lar[lix[date1:date2],:]   or   lar[lix(date1):lix(date2), :]
>>>>>>>>> but not lar[lix(date1):-3]
>>>>>>>>
>>>>>>>> How come? Seems handy to me.
>>>>>>>
>>>>>>> Just a feeling: it mixes apples and oranges. (If I have one label, I
>>>>>>> expect to also have the other; otherwise I have neither.)
>>>>>>>
>>>>>>> It would be useful to write a baby class with the different versions of
>>>>>>> getitem. It's easier to see what's going on, and to experiment, than
>>>>>>> with the already more complicated getitem of larry.
>>>>>>>
>>>>>>> Josef
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I don't know if lix should hold some code for conversion from label
>>>>>>>>>>> representation to indices, or be just an identifier for use in
>>>>>>>>>>> larry.__getitem__
>>>>>>>>>>
>>>>>>>>>> Yeah, a method getindex that takes a label as input, perhaps. larry's
>>>>>>>>>> getitem is already getting harry, so keeping the code in lix instead
>>>>>>>>>> of adding yet more code to getitem might be a good idea. Or make a
>>>>>>>>>> function in util.misc like I did for string indexing.
>>>>>>
>>>>>> I'm thinking of allowing only one label element in lix, but allowing
>>>>>> labels and integers to be mixed when slicing.
>>>>>>
>>>>>> I tried allowing multiple labels per axis in the first indexing by
>>>>>> labels blueprint. It's hard. So as a first step, only allow one. For
>>>>>> example:
>>>>>>
>>>>>> lar[lix(date1):lix(date2)]
>>>>>> lar[:lix(date2)]
>>>>>> lar[lix(date1):-1]
>>>>>
>>>>> BTW, all this can already be done of course:
>>>>>
>>>>> lar[lar.labelindex(date1,0):lar.labelindex(date2,0)]
>>>>> lar[:lar.labelindex(date2,0)]
>>>>> lar[lar.labelindex(date1,0):-1]
>>>>
>>>> the use case I had initially in mind was a list of labels:
>>>> lar1[:, ['msft', 'google', 'f2', 'f3'], ['open', 'close']].diff(3).log().diff(0)
>>>
>>> although it doesn't make economic sense
>>>
>>>>
>>>> labelindex allows only a single label
>>>>
>>>> Josef
>>>>
>>
>> something like this would do what I had in mind:
>>
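>> import numpy as np
>>
>> class lix2(object):
>>    """Marker: wraps a list of labels so __getitem__ can recognize it."""
>>    def __init__(self, label):
>>        if not isinstance(label, list):
>>            raise TypeError('label must be a list')
>>        self.label = label
>>
>> class A(object):
>>
>>    def __init__(self, data, label):
>>        if not isinstance(label, list):
>>            raise TypeError('label must be a list')
>>        self.label = label
>>        self.data = data
>>
>>    def __getitem__(self, ind):
>>        if isinstance(ind, lix2):
>>            # Translate each label to its integer position, then index.
>>            idx = [self.label.index(lab) for lab in ind.label]
>>            return self.data[idx]
>>        # Anything else is passed straight through to the array.
>>        return self.data[ind]
>>
>> aa = A(np.arange(10), 'a b c d e f g h i j'.split())
>> print(aa[lix2(['a', 'b', 'i'])])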
>>
>>>>> aa.label
>> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
>>>>> aa.data
>> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>> aa[lix2(['a', 'b', 'i'])]
>> array([0, 1, 8])
>>
>
>
> some examples with my latest version with lix3, see attachment
>
>>>> aa = B(np.arange(10), 'a b c d e f g h i j'.split())
>>>> print aa[lix2('a', 'b', 'i')]
> [0 1 8]
>>>> print aa[lix3('a', 'b', 'i')]
> [0 1 8]
>>>> print aa[lix3['a':'d']]
> [0 1 2]
>>>> print aa[lix3['a':'h']]
> [0 1 2 3 4 5 6]
>>>> print aa[lix3['c':'h']]
> [2 3 4 5 6]
>>>> print aa[lix3['c':-2]]
> [2 3 4 5 6 7]
>>>> print aa[lix3[2:'e']]
> [2 3]
>>>> print aa[lix3['c':'h':2]]
> [2 4 6]
>>>> print aa[lix3['c':]]
> [2 3 4 5 6 7 8 9]
>>>> print aa[lix3[:'c']]
> [0 1]
>>>>

The problem is that these lists, when used to index into lar.x, will
trigger NumPy fancy indexing. Is there a way to convert a list into
something that doesn't do fancy indexing? Otherwise I'd have to add
support for fancy indexing to larry.__getitem__, which is something I'd
like to do, but it's a big project.
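To illustrate the concern with plain NumPy arrays (not larry): a list index triggers fancy indexing, which always copies, and list indices on two axes are paired element-wise instead of selecting a block. A minimal sketch of one possible conversion follows, assuming the label positions are non-negative, increasing and evenly spaced so they can be rewritten as a slice. `list_to_slice` is a hypothetical helper, not part of larry or NumPy; for lists that cannot become a slice, `np.ix_` is the standard NumPy way to get block selection.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# Basic indexing with a slice returns a view that shares memory with x.
view = x[0:2]

# A list index triggers fancy (advanced) indexing, which returns a copy.
fancy = x[[0, 1]]

# Two list indices are broadcast together element-wise: this picks
# x[0, 1] and x[2, 3], so pairs.tolist() == [1, 11], not a 2x2 block.
pairs = x[[0, 2], [1, 3]]

# np.ix_ builds an open mesh, so the lists select the outer-product block:
# block.tolist() == [[1, 3], [9, 11]]
block = x[np.ix_([0, 2], [1, 3])]


def list_to_slice(idx):
    """Hypothetical helper: return an equivalent slice, or None if impossible.

    Only non-negative, strictly increasing, evenly spaced positions can be
    rewritten as a slice; anything else still needs fancy indexing.
    """
    if len(idx) == 0 or idx[0] < 0:
        return None
    if len(idx) == 1:
        return slice(idx[0], idx[0] + 1)
    step = idx[1] - idx[0]
    if step <= 0:
        return None
    if any(b - a != step for a, b in zip(idx, idx[1:])):
        return None
    return slice(idx[0], idx[-1] + 1, step)
```

For example, `x[:, list_to_slice([1, 2, 3])]` selects the same columns as `x[:, [1, 2, 3]]` but returns a view, while a list such as `[0, 1, 8]` returns `None` and would still need fancy indexing.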


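The attachment with Josef's lix3 and class B is not part of this archive. As a minimal sketch, assuming only the semantics visible in the quoted session (label stops are exclusive, plain integers are positions, negative integers count from the end), a label slicer can turn a label slice into an ordinary integer `slice` before it reaches the array, so slicing stays basic indexing and returns a view; only a tuple of labels falls back to fancy indexing. `LabelSlicer`, `LabelIndex`, `to_index` and this `B` are hypothetical names, and unlike the session the multi-label form is written `lix3['a', 'b', 'i']` rather than `lix3('a', 'b', 'i')`.

```python
import numbers

import numpy as np


class LabelIndex:
    """Hypothetical: holds a label, a tuple/list of labels, or a label slice."""

    def __init__(self, key):
        self.key = key

    @staticmethod
    def _pos(item, labels):
        # None and integers pass through as positions; anything else is a label.
        if item is None or isinstance(item, numbers.Integral):
            return item
        return labels.index(item)

    def to_index(self, labels):
        """Convert the stored key to an int, a list of ints, or an int slice."""
        key = self.key
        if isinstance(key, slice):
            # A slice stays a slice, so NumPy uses basic indexing (a view).
            return slice(self._pos(key.start, labels),
                         self._pos(key.stop, labels),
                         key.step)
        if isinstance(key, (tuple, list)):
            # Several labels become a list of positions (fancy indexing).
            return [self._pos(k, labels) for k in key]
        return self._pos(key, labels)


class LabelSlicer:
    """Hypothetical: lix3[...] just wraps whatever key it receives."""

    def __getitem__(self, key):
        return LabelIndex(key)


lix3 = LabelSlicer()


class B:
    """Toy 1-d labeled array in the spirit of the baby class suggested above."""

    def __init__(self, data, label):
        if not isinstance(label, list):
            raise TypeError('label must be a list')
        self.data = np.asarray(data)
        self.label = label

    def __getitem__(self, ind):
        if isinstance(ind, LabelIndex):
            ind = ind.to_index(self.label)
        return self.data[ind]
```

With `aa = B(np.arange(10), 'a b c d e f g h i j'.split())`, `aa[lix3['c':'h':2]]` gives `[2 4 6]` as in the session. One design consequence of treating integers as positions is that integer labels cannot be looked up this way, which is the same apples-and-oranges ambiguity raised earlier in the thread.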
