larry-discuss team mailing list archive
Message #00091
Re: A new proposal for indexing with labels
On Sun, Feb 7, 2010 at 11:32 PM, <josef.pktd@xxxxxxxxx> wrote:
> On Sun, Feb 7, 2010 at 11:17 PM, <josef.pktd@xxxxxxxxx> wrote:
>> On Sun, Feb 7, 2010 at 11:15 PM, <josef.pktd@xxxxxxxxx> wrote:
>>> On Sun, Feb 7, 2010 at 11:08 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>> On Sun, Feb 7, 2010 at 8:06 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>> On Sun, Feb 7, 2010 at 7:53 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>> On Sun, Feb 7, 2010 at 10:43 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>> On Sun, Feb 7, 2010 at 7:40 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>> On Sun, Feb 7, 2010 at 10:23 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>> On Sun, Feb 7, 2010 at 7:11 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> If you want to go this way, you could have lixs : lix with strings,
>>>>>>>>>> and lixr : lix with repr , ... ?
>>>>>>>>>> lix would be safe because it uses the labels directly, the other ones
>>>>>>>>>> are only recommended in restricted cases where the string (or other)
>>>>>>>>>> representation makes sense.
>>>>>>>>>
>>>>>>>>> Too confusing with two I think.
>>>>>>>>>
>>>>>>>>>> if you want to allow slicing, i.e. `:`, then I think lix would need to
>>>>>>>>>> use its __getitem__ instead of __call__ or __init__:
>>>>>>>>>> lar1[1:4, lix['msft', 'google'], lixs['2010-02-01':]] or lar1[1:4,
>>>>>>>>>> lix[['msft', 'google']], lixs['2010-02-01':]] ?
>>>>>>>>>
>>>>>>>>> lar1[lix(date)]: will work. So will lar[lix(date1):lix(date2)]
>>>>>>>>>
>>>>>>>>> Allowing slices slows things down because now we have to look inside
>>>>>>>>> each slice object (slice.start, slice.stop) for lix objects and, if
>>>>>>>>> found, convert them.
>>>>>>>>
>>>>>>>> I wouldn't mix lix and regular array slices or indices for the same
>>>>>>>> axis. I would restrict it so that if either one of (slice.start,
>>>>>>>> slice.stop) is a lix, then both are interpreted as labels.
>>>>>>>>
>>>>>>>> either lar[lix[date1:date2],:] or lar[lix(date1):lix(date2), :]
>>>>>>>> but not lar[lix(date1):-3]
>>>>>>>
>>>>>>> How come? Seems handy to me.
>>>>>>
>>>>>> Just a feeling: it's mixing apples and oranges. (If I have one label, I
>>>>>> expect to have the other as well; otherwise I have neither.)
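
The stricter rule proposed above (labels at both ends of a slice, or at neither) could be checked in one small helper. This is only a sketch under assumptions: the `lix` class here is a hypothetical single-label marker, `resolve_slice` is an invented name, an open endpoint (`None`) is assumed to be allowed next to a label, and the stop label is treated as exclusive, like an ordinary integer stop.

```python
class lix(object):
    """Hypothetical marker wrapping a single label."""

    def __init__(self, label):
        self.label = label


def resolve_slice(labels, slc):
    """Turn a slice with lix endpoints into a plain integer slice.

    If either endpoint is a lix, the other one must be a lix or None.
    """
    start_is_lix = isinstance(slc.start, lix)
    stop_is_lix = isinstance(slc.stop, lix)
    if not (start_is_lix or stop_is_lix):
        return slc  # ordinary integer slice, nothing to convert
    for endpoint in (slc.start, slc.stop):
        if endpoint is not None and not isinstance(endpoint, lix):
            raise TypeError('cannot mix labels and integers in one slice')
    start = labels.index(slc.start.label) if start_is_lix else None
    stop = labels.index(slc.stop.label) if stop_is_lix else None
    return slice(start, stop, slc.step)


labels = ['a', 'b', 'c', 'd', 'e']
print(resolve_slice(labels, slice(lix('b'), lix('d'))))  # slice(1, 3, None)
print(resolve_slice(labels, slice(None, lix('c'))))      # slice(None, 2, None)
```

Under this rule `slice(lix('b'), -1)` raises a TypeError. Allowing the mixed form instead would only require dropping the loop that checks the endpoints.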
>>>>>>
>>>>>> It would be useful to write a babyclass with the different versions of
>>>>>> getitem. It's easier to see what's going on and to experiment than
>>>>>> using the already more complicated getitem of larry.
>>>>>>
>>>>>> Josef
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I don't know if lix should hold some code for conversion from label
>>>>>>>>>> representation to indices, or be just an identifier for use in
>>>>>>>>>> larry.__getitem__
>>>>>>>>>
>>>>>>>>> Yeah, a method getindex that takes label as input, perhaps. larry's
>>>>>>>>> getitem is already getting hairy, so keeping the code in lix instead
>>>>>>>>> of adding yet more code to getitem might be a good idea. Or make a
>>>>>>>>> function in util.misc like I did for string indexing.
>>>>>
>>>>> I'm thinking of allowing only one label element in lix. But allow
>>>>> mixing label and integers for slicing.
>>>>>
>>>>> I tried allowing multiple labels per axis in the first indexing by
>>>>> labels blueprint. It's hard. So as a first step, only allow one. For
>>>>> example:
>>>>>
>>>>> lar[lix(date1):lix(date2)]
>>>>> lar[:lix(date2)]
>>>>> lar[lix(date1):-1]
>>>>
>>>> BTW, all this can already be done of course:
>>>>
>>>> lar[lar.labelindex(date1,0):lar.labelindex(date2,0)]
>>>> lar[:lar.labelindex(date2,0)]
>>>> lar[lar.labelindex(date1,0):-1]
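
To make the three forms above concrete without larry installed, here is a minimal stand-in. It is an assumption-laden sketch: `Labeled` is a hypothetical one-dimensional class, not larry itself. Its `labelindex(name, axis)` only mirrors the call shape used above, and indexing returns a bare NumPy array rather than a labeled object.

```python
import numpy as np


class Labeled(object):
    """Hypothetical 1-d stand-in for a larry: data plus one label list."""

    def __init__(self, data, labels):
        self.data = np.asarray(data)
        self.labels = list(labels)

    def labelindex(self, name, axis=0):
        # only one axis exists in this sketch
        if axis != 0:
            raise ValueError('this sketch supports axis 0 only')
        return self.labels.index(name)

    def __getitem__(self, index):
        return self.data[index]


lar = Labeled([10, 20, 30, 40, 50], ['mon', 'tue', 'wed', 'thu', 'fri'])
date1, date2 = 'tue', 'thu'

print(lar[lar.labelindex(date1, 0):lar.labelindex(date2, 0)])  # [20 30]
print(lar[:lar.labelindex(date2, 0)])                          # [10 20 30]
print(lar[lar.labelindex(date1, 0):-1])                        # [20 30 40]
```

Since each label is converted to an integer before the slice is built, the stop label is exclusive, exactly as with integer slicing.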
>>>
>>> the use case I had initially in mind was a list of labels
>>> lar1[:, ['msft', 'google', 'f2', 'f3'], ['open', 'close']].diff(3).log().diff(0)
>>
>> although it doesn't make economic sense
>>
>>>
>>> labelindex allows only single label
>>>
>>> Josef
>>>
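
Selecting several labels on two axes at once, as in the use case above, can be sketched with plain NumPy. The assumptions here are mine: the array shape and label lists are made up, `positions` is a hypothetical helper, and `np.ix_` is used so that the two label lists select a rectangular block instead of being paired element by element.

```python
import numpy as np

# made-up data: 2 dates x 4 tickers x 3 fields
data = np.arange(2 * 4 * 3).reshape(2, 4, 3)
tickers = ['msft', 'google', 'f2', 'f3']
fields = ['open', 'high', 'close']


def positions(labels, wanted):
    """Integer position of each wanted label (hypothetical helper)."""
    return [labels.index(x) for x in wanted]


rows = positions(tickers, ['msft', 'f2'])    # [0, 2]
cols = positions(fields, ['open', 'close'])  # [0, 2]

# np.ix_ builds an open mesh, so every (date, ticker, field) combination is kept
sub = data[np.ix_(np.arange(data.shape[0]), rows, cols)]
print(sub.shape)  # (2, 2, 2)
print(sub[0])
```

Passing the two lists directly, as in `data[:, rows, cols]`, would instead pair them up and return one value per (ticker, field) pair, which is not what the use case describes.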
>
> something like this would do what I had in mind:
>
> import numpy as np
>
> class lix2(object):
>     def __init__(self, label):
>         if not isinstance(label, list):
>             raise TypeError('label must be a list')
>         self.label = label
>
> class A(object):
>
>     def __init__(self, data, label):
>         if not isinstance(label, list):
>             raise TypeError('label must be a list')
>         self.label = label
>         self.data = data
>
>     def __getitem__(self, ind):
>         if isinstance(ind, lix2):
>             # look up the integer position of each requested label
>             idx = [self.label.index(x) for x in ind.label]
>             return self.data[idx]
>
> aa = A(np.arange(10), 'a b c d e f g h i j'.split())
> print(aa[lix2(['a', 'b', 'i'])])
>
> >>> aa.label
> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
> >>> aa.data
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> >>> aa[lix2(['a', 'b', 'i'])]
> array([0, 1, 8])
>
Some examples with my latest version, using lix3 (see attachment):
>>> aa = B(np.arange(10), 'a b c d e f g h i j'.split())
>>> print(aa[lix2('a', 'b', 'i')])
[0 1 8]
>>> print(aa[lix3('a', 'b', 'i')])
[0 1 8]
>>> print(aa[lix3['a':'d']])
[0 1 2]
>>> print(aa[lix3['a':'h']])
[0 1 2 3 4 5 6]
>>> print(aa[lix3['c':'h']])
[2 3 4 5 6]
>>> print(aa[lix3['c':-2]])
[2 3 4 5 6 7]
>>> print(aa[lix3[2:'e']])
[2 3]
>>> print(aa[lix3['c':'h':2]])
[2 4 6]
>>> print(aa[lix3['c':]])
[2 3 4 5 6 7 8 9]
>>> print(aa[lix3[:'c']])
[0 1]
>>>
Josef
# -*- coding: utf-8 -*-
"""
Created on Sun Feb 07 22:57:37 2010
Author: josef-pktd
"""
import numpy as np


class lix(object):
    """Holds a list of labels and converts them to integer positions."""

    def __init__(self, label):
        if not isinstance(label, list):
            raise TypeError('label must be a list')
        self.label = label

    def index(self, label):
        # `label` is the full label list of one axis
        return [label.index(x) for x in self.label]

    def __repr__(self):
        return 'larry label index object: \n' + str(self.label)


idx = lix(['a', 'b'])
label = ['a', 'b', 'c']
print(idx.index(label))


class lix2(object):
    """Marker holding one or more labels passed as positional arguments."""

    def __init__(self, *label):
        # labels arrive as a tuple via *label, so there is no list check
        self.label = label


class Lix3(object):
    """Label indexer: lix3('a', 'b') selects labels, lix3['a':'d'] slices."""

    def __call__(self, *label):
        # labels arrive as a tuple via *label, so there is no list check
        self.label = label
        self.isslice = False
        return self

    def __getitem__(self, sliceind):
        self.isslice = True
        self.label = sliceind
        return self


# single shared instance; every call or subscript overwrites its state
lix3 = Lix3()


class A(object):

    def __init__(self, data, label):
        if not isinstance(label, list):
            raise TypeError('label must be a list')
        self.label = label
        self.data = data

    def __getitem__(self, ind):
        if isinstance(ind, lix2):
            idx = [self.label.index(x) for x in ind.label]
            return self.data[idx]


aa = A(np.arange(10), 'a b c d e f g h i j'.split())
print(aa[lix2('a', 'b', 'i')])


class B(object):

    def __init__(self, data, label):
        if not isinstance(label, list):
            raise TypeError('label must be a list')
        self.label = label
        self.data = data

    def __getitem__(self, ind):
        # print('ind', ind)
        if isinstance(ind, lix2):
            idx = [self.label.index(x) for x in ind.label]
            return self.data[idx]
        if isinstance(ind, Lix3):
            # read the flag from the object passed in, not the global lix3
            if not ind.isslice:
                idx = [self.label.index(x) for x in ind.label]
                return self.data[idx]
            else:
                # an endpoint that is not a label (an integer or None)
                # is passed through to the slice unchanged
                try:
                    idxstart = self.label.index(ind.label.start)
                except ValueError:
                    idxstart = ind.label.start
                try:
                    idxstop = self.label.index(ind.label.stop)
                except ValueError:
                    idxstop = ind.label.stop
                idx = slice(idxstart, idxstop, ind.label.step)
                # print('slice idx', idx)
                return self.data[idx]


aa = B(np.arange(10), 'a b c d e f g h i j'.split())
print(aa[lix2('a', 'b', 'i')])
print(aa[lix3('a', 'b', 'i')])
print(aa[lix3['a':'d']])
print(aa[lix3['a':'h']])
print(aa[lix3['c':'h']])
print(aa[lix3['c':-2]])
print(aa[lix3[2:'e']])
print(aa[lix3['c':'h':2]])
print(aa[lix3['c':]])
print(aa[lix3[:'c']])
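
One property of the attached `Lix3` is that `lix3` is a single shared instance that records the most recent request on itself, so two selectors created one after the other are the same object. A possible variation, sketched here under assumptions, returns a fresh small object each time. The names `lix4`, `LabelList`, `LabelSlice` and `Labeled` are hypothetical, and labels are assumed never to be integers, so an integer endpoint can be read as a position.

```python
import numpy as np


class LabelList(object):
    """A fixed tuple of labels to select."""

    def __init__(self, labels):
        self.labels = labels


class LabelSlice(object):
    """A slice whose endpoints may be labels, integers or None."""

    def __init__(self, slc):
        self.slc = slc


class _Lix4(object):
    """Stateless factory: lix4('a', 'b') or lix4['a':'d']."""

    def __call__(self, *labels):
        return LabelList(labels)

    def __getitem__(self, slc):
        if not isinstance(slc, slice):
            raise TypeError('use lix4[start:stop] or lix4(label, ...)')
        return LabelSlice(slc)


lix4 = _Lix4()


class Labeled(object):

    def __init__(self, data, labels):
        self.data = data
        self.labels = list(labels)

    def _position(self, endpoint):
        # integers and None pass through; anything else is looked up as a label
        if endpoint is None or isinstance(endpoint, int):
            return endpoint
        return self.labels.index(endpoint)

    def __getitem__(self, ind):
        if isinstance(ind, LabelList):
            return self.data[[self.labels.index(x) for x in ind.labels]]
        if isinstance(ind, LabelSlice):
            s = ind.slc
            return self.data[slice(self._position(s.start),
                                   self._position(s.stop), s.step)]
        return self.data[ind]


aa = Labeled(np.arange(10), 'a b c d e f g h i j'.split())
first = lix4('a', 'b')
second = lix4['c':'h']   # creating this does not change `first`
print(aa[first])         # [0 1]
print(aa[second])        # [2 3 4 5 6]
```

A side effect of looking labels up explicitly is that a misspelled label raises ValueError from `list.index` instead of being passed on to the slice.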