larry-discuss team mailing list archive
Message #00099
Re: A new proposal for indexing with labels
On Mon, Feb 8, 2010 at 6:52 AM, <josef.pktd@xxxxxxxxx> wrote:
> On Mon, Feb 8, 2010 at 9:42 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>> On Mon, Feb 8, 2010 at 6:39 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Sun, Feb 7, 2010 at 10:15 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>> On Mon, Feb 8, 2010 at 12:51 AM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>> On Sun, Feb 7, 2010 at 9:03 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>> On Sun, Feb 7, 2010 at 11:32 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>> On Sun, Feb 7, 2010 at 11:17 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>> On Sun, Feb 7, 2010 at 11:15 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>> On Sun, Feb 7, 2010 at 11:08 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>> On Sun, Feb 7, 2010 at 8:06 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>>> On Sun, Feb 7, 2010 at 7:53 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>>>> On Sun, Feb 7, 2010 at 10:43 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>>>>> On Sun, Feb 7, 2010 at 7:40 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>>>>>> On Sun, Feb 7, 2010 at 10:23 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>> On Sun, Feb 7, 2010 at 7:11 PM, <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you want to go this way, you could have lixs : lix with strings,
>>>>>>>>>>>>>>>> and lixr : lix with repr , ... ?
>>>>>>>>>>>>>>>> lix would be safe because it uses the labels directly, the other ones
>>>>>>>>>>>>>>>> are only recommended in restricted cases where the string (or other)
>>>>>>>>>>>>>>>> representation makes sense.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Too confusing with two I think.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> if you want to allow slicing i.e. `:` then I think lix would need to
>>>>>>>>>>>>>>>> use its __getitem__ instead of __call__ or __init__
>>>>>>>>>>>>>>>> lar1[1:4, lix['msft', 'google'], lixs['2010-02-01':]] or lar1[1:4,
>>>>>>>>>>>>>>>> lix[['msft', 'google']], lixs['2010-02-01':]] ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> lar1[lix(date)]: will work. So will lar[lix(date1):lix(date2)]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Allowing slices slows things down because now we have to look inside
>>>>>>>>>>>>>>> each slice object (slice.start, slice.stop) for lix objects and, if
>>>>>>>>>>>>>>> found, convert them.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wouldn't mix lix and regular array slices or indices for the same
>>>>>>>>>>>>>> axis. I would restrict it so that if either of (slice.start,
>>>>>>>>>>>>>> slice.stop) is a lix, then both are interpreted as labels.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> either lar[lix[date1:date2],:] or lar[lix(date1):lix(date2), :]
>>>>>>>>>>>>>> but not lar[lix(date1):-3]
>>>>>>>>>>>>>
>>>>>>>>>>>>> How come? Seems handy to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Just a feeling, mixing oranges and apples, (If I have one label, I
>>>>>>>>>>>> expect also to have the other, or I have neither in the other case)
>>>>>>>>>>>>
>>>>>>>>>>>> It would be useful to write a babyclass with the different versions of
>>>>>>>>>>>> getitem. It's easier to see what's going on and to experiment than
>>>>>>>>>>>> using the already more complicated getitem of larry.
>>>>>>>>>>>>
>>>>>>>>>>>> Josef
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't know if lix should hold some code for conversion from label
>>>>>>>>>>>>>>>> representation to indices, or be just an identifier for use in
>>>>>>>>>>>>>>>> larry.__getitem__
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yeah, a method getindex that takes a label as input, perhaps. larry's
>>>>>>>>>>>>>>> getitem is already getting hairy, so keeping the code in lix instead
>>>>>>>>>>>>>>> of adding yet more code to getitem might be a good idea. Or make a
>>>>>>>>>>>>>>> function in util.misc like I did for string indexing.
>>>>>>>>>>>
>>>>>>>>>>> I'm thinking of allowing only one label element in lix. But allow
>>>>>>>>>>> mixing label and integers for slicing.
>>>>>>>>>>>
>>>>>>>>>>> I tried allowing multiple labels per axis in the first indexing by
>>>>>>>>>>> labels blueprint. It's hard. So as a first step, only allow one. For
>>>>>>>>>>> example:
>>>>>>>>>>>
>>>>>>>>>>> lar[lix(date1):lix(date2)]
>>>>>>>>>>> lar[:lix(date2)]
>>>>>>>>>>> lar[lix(date1):-1]
>>>>>>>>>>
>>>>>>>>>> BTW, all this can already be done of course:
>>>>>>>>>>
>>>>>>>>>> lar[lar.labelindex(date1,0):lar.labelindex(date2,0)]
>>>>>>>>>> lar[:lar.labelindex(date2,0)]
>>>>>>>>>> lar[lar.labelindex(date1,0):-1]
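
A minimal Python 3 sketch of the slice conversion discussed above. It is not code from larry: the `lix` marker class and the toy `Labeled1d` container are hypothetical names used only for illustration. It assumes one label per `lix` and shows a label being mixed with an integer or an open end in the same slice, mirroring the `labelindex` calls.

```python
import numpy as np


class lix(object):
    """Hypothetical marker that wraps a single label."""

    def __init__(self, label):
        self.label = label


class Labeled1d(object):
    """Toy 1-d labeled array (an assumption, not larry's implementation)."""

    def __init__(self, data, label):
        self.data = np.asarray(data)
        self.label = list(label)

    def _resolve(self, value):
        # Convert a lix marker to its integer position; pass anything else through.
        if isinstance(value, lix):
            return self.label.index(value.label)
        return value

    def __getitem__(self, index):
        if isinstance(index, slice):
            # This is the extra work per slice: inspect start and stop for markers.
            index = slice(self._resolve(index.start),
                          self._resolve(index.stop),
                          index.step)
        else:
            index = self._resolve(index)
        return self.data[index]


arr = Labeled1d(np.arange(10), 'a b c d e f g h i j'.split())
print(arr[lix('c'):lix('h')])  # label to label; stop is excluded as with integers
print(arr[:lix('c')])          # open start, label stop
print(arr[lix('c'):-1])        # label start, integer stop
```

The cost mentioned in the thread is visible in `__getitem__`: every slice has to be unpacked and rebuilt even when it contains no labels.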
>>>>>>>>>
>>>>>>>>> the use case I had initially in mind was a list of labels
>>>>>>>>> lar1[:, ['msft', 'google', 'f2', 'f3'], ['open', 'close']].diff(3).log().diff(0)
>>>>>>>>
>>>>>>>> although it doesn't make economic sense
>>>>>>>>
>>>>>>>>>
>>>>>>>>> labelindex allows only single label
>>>>>>>>>
>>>>>>>>> Josef
>>>>>>>>>
>>>>>>>
>>>>>>> something like this would do what I had in mind:
>>>>>>>
>>>>>>> import numpy as np
>>>>>>>
>>>>>>> class lix2(object):
>>>>>>>     # marker class: wraps a list of labels
>>>>>>>     def __init__(self, label):
>>>>>>>         if not isinstance(label, list):
>>>>>>>             raise TypeError('label must be a list')
>>>>>>>         self.label = label
>>>>>>>
>>>>>>> class A(object):
>>>>>>>
>>>>>>>     def __init__(self, data, label):
>>>>>>>         if not isinstance(label, list):
>>>>>>>             raise TypeError('label must be a list')
>>>>>>>         self.label = label
>>>>>>>         self.data = data
>>>>>>>
>>>>>>>     def __getitem__(self, ind):
>>>>>>>         if isinstance(ind, lix2):
>>>>>>>             # convert each label to its integer position
>>>>>>>             idx = [self.label.index(lab) for lab in ind.label]
>>>>>>>             return self.data[idx]
>>>>>>>         # anything else is passed through as a regular index
>>>>>>>         return self.data[ind]
>>>>>>>
>>>>>>> aa = A(np.arange(10), 'a b c d e f g h i j'.split())
>>>>>>> print(aa[lix2(['a', 'b', 'i'])])
>>>>>>>
>>>>>>> >>> aa.label
>>>>>>> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
>>>>>>> >>> aa.data
>>>>>>> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>>>> >>> aa[lix2(['a', 'b', 'i'])]
>>>>>>> array([0, 1, 8])
>>>>>>>
>>>>>>
>>>>>>
>>>>>> some examples with my latest version with lix3, see attachment
>>>>>>
>>>>>> >>> aa = B(np.arange(10), 'a b c d e f g h i j'.split())
>>>>>> >>> print aa[lix2('a', 'b', 'i')]
>>>>>> [0 1 8]
>>>>>> >>> print aa[lix3('a', 'b', 'i')]
>>>>>> [0 1 8]
>>>>>> >>> print aa[lix3['a':'d']]
>>>>>> [0 1 2]
>>>>>> >>> print aa[lix3['a':'h']]
>>>>>> [0 1 2 3 4 5 6]
>>>>>> >>> print aa[lix3['c':'h']]
>>>>>> [2 3 4 5 6]
>>>>>> >>> print aa[lix3['c':-2]]
>>>>>> [2 3 4 5 6 7]
>>>>>> >>> print aa[lix3[2:'e']]
>>>>>> [2 3]
>>>>>> >>> print aa[lix3['c':'h':2]]
>>>>>> [2 4 6]
>>>>>> >>> print aa[lix3['c':]]
>>>>>> [2 3 4 5 6 7 8 9]
>>>>>> >>> print aa[lix3[:'c']]
>>>>>> [0 1]
>>>>>> >>>
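
The attachment with `lix3` and `B` is not part of this archive page, so the following Python 3 code is only a guess at one way to get the behavior shown in the session above. The helper class `_LabelSlicer` is a hypothetical name, and the sketch assumes that labels are never integers, so that any integer in a slice can be treated as a position.

```python
import numpy as np


class _LabelSlicer(object):
    """Hypothetical stand-in for lix3: records labels given by call or subscript."""

    def __init__(self, key=None):
        self.key = key

    def __call__(self, *labels):
        # lix3('a', 'b', 'i') -> a list of labels
        return _LabelSlicer(list(labels))

    def __getitem__(self, key):
        # lix3['a':'d'] -> a slice whose start and stop may be labels
        return _LabelSlicer(key)


lix3 = _LabelSlicer()


class B(object):
    """Toy 1-d container; a sketch, not the class from the attachment."""

    def __init__(self, data, label):
        self.data = np.asarray(data)
        self.label = list(label)

    def _pos(self, value):
        # Integers and None pass through; everything else is looked up as a label.
        if value is None or isinstance(value, int):
            return value
        return self.label.index(value)

    def __getitem__(self, ind):
        if not isinstance(ind, _LabelSlicer):
            return self.data[ind]
        key = ind.key
        if isinstance(key, slice):
            return self.data[slice(self._pos(key.start),
                                   self._pos(key.stop),
                                   key.step)]
        if isinstance(key, (list, tuple)):
            return self.data[[self._pos(k) for k in key]]
        return self.data[self._pos(key)]


aa = B(np.arange(10), 'a b c d e f g h i j'.split())
print(aa[lix3('a', 'b', 'i')])
print(aa[lix3['c':'h':2]])
print(aa[lix3[2:'e']])
```

Because the conversion happens before NumPy sees the index, the stop label is excluded just like an integer stop, which matches the outputs in the session.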
>>>>>
>>>>> The problem is that these lists, when used to index into lar.x, will
>>>>> do fancy indexing. Is there a way to convert the list to something
>>>>> that doesn't do fancy indexing? Either that, or I'd have to add
>>>>> support for fancy indexing to larry.__getitem__. Something I'd like to
>>>>> do but a big project.
>>>>
>>>> I forgot, I needed to check a few examples, using array with correctly
>>>> broadcasted indices seems to work. I don't remember where/when we
>>>> discussed it, but *rectangular* indexing shouldn't be very difficult.
>>>>
>>>> If this kind of example, lar[np.array([0,1,2])[:,None],[1,3]], works
>>>> correctly, the main work would be to add the None for the additional
>>>> axes, which looks doable with enough tests.
>>>>
>>>> >>> lar = la.larry(np.ones((3,4)))
>>>> >>> lar[[0,1,2],[1,3]]
>>>> Traceback (most recent call last):
>>>>   File "C:\Josef\eclipsegworkspace\larry-josef\larry-josef\la\deflarry.py", line 1384, in __getitem__
>>>>     x = self.x[index]
>>>> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>>> >>> lar.x[[0,1,2],[1,3]]
>>>> Traceback (most recent call last):
>>>> ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>>> >>> lar[np.array([0,1,2])[:,None],[1,3]]
>>>> label_0
>>>>     0
>>>>     1
>>>>     2
>>>> label_1
>>>>     1
>>>>     3
>>>> x
>>>> array([[ 1.,  1.],
>>>>        [ 1.,  1.],
>>>>        [ 1.,  1.]])
>>>> >>> lar[[0,1,2],[1,3,0]]
>>>> Traceback (most recent call last):
>>>>   File "C:\Josef\eclipsegworkspace\larry-josef\larry-josef\la\deflarry.py", line 1411, in __getitem__
>>>>     return larry(x, label)
>>>>   File "C:\Josef\eclipsegworkspace\larry-josef\larry-josef\la\deflarry.py", line 85, in __init__
>>>>     if x.shape[i] != nlabel:
>>>> IndexError: tuple index out of range
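
The difference between fancy and rectangular indexing in the session above can be reproduced on a plain NumPy array. This Python 3 sketch uses only NumPy, no larry; the variable names are illustrative. Note that recent NumPy versions raise IndexError for the shape mismatch where the 2010 session shows ValueError, so the sketch catches both.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
rows = [0, 1, 2]
cols = [1, 3]

# Two plain lists are broadcast against each other, so lengths 3 and 2 clash.
try:
    x[rows, cols]
    mismatch = False
except (IndexError, ValueError):
    mismatch = True

# Adding an axis to the row index makes the two indices broadcast to (3, 2),
# which selects the rectangular block of rows x cols.
manual = x[np.array(rows)[:, None], cols]

# np.ix_ builds the same broadcastable index arrays for any number of axes.
open_mesh = x[np.ix_(rows, cols)]

# Lists of equal length are paired element by element and give a 1-d result.
paired = x[[0, 1, 2], [1, 3, 0]]

print(mismatch)
print(manual)
print(paired)
```

The last case returns a 1-d array from a 2-d one, which is probably why the larry constructor in the final traceback runs out of axes when it checks the labels.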
>>>
>>> I don't think I want lar[index] to give different results from
>>> lar.x[index]. My goal is for larry and arrays to behave the same way
>>> where feasible. But I could borrow an idea from the first blueprint:
>>>
>>> lar.lix[index]
>
> Yes, I agree this would be the cleanest and least confusing. That's
> how I thought about it initially.
>
>>>
>>> where index would only contain labels. Inside the lix method I would
>>> convert the labels to indices and then return lar[index_converted]. It
>>> would have the advantage of not slowing down an already slow
>>> larry.__getitem__. And the user would not have to wrap labels in a
>>> class. One downside is that you cannot do
>>>
>>> lar['price']['aapl'][date]
>>>
>>> instead it would be
>>>
>>> lar.lix['price'].lix['aapl'].lix[date]
>>>
>>> Or
>>>
>
>
>>> lar['price', 'aapl', date]
>
> this would still require label handling inside getitem (?)
> did you mean
> lar.lix['price', 'aapl', date]
Yes, a typo. I'll send out a summary of the plan in a new thread later
today. Good thing you know all the indexing tricks; that makes this
possible.
>
>
>
>>
>> Oh, and I forgot one of the main points: lix would use rectangular
>> indexing. So it would convert multiple lists to rectangular indexing
>> before passing index_converted to larry.__getitem__.
>>
>
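
A rough Python 3 sketch of the `lar.lix[...]` idea from this message: labels are converted to integer positions outside of the regular `__getitem__`, and several lists are turned into a rectangular index with `np.ix_`. The classes `LabeledArray` and `_LabelIndexer` are hypothetical stand-ins and not larry code. To stay short, the sketch accepts only a single label or a list of labels per axis, and a single label keeps its axis with length 1.

```python
import numpy as np


class _LabelIndexer(object):
    """Hypothetical helper behind LabeledArray.lix; accepts labels only."""

    def __init__(self, parent):
        self.parent = parent

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # One list of integer positions per indexed axis.
        positions = []
        for axis, k in enumerate(key):
            wanted = k if isinstance(k, list) else [k]
            positions.append([self.parent.label[axis].index(v) for v in wanted])
        # np.ix_ reshapes the lists so that they select a rectangular block.
        x = self.parent.x[np.ix_(*positions)]
        label = [[self.parent.label[axis][i] for i in pos]
                 for axis, pos in enumerate(positions)]
        # Axes that were not indexed keep all of their labels.
        label += self.parent.label[len(key):]
        return LabeledArray(x, label)


class LabeledArray(object):
    """Toy stand-in for larry: an array plus one list of labels per axis."""

    def __init__(self, x, label):
        self.x = np.asarray(x)
        self.label = [list(lab) for lab in label]

    @property
    def lix(self):
        return _LabelIndexer(self)


lar = LabeledArray(np.arange(24).reshape(2, 3, 4),
                   [['price', 'volume'],
                    ['aapl', 'msft', 'goog'],
                    ['d1', 'd2', 'd3', 'd4']])
sub = lar.lix['price', ['aapl', 'goog'], ['d1', 'd3']]
print(sub.label)
print(sub.x)
```

Keeping the conversion in a separate accessor means the integer-based `__getitem__` path is untouched, which is the speed argument made above.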