larry-discuss team mailing list archive

Re: A new proposal for indexing with labels

 

On Sat, Feb 6, 2010 at 5:38 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> In a blueprint titled "index-by-label" I proposed a way to index
> larrys by lists of label elements. Here's a simpler, but less
> versatile, proposal. On the whole, due to its simplicity, I think it
> is more powerful.
>
> You can index into larrys just like you index into numpy arrays. To
> index into numpy arrays you can use integers, slices, etc. But you
> can't use strings. Strings have no meaning in the context of indexing.
> Therefore we are free to assign a special meaning to strings when used
> for indexing into a larry.
>
> My proposal is to interpret strings as label elements. So for example:
>
>>> import la
>>> y = la.larry([1,2,3], [['a', 'b', 4]])
>
>>> y['a']
>   1
>
>>> y['b':]
> label_0
>    b
>    4
> x
> array([2, 3])
>
>>> y['4']
>   3
>
> Note the last example above. We indexed with the string '4'. But there
> is no string '4' in the label, there is only the integer 4. The
> algorithm first looks for the string '4' in the label; if it is not
> found, it converts each label element to a string and looks again.
>
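
To make the two-step lookup concrete, here is a minimal sketch of how I read it. The helper name find_label is hypothetical and is not part of la; it only illustrates the exact-match-then-str() fallback described above.

```python
import datetime


def find_label(label, key):
    """Return the position of key in label (hypothetical helper, not la API).

    Step 1: look for an exact match.
    Step 2: if that fails, compare key against str() of every label element,
    so the string '4' can match the integer 4.
    """
    try:
        return label.index(key)
    except ValueError:
        pass
    as_strings = [str(element) for element in label]
    try:
        return as_strings.index(key)
    except ValueError:
        raise KeyError(key) from None


label = ['a', 'b', 4]
print(find_label(label, 'a'))  # found in step 1
print(find_label(label, '4'))  # found in step 2, via str(4)

# The same fallback covers dates, since str(date) gives 'YYYY-MM-DD'.
dates = [datetime.date(2009, 1, 1), datetime.date(2009, 1, 2)]
print(find_label(dates, '2009-01-02'))
```

One consequence of trying the exact match first: if a label held both the string '4' and the integer 4, the string would win.
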
> I think it is quite powerful. It does add some overhead to non-string
> indexing, but not much. The biggest overhead is checking if slice
> objects have strings in them. For indexing with one integer (y[5]),
> for example, there is no overhead.
>
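
For what it's worth, the check for strings could look roughly like the sketch below. The function names are my own invention, and I am assuming only the start and stop of a slice can carry a label (a string step would have no obvious meaning).

```python
def is_string_element(element):
    """True if one index element is a string or a slice with a string bound."""
    if isinstance(element, str):
        return True
    if isinstance(element, slice):
        # Assumption: only start and stop may be label elements.
        return isinstance(element.start, str) or isinstance(element.stop, str)
    return False


def needs_label_lookup(index):
    """True if any part of the index has to be translated from labels."""
    if not isinstance(index, tuple):
        index = (index,)  # y[5] arrives as a bare int, not a tuple
    return any(is_string_element(element) for element in index)


print(needs_label_lookup(5))                  # plain integer
print(needs_label_lookup(slice('b', None)))   # y['b':]
print(needs_label_lookup((0, 'ibm', 2)))      # y[0, 'ibm', 2]
```

For a single integer the cost is a couple of isinstance calls, which fits the "not much overhead" claim; slices need the extra look at their bounds.
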
> Here are some more examples:
>
>>> from la import larry
>>> import numpy as np
>>> import datetime
>>> d = datetime.date
>>>
>>> x = np.arange(24).reshape(2,3,4)
>>> label = [['price', 'volume'], ['aapl', 'ibm', 'dell'], [d(2009,1,1), d(2009,1,2), d(2009,1,3), d(2009,1,4)]]
>>> y = larry(x, label)
>
>
>>> y['price']
> label_0
>    aapl
>    ibm
>    dell
> label_1
>    2009-01-01
>    2009-01-02
>    2009-01-03
>    2009-01-04
> x
> array([[ 0,  1,  2,  3],
>       [ 4,  5,  6,  7],
>       [ 8,  9, 10, 11]])
>
>
>>> y['price', 'aapl']
> label_0
>    2009-01-01
>    2009-01-02
>    2009-01-03
>    2009-01-04
> x
> array([0, 1, 2, 3])
>
>
>>> y['price', 'aapl':]
> label_0
>    aapl
>    ibm
>    dell
> label_1
>    2009-01-01
>    2009-01-02
>    2009-01-03
>    2009-01-04
> x
> array([[ 0,  1,  2,  3],
>       [ 4,  5,  6,  7],
>       [ 8,  9, 10, 11]])
>
>
>>> y['price', 'aapl', '2009-01-02']
>   1
>
>
>>> y['price', 'dell', '2009-01-02']
>   9


And, of course, you can do:

>> y['price']['dell']['2009-01-02']
   9


>
>>> y[:, 'dell', :]
> label_0
>    price
>    volume
> label_1
>    2009-01-01
>    2009-01-02
>    2009-01-03
>    2009-01-04
> x
> array([[ 8,  9, 10, 11],
>       [20, 21, 22, 23]])
>
>
>>> y[0, 'ibm', 2]
>   6
>
>
>>> y[0, 'ibm', :]
> label_0
>    2009-01-01
>    2009-01-02
>    2009-01-03
>    2009-01-04
> x
> array([4, 5, 6, 7])
>
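
Putting the examples together, here is a rough sketch of how a mixed index might be translated into plain integer positions before it is handed to numpy. Everything here is an assumption for illustration: translate_index and to_position are hypothetical names, string lookup is simplified to a single comparison against str() of each label element, and a string stop is treated as an ordinary (exclusive) position since the proposal does not say.

```python
import datetime


def to_position(label, key):
    """Map a string key to its position along one axis; pass others through."""
    if isinstance(key, str):
        # Simplification: compare against str() of every label element.
        return [str(element) for element in label].index(key)
    return key  # integers and None are left untouched


def translate_index(index, labels):
    """Replace label strings in index with integer positions (sketch only)."""
    if not isinstance(index, tuple):
        index = (index,)
    translated = []
    for axis, element in enumerate(index):
        label = labels[axis]
        if isinstance(element, slice):
            translated.append(slice(to_position(label, element.start),
                                    to_position(label, element.stop),
                                    element.step))
        else:
            translated.append(to_position(label, element))
    return tuple(translated)


d = datetime.date
labels = [['price', 'volume'],
          ['aapl', 'ibm', 'dell'],
          [d(2009, 1, 1), d(2009, 1, 2), d(2009, 1, 3), d(2009, 1, 4)]]

print(translate_index(('price', 'dell', '2009-01-02'), labels))  # (0, 2, 1)
print(translate_index((0, 'ibm', 2), labels))                    # (0, 1, 2)
print(translate_index((slice(None), 'dell', slice(None)), labels))
```

Position (0, 2, 1) in the 2x3x4 array built from np.arange(24) is 0*12 + 2*4 + 1 = 9, and (0, 1, 2) is 6, which agrees with the outputs quoted above.
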


