larry-discuss team mailing list archive
Message #00064
A new proposal for indexing with labels
In a blueprint titled "index-by-label" I proposed a way to index
larrys by lists of label elements. Here's a simpler, but less
versatile, proposal. On the whole, due to its simplicity, I think it
is more powerful.
You can index into larrys just like you index into numpy arrays. To
index into numpy arrays you can use integers, slices, etc. But you
can't use strings: they have no meaning as indices into an ordinary
numeric array. Therefore we are free to assign a special meaning to
strings when used for indexing into a larry.
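A quick check of that premise with plain numpy: integers and slices work on an ordinary numeric array, while a string index raises an error. Current numpy raises IndexError here; the sketch also catches ValueError in case other versions report it differently.

```python
import numpy as np

arr = np.array([1, 2, 3])

# Integers and slices are ordinary numpy indices...
first = arr[0]
tail = arr[1:]

# ...but a string is not a valid index into a numeric array,
# which leaves strings free to mean "label element" in a larry.
rejected = False
try:
    arr['a']
except (IndexError, ValueError):
    rejected = True

print(first, tail, rejected)
```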
My proposal is to interpret strings as label elements. So for example:
>> import la
>> y = la.larry([1,2,3], [['a', 'b', 4]])
>> y['a']
1
>> y['b':]
label_0
    b
    4
x
array([2, 3])
>> y['4']
3
Note the last example above. We indexed with the string '4', but
there is no string '4' in the label, only the integer 4. The
algorithm first looks for the string '4' in the label; if it is not
found, it converts each label element to a string and looks again.
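The two-pass lookup described above can be sketched for a single axis as follows. This is a minimal illustration, not la's code: the helper name `label_to_index` is hypothetical, each label is assumed to be a plain Python list, and raising IndexError for a missing element is an assumption the proposal does not specify.

```python
import datetime


def label_to_index(label, key):
    """Return the position of `key` among one axis' label elements (hypothetical helper)."""
    # Pass 1: look for the key itself, e.g. the string 'a'.
    try:
        return label.index(key)
    except ValueError:
        pass
    # Pass 2: map every label element to a string and look again, so the
    # integer 4 matches '4' and datetime.date(2009, 1, 2) matches '2009-01-02'.
    try:
        return [str(elem) for elem in label].index(key)
    except ValueError:
        # Assumed error type; the proposal does not specify one.
        raise IndexError('label element %r not found' % (key,)) from None


print(label_to_index(['a', 'b', 4], 'a'))   # 0
print(label_to_index(['a', 'b', 4], '4'))   # 2
dates = [datetime.date(2009, 1, day) for day in range(1, 5)]
print(label_to_index(dates, '2009-01-02'))  # 1
```

The second pass relies on `str()` of a `datetime.date` being its ISO form, which is why date labels can be selected with strings like '2009-01-02'.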
I think it is quite powerful. It does add some overhead to non-string
indexing, but not much. The biggest cost is checking whether slice
objects contain strings. Indexing with a single integer (y[5]), for
example, has no overhead.
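To show where that extra cost comes from, here is a hypothetical sketch of translating a larry-style index into a plain numpy index, one axis at a time. The names `convert_index` and `_position` are made up for illustration, and treating a string slice stop as exclusive (like a positional slice) is an assumption; the proposal above does not say how the stop element is handled.

```python
import numpy as np


def _position(label, key):
    """Hypothetical helper: exact match first, then match against str() of each element."""
    if key in label:
        return label.index(key)
    # Raises ValueError if the key is found in neither form.
    return [str(elem) for elem in label].index(key)


def convert_index(labels, index):
    """Hypothetical sketch: turn a larry-style index into a plain numpy index."""
    # Fast path: a single integer such as y[5] needs no label work at all.
    if isinstance(index, int):
        return index
    if not isinstance(index, tuple):
        index = (index,)
    converted = []
    for label, item in zip(labels, index):
        if isinstance(item, str):
            converted.append(_position(label, item))
        elif isinstance(item, slice):
            # The extra cost: every slice is inspected for string bounds.
            start, stop = item.start, item.stop
            if isinstance(start, str):
                start = _position(label, start)
            if isinstance(stop, str):
                stop = _position(label, stop)  # assumed exclusive, like a positional slice
            converted.append(slice(start, stop, item.step))
        else:
            converted.append(item)  # integers, arrays, etc. pass through unchanged
    return tuple(converted)


x = np.array([1, 2, 3])
labels = [['a', 'b', 4]]
print(x[convert_index(labels, 'a')])               # like y['a']
print(x[convert_index(labels, slice('b', None))])  # like y['b':]
print(x[convert_index(labels, '4')])               # like y['4']
```

Only string items and slices trigger label work; plain integers either take the fast path or pass straight through to numpy.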
Here are some more examples:
>> from la import larry
>> import numpy as np
>> import datetime
>> d = datetime.date
>>
>> x = np.arange(24).reshape(2,3,4)
>> label = [['price', 'volume'], ['aapl', 'ibm', 'dell'], [d(2009,1,1), d(2009,1,2), d(2009,1,3), d(2009,1,4)]]
>> y = larry(x, label)
>> y['price']
label_0
    aapl
    ibm
    dell
label_1
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>> y['price', 'aapl']
label_0
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([0, 1, 2, 3])
>> y['price', 'aapl':]
label_0
    aapl
    ibm
    dell
label_1
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>> y['price', 'aapl', '2009-01-02']
1
>> y['price', 'dell', '2009-01-02']
9
>> y[:, 'dell', :]
label_0
    price
    volume
label_1
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([[ 8,  9, 10, 11],
       [20, 21, 22, 23]])
>> y[0, 'ibm', 2]
6
>> y[0, 'ibm', :]
label_0
    2009-01-01
    2009-01-02
    2009-01-03
    2009-01-04
x
array([4, 5, 6, 7])
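Because each label element stands for a position along its axis, the data in every result above can be cross-checked with plain positional numpy indexing on the same array. The positions are read off the `label` list defined in the session; this checks only the values, not the labels of the returned larrys.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Label positions from the session:
# axis 0: 'price' -> 0, 'volume' -> 1
# axis 1: 'aapl' -> 0, 'ibm' -> 1, 'dell' -> 2
# axis 2: '2009-01-01' -> 0, '2009-01-02' -> 1, ...
print(x[0, 0, 1])   # y['price', 'aapl', '2009-01-02']
print(x[0, 2, 1])   # y['price', 'dell', '2009-01-02']
print(x[:, 2, :])   # y[:, 'dell', :]
print(x[0, 1, 2])   # y[0, 'ibm', 2]
print(x[0, 1, :])   # y[0, 'ibm', :]
```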