larry-discuss team mailing list archive

Re: New features: totuples, fromtuples

 

On Sun, Jan 31, 2010 at 12:56 PM,  <josef.pktd@xxxxxxxxx> wrote:
> On Sun, Jan 31, 2010 at 3:44 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>> On Sun, Jan 31, 2010 at 12:38 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>> On Sun, Jan 31, 2010 at 3:11 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>> On Sun, Jan 31, 2010 at 12:05 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>> On Sun, Jan 31, 2010 at 2:57 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> Record as in numpy record array?
>>>>>
>>>>> kind of, as in a row in a structured array, not really a recordarray,
>>>>> which adds some candy that's not really worth the effort.
>>>>> tabular is based on structured arrays (they moved away from record
>>>>> arrays), and scikits timeseries torecords() produces structured arrays
>>>>> not record arrays.
>>>>
>>>> I don't know the format for a structured array. The first google hit
>>>> (scipy docs) says: "Structured Arrays (aka Record Arrays)"
>>>>
>>>> What would this look like in structured array format?
>>>>
>>>>>> y = la.larry([[1.0, 2.0], [3.0, 4.0]], [['a', 'b'], ['c', 'd']])
>>>>>> y
>>>> label_0
>>>>    a
>>>>    b
>>>> label_1
>>>>    c
>>>>    d
>>>> x
>>>> array([[ 1.,  2.],
>>>>       [ 3.,  4.]])
>>>>
>>>> Oops, we've gone off list.
>>>>
>>> back on list
>>>
>>> I think record arrays have gone a bit out of fashion in the last two
>>> years when I was following the mailing lists. Most discussion on the
>>> mailing list is on structured arrays, which have the same dtype
>>> structure as record arrays, but without the
>>> dotted access to columns
>>>
>>> here is an attempt, not really general:
>>>
>>> y = la.larry([[1.0, 2.0], [3.0, 4.0]], [['a', 'b'], ['c', 'd']])
>>> ysr = np.empty(y.x.shape[0],dtype=([('index','S1')]+[(i,float) for
>>> i in y.label[1]]))
>>> ysr['index'] = y.label[0]
>>> for i in ysr.dtype.names[1:]:
>>>    ysr[i] = y[y.labelindex(i, axis=1)].x
>>>
>>>
>>>>>> ysr
>>> array([('a', 1.0, 3.0), ('b', 2.0, 4.0)],
>>>      dtype=[('index', '|S1'), ('c', '<f8'), ('d', '<f8')])
>>>>>> ysr.shape
>>> (2,)
>>>>>> ysr[0]
>>> ('a', 1.0, 3.0)
>>>>>> ysr[1]
>>> ('b', 2.0, 4.0)
>>>
>>> Adding the labels in the first column, makes it a bit more difficult,
>>> otherwise it would just be a view on y.x with a structured dtype.
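
The "just a view" case mentioned above can be sketched with plain NumPy. This is a minimal illustration, not larry code: `x` and `col_labels` are stand-ins for `y.x` and `y.label[1]`, and it assumes the data array is C-contiguous with a single float dtype.

```python
import numpy as np

# Stand-ins for y.x and y.label[1] (assumed names, not larry API).
x = np.array([[1.0, 2.0], [3.0, 4.0]])
col_labels = ['c', 'd']

# One field per column, all sharing the dtype of the underlying array.
dtype = np.dtype([(name, x.dtype) for name in col_labels])

# Reinterpret each row as one record; no data is copied.
xs = x.view(dtype).reshape(-1)

col_c = xs['c']   # column 'c' of x, selected by field name
col_d = xs['d']   # column 'd' of x
shares = np.shares_memory(x, xs)   # True: xs is a view onto x
```

Because the result is a view, writing to `xs['c']` also changes `x`. Adding a string column for the row labels breaks this, since the fields no longer share one dtype, so the data has to be copied into a new array as in the loop above.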
>>>
>>> What is the best way to access a larry column by label name?
>>
>> If you only want to pull one row then you can use:
>>
>>>> y.pull('a', 0)
>>
>> label_0
>>    c
>>    d
>> x
>> array([ 1.,  2.])
>>
>> I have experimental support (not in trunk) for indexing with label names. So
>>
>> y.index[['a']]
>>
>> or
>>
>> y.index[['a'],:]
>
> Looks nice, access by label/index name as in pandas can be very
> convenient, at least for variable names. I'm not much used to
> accessing time periods by date.

The reason I haven't committed it is that something like

y.index[['b', 'a'], ['c', 'd']]

is fancy indexing. And larry.__getitem__, which is called by
larry.index, does not yet support fancy indexing.

In the example above, fancy indexing returns a larry of shape (2,).
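
To see where the shape (2,) comes from, here is a rough NumPy-only sketch of what the label lookup would reduce to. The names `row_labels` and `col_labels` and the `list.index` lookup are only illustrative assumptions, not how larry does it.

```python
import numpy as np

# Plain-NumPy stand-ins for y.x and y.label (assumed names).
x = np.array([[1.0, 2.0], [3.0, 4.0]])
row_labels = ['a', 'b']
col_labels = ['c', 'd']

# Translate labels to integer positions.
rows = [row_labels.index(r) for r in ['b', 'a']]   # [1, 0]
cols = [col_labels.index(c) for c in ['c', 'd']]   # [0, 1]

# Fancy indexing pairs the two lists element by element:
# it picks x[1, 0] and x[0, 1], so the result is 1d.
picked = x[rows, cols]

# For comparison, np.ix_ takes the outer product of the two lists
# and returns the reordered 2x2 block instead.
block = x[np.ix_(rows, cols)]
```

So `picked` holds the elements at ('b', 'c') and ('a', 'd'), which is why the result has shape (2,) rather than (2, 2).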


