larry-discuss team mailing list archive

Re: A new proposal for indexing with labels

 

On Sun, Feb 7, 2010 at 6:58 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> On Sun, Feb 7, 2010 at 6:55 PM,  <josef.pktd@xxxxxxxxx> wrote:
>> On Sun, Feb 7, 2010 at 9:46 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Sun, Feb 7, 2010 at 6:43 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>> On Sun, Feb 7, 2010 at 6:39 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>> On Sun, Feb 7, 2010 at 6:35 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> On Sun, Feb 7, 2010 at 6:28 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>> On Sun, Feb 7, 2010 at 8:46 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>> On Sun, Feb 7, 2010 at 5:26 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>> On Sat, Feb 6, 2010 at 6:53 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>> On Sat, Feb 6, 2010 at 9:48 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>>> On Sat, Feb 6, 2010 at 5:38 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>>>> In a blueprint titled "index-by-label" I proposed a way to index
>>>>>>>>>>>> larrys by lists of label elements. Here's a simpler, but less
>>>>>>>>>>>> versatile, proposal. On the whole, due to its simplicity, I think it
>>>>>>>>>>>> is more powerful.
>>>>>>>>>>>
>>>>>>>>>>> I committed this proposal in r187. Please give it a try.
>>>>>>>>>>
>>>>>>>>>> I will try it tomorrow and look at the implementation.
>>>>>>>>>> My first reaction: very convenient but potentially fragile for arbitrary labels.
>>>>>>>>>
>>>>>>>>> The rule is simple for indexing with a string S:
>>>>>>>>>
>>>>>>>>> 1. Look for string S in the label. If found you are done. If not found...
>>>>>>>>> 2. Map the labels to strings and look again
>>>>>>>>>
>>>>>>>>> Although the rule is simple, the result can be unexpected in corner
>>>>>>>>> cases. For example, you may try to index with str(1) to access the
>>>>>>>>> label integer 1 but the label could also contain string '1'. So in
>>>>>>>>> that case you'd get an unexpected result even though the rule is
>>>>>>>>> simple.
>>>>>>>>>
>>>>>>>>> I could add a check: len(set(strlabel)) == len(set(label)). And raise
>>>>>>>>> an IndexError (or is that ValueError?) if they are not equal. That
>>>>>>>>> will slow things down but only for indexing by strings.
>>>>>>>>>
>>>>>>>>> Would that address your fragile comment? Or do you have something else in mind?
>>>>>>>>
>>>>>>>> Wait, that's being too restrictive. We don't care if there are
>>>>>>>> duplicates in strlabel. We only care if S appears more than once in
>>>>>>>> strlabel. For example, if we are indexing with str(1) and the label is
>>>>>>>> [2, str(2), 1], then we don't care that strlabel = [str(2), str(2),
>>>>>>>> str(1)] has duplicates; we only care that str(1) only appears once. If
>>>>>>>> we were indexing with str(2), on the other hand, then there would be a
>>>>>>>> problem and we'd raise a ValueError.
>>>>>>>>
>>>>>>>> I can add that check and then you can take a look.
>>>>>>>>
>>>>>>>
>>>>>>> I just started to look at it. I saw that in str2labelindex you use
>>>>>>> str(labelobject) to identify the label.
>>>>>>> I don't think __str__ is very safe to use in general; I don't think
>>>>>>> it is guaranteed to remain unchanged. E.g. in numpy you can affect the
>>>>>>> str result with the print options for numbers in arrays, e.g.
>>>>>>> np.set_printoptions(precision=2).
>>>>>>>
>>>>>>> Another example: objects that don't define a unique string or that
>>>>>>> use the default string:
>>>>>>> >>> class MyA(object): pass
>>>>>>>
>>>>>>> >>> aaa = MyA()
>>>>>>> >>> str(aaa)
>>>>>>> '<__main__.MyA object at 0x01A57DD0>'
>>>>>>>
>>>>>>> I'm not very familiar with datetime. Is the string representation
>>>>>>> locale or timezone dependent?
>>>>>>> Judging from some messages on the mailing lists, the decimal point is
>>>>>>> locale dependent; I assume that in some cases the default in German
>>>>>>> is 5,4 instead of 5.4.
>>>>>>>
>>>>>>> So, relying on the string representation imposes quite a lot of
>>>>>>> restrictions on which types of labels this would work for.
>>>>>>>
>>>>>>> I'll look some more.
>>>>>>
>>>>>> Sure, indexing with things like '(3,4)' will be a problem since
>>>>>> str((3,4)) is '(3, 4)' (note the space). So the safe way to index is,
>>>>>> for example, y[str(1)].
>>>>>>
>>>>>> I like the general idea of using __getitem__ to index both the regular
>>>>>> and the label way. One thing I am wondering about is if there is
>>>>>> another way to signify indexing by labels other than with strings. It
>>>>>> would have to be something that numpy arrays can't be indexed by.
>>>>>
>>>>> I suppose dictionaries could be used. It does take quite a bit more
>>>>> typing. For example:
>>>>>
>>>>> >> class eli(object):
>>>>>   ...:     def __init__(self):
>>>>>   ...:         pass
>>>>>   ...:     def __getitem__(self, index):
>>>>>   ...:         print(index)
>>>>>   ...:
>>>>>   ...:
>>>>>
>>>>> >> e = eli()
>>>>> >> e[{'label': 'a'}, :]
>>>>> ({'label': 'a'}, slice(None, None, None))
>>>>>
>>>>> On the plus side: no need to map labels to strings.
>>>>
>>>> Or any two-element sequence where the first element is 'label':
>>>>
>>>> >> e = eli()
>>>>
>>>> >> e[('label', 'a'), :]
>>>> (('label', 'a'), slice(None, None, None))
>>>>
>>>> >> e[['label', 'a'], :]
>>>> (['label', 'a'], slice(None, None, None))
>>>
>>> Or:
>>>
>>> >> from la import ix
>>> >> e[ix('a'), :]
>>
>> You got to this while I was writing my reply to an earlier message. If
>> you agree with this version, we can look at it more closely; I think
>> it's the safest bet.
>
> Yes, I like it. I also ended up using the name lix. Now, one problem.
> One of the features I like about string indexing is that you can do
> slices like this:
>
> y['2010-02-01':]
>
> So let's think on that.

That's easy:

e[lix('a'):, :]

But if we allow more than one label element, as you suggested, then
we'd have to raise an error when a slice bound contains more than one
element.

And when more than one label element is given, I'll always do regular
indexing, not fancy indexing. I hope that is not confusing.
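
A minimal sketch of how the slicing rule above could work. Everything
here is an assumption for illustration: Labeled is a hypothetical 1d
stand-in for a larry, and this lix simply stores its label elements.
It is not the actual la implementation, and it does not address the
regular versus fancy indexing question for more than one dimension.

```python
class lix(object):
    """Hypothetical marker that wraps one or more label elements."""

    def __init__(self, *elements):
        self.elements = list(elements)


class Labeled(object):
    """Toy 1d container with a label list (stand-in for a larry)."""

    def __init__(self, x, label):
        self.x = list(x)
        self.label = list(label)

    def _resolve(self, bound):
        # Translate a lix slice bound into an integer position;
        # plain integers and None pass through unchanged.
        if isinstance(bound, lix):
            if len(bound.elements) != 1:
                raise ValueError("a slice bound must be a single label element")
            return self.label.index(bound.elements[0])
        return bound

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice by label: convert both bounds, then slice by position.
            index = slice(self._resolve(index.start),
                          self._resolve(index.stop),
                          index.step)
            return Labeled(self.x[index], self.label[index])
        if isinstance(index, lix):
            # One or more label elements: pick them out in the order given.
            pos = [self.label.index(e) for e in index.elements]
            return Labeled([self.x[p] for p in pos],
                           [self.label[p] for p in pos])
        # Anything else is ordinary positional indexing.
        return self.x[index]


y = Labeled([10, 20, 30], ['a', 'b', 'c'])
z = y[lix('b'):]        # z.x == [20, 30], z.label == ['b', 'c']
w = y[lix('a', 'c')]    # w.x == [10, 30]
```

Because lix is its own type, nothing has to be mapped to strings, and
y[lix('a', 'b'):] raises a ValueError instead of guessing.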

OK, I'll implement it tomorrow.


