larry-discuss team mailing list archive

Re: A new proposal for indexing with labels

To: josef.pktd@xxxxxxxxx
From: Keith Goodman <kwgoodman@xxxxxxxxx>
Date: Sun, 7 Feb 2010 18:39:56 -0800
Cc: larry-discuss@xxxxxxxxxxxxxxxxxxx
In-reply-to: <f4f93d421002071835m5ba687br2a3789e7bbd948e@mail.gmail.com>

On Sun, Feb 7, 2010 at 6:35 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> On Sun, Feb 7, 2010 at 6:28 PM,  <josef.pktd@xxxxxxxxx> wrote:
>> On Sun, Feb 7, 2010 at 8:46 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Sun, Feb 7, 2010 at 5:26 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>> On Sat, Feb 6, 2010 at 6:53 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>> On Sat, Feb 6, 2010 at 9:48 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> On Sat, Feb 6, 2010 at 5:38 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>> In a blueprint titled "index-by-label" I proposed a way to index
>>>>>>> larrys by lists of label elements. Here's a simpler, but less
>>>>>>> versatile, proposal. On the whole, due to its simplicity, I think it
>>>>>>> is more powerful.
>>>>>>
>>>>>> I committed this proposal in r187. Please give it a try.
>>>>>
>>>>> I will try it tomorrow and look at the implementation.
>>>>> My first reaction: very convenient but potentially fragile for arbitrary labels.
>>>>
>>>> The rule is simple for indexing with a string S:
>>>>
>>>> 1. Look for string S in the label. If found you are done. If not found...
>>>> 2. Map the labels to strings and look again
>>>>
>>>> Although the rule is simple, the result can be unexpected in corner
>>>> cases. For example, you may try to index with str(1) to access the
>>>> label integer 1 but the label could also contain string '1'. So in
>>>> that case you'd get an unexpected result even though the rule is
>>>> simple.
>>>>
>>>> I could add a check: len(set(strlabel)) == len(set(label)). And raise
>>>> an IndexError (or is that ValueError?) if they are not equal. That
>>>> will slow things down but only for indexing by strings.
>>>>
>>>> Would that address your fragile comment? Or do you have something else in mind?
>>>
>>> Wait, that's being too restrictive. We don't care if there are
>>> duplicates in strlabel. We only care if S appears more than once in
>>> strlabel. For example, if we are indexing with str(1) and the label is
>>> [2, str(2), 1], then we don't care that strlabel = [str(2), str(2),
>>> str(1)] has duplicates; we only care that str(1) only appears once. If
>>> we were indexing with str(2), on the other hand, then there would be a
>>> problem and we'd raise a ValueError.
>>>
>>> I can add that check and then you can take a look.
>>>
>>
>> I just started to look at it. I saw that in str2labelindex you use
>> str(labelobject) to identify the label.
>> I don't think __str__ is very safe to use in general; I don't think
>> its output is guaranteed to remain unchanged. For example, in numpy
>> you can change the str result for numbers in arrays with the print
>> options, e.g. np.set_printoptions(precision=2).
>>
>> Another example: objects that don't define a unique string or that
>> use the default string:
>> >>> class MyA(object): pass
>>
>> >>> aaa = MyA()
>> >>> str(aaa)
>> '<__main__.MyA object at 0x01A57DD0>'
>>
>> I'm not very familiar with datetime. Is the string representation
>> locale or timezone dependent?
>> Judging from some messages on the mailing lists, the decimal point is
>> locale dependent; I assume that in some cases the default in German
>> is 5,4 instead of 5.4.
>>
>> So relying on the string representation imposes quite a lot of
>> restrictions on which types of labels this would work for.
>>
>> I'll look some more.
>
> Sure, indexing with things like '(3,4)' will be a problem since
> str((3,4)) is '(3, 4)' (note the space). So the safe way to index is,
> for example, y[str(1)].
>
> I like the general idea of using __getitem__ to index both the regular
> way and by label. One thing I am wondering about is whether there is
> another way to signify indexing by labels other than with strings. It
> would have to be something that numpy arrays can't be indexed by.
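
For reference, the string rule with the ambiguity check discussed above
could be sketched roughly as follows. This is only an illustration, not
the code in r187: the name find_label_by_string is made up, and it
assumes the label is a plain Python list.

```python
def find_label_by_string(label, s):
    """Return the position of the label element whose str() equals s.

    Hypothetical helper for illustration; not larry's actual code.
    """
    # Map every label element to a string. Elements that are already
    # strings map to themselves, so this covers both steps of the rule.
    strlabel = [str(x) for x in label]
    count = strlabel.count(s)
    if count == 0:
        raise ValueError("%r not found in label" % (s,))
    if count > 1:
        # Only duplicates of s matter, not duplicates elsewhere in strlabel.
        raise ValueError("%r matches more than one label element" % (s,))
    return strlabel.index(s)


label = [2, str(2), 1]
print(find_label_by_string(label, str(1)))  # 2
```

With the same label, find_label_by_string(label, str(2)) raises a
ValueError because both 2 and '2' map to '2'. A label of [(3, 4)]
indexed with '(3,4)' also fails, since str((3, 4)) contains a space.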

I suppose dictionaries could be used. It does take quite a bit more
typing. For example:

>> class eli(object):
   ...:     def __init__(self):
   ...:         pass
   ...:     def __getitem__(self, index):
   ...:         print(index)
   ...:
   ...:

>> e = eli()
>> e[{'label': 'a'},:]
({'label': 'a'}, slice(None, None, None))

On the plus side: no need to map labels to strings.
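
To make the idea a bit more concrete, here is a toy sketch of how a
container might resolve such a dictionary. Everything in it is an
assumption for illustration: the class name LabeledRows, the 'label'
key, and the use of nested lists instead of a numpy array. It is not
how larry works.

```python
class LabeledRows(object):
    """Hypothetical toy container: a list of rows plus one label per row."""

    def __init__(self, rows, label):
        self.rows = rows
        self.label = label

    def _resolve(self, idx):
        # A dict of the form {'label': value} means "index by label".
        # Anything else is passed through as a regular index.
        if isinstance(idx, dict):
            return self.label.index(idx['label'])
        return idx

    def __getitem__(self, index):
        row, col = index
        return self.rows[self._resolve(row)][col]


# The integer 1 and the string '1' are distinct labels here.
y = LabeledRows([[10, 20], [30, 40]], label=[1, '1'])
print(y[{'label': 1}, :])    # [10, 20]
print(y[{'label': '1'}, 0])  # 30
print(y[1, 0])               # 30 (regular positional index)
```

Since the label elements are compared directly, 1 and '1' never collide
and nothing depends on str().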
