larry-discuss team mailing list archive

Re: A new proposal for indexing with labels

 

On Sun, Feb 7, 2010 at 6:28 PM,  <josef.pktd@xxxxxxxxx> wrote:
> On Sun, Feb 7, 2010 at 8:46 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>> On Sun, Feb 7, 2010 at 5:26 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Sat, Feb 6, 2010 at 6:53 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>> On Sat, Feb 6, 2010 at 9:48 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>> On Sat, Feb 6, 2010 at 5:38 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> In a blueprint titled "index-by-label" I proposed a way to index
>>>>>> larrys by lists of label elements. Here's a simpler, but less
>>>>>> versatile, proposal. On the whole, due to its simplicity, I think it
>>>>>> is more powerful.
>>>>>
>>>>> I committed this proposal in r187. Please give it a try.
>>>>
>>>> I will try it tomorrow and look at the implementation.
>>>> My first reaction: very convenient but potentially fragile for arbitrary labels.
>>>
>>> The rule is simple for indexing with a string S:
>>>
>>> 1. Look for string S in the label. If found you are done. If not found...
>>> 2. Map the labels to strings and look again
>>>
>>> Although the rule is simple, the result can be unexpected in corner
>>> cases. For example, you may try to index with str(1) to access the
>>> label integer 1 but the label could also contain string '1'. So in
>>> that case you'd get an unexpected result even though the rule is
>>> simple.
>>>
>>> I could add a check: len(set(strlabel)) == len(set(label)). And raise
>>> an IndexError (or is that ValueError?) if they are not equal. That
>>> will slow things down but only for indexing by strings.
>>>
>>> Would that address your fragile comment? Or do you have something else in mind?
>>
>> Wait, that's being too restrictive. We don't care if there are
>> duplicates in strlabel. We only care if S appears more than once in
>> strlabel. For example, if we are indexing with str(1) and the label is
>> [2, str(2), 1], then we don't care that strlabel = [str(2), str(2),
>> str(1)] has duplicates; we only care that str(1) only appears once. If
>> we were indexing with str(2), on the other hand, then there would be a
>> problem and we'd raise a ValueError.
>>
>> I can add that check and then you can take a look.
>>
>
> I just started to look at it. I saw that in str2labelindex you use
> str(labelobject) to identify the label.
> I don't think __str__ is very safe to use in general; I don't think
> it is guaranteed to remain unchanged. For example, in numpy you can
> affect the str result for numbers in arrays with the print options,
> e.g. np.set_printoptions(precision=2).
>
> Another example: objects that don't define a unique string or that use
> a default string:
> >>> class MyA(object): pass
>
> >>> aaa = MyA()
> >>> str(aaa)
> '<__main__.MyA object at 0x01A57DD0>'
>
> I'm not very familiar with datetime. Is the string representation
> locale or timezone dependent?
> Judging from some messages on the mailing lists, the decimal point is
> locale dependent; I assume that in some cases the default in German
> is 5,4 instead of 5.4.
>
> So, relying on the string representation imposes quite a lot of
> restrictions on which types of labels this would work for.
>
> I'll look some more.

Sure, indexing with things like '(3,4)' will be a problem since
str((3,4)) is '(3, 4)' (note the space). So the safe way to index is,
for example, y[str(1)].
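
To make the rule and the proposed check concrete, here is a minimal sketch in plain Python. It is not the code committed in r187: the function name find_label_index is made up for illustration, and running the duplicate check before step 1 is an assumption chosen so that indexing [2, str(2), 1] with str(2) raises a ValueError, as described earlier in the thread.

```python
def find_label_index(label, s):
    """Return the position in the list `label` selected by the string `s`.

    Hypothetical helper for illustration only; not the larry implementation.
    """
    strlabel = [str(el) for el in label]
    # Proposed check: only complain if s itself is ambiguous in strlabel.
    # Other duplicates in strlabel are harmless.
    if strlabel.count(s) > 1:
        raise ValueError("%r matches more than one label element" % s)
    # Step 1: look for the string s in the label itself.
    if s in label:
        return label.index(s)
    # Step 2: map the labels to strings and look again.
    if s in strlabel:
        return strlabel.index(s)
    raise IndexError("%r not found in label" % s)


label = [2, str(2), 1]
print(find_label_index(label, str(1)))  # 2

pairs = [(3, 4), (5, 6)]
print(find_label_index(pairs, str((3, 4))))  # 0
try:
    find_label_index(pairs, '(3,4)')  # no space, so no match
except IndexError as err:
    print("IndexError:", err)
```

The last call shows why typing the string by hand is fragile: '(3,4)' is not equal to str((3, 4)), so building the string with str() is the safer habit.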

I like the general idea of using __getitem__ for both regular indexing
and indexing by label. One thing I am wondering about is whether there
is a way to signify indexing by labels other than with strings. It
would have to be something that numpy arrays can't be indexed by.
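
One possibility, purely as an illustration and not something that exists in larry, is a small wrapper type that marks a value as a label element. Everything in the sketch below is hypothetical: the Label class, the toy Labeled1d container standing in for a 1d larry, and the use of plain lists instead of numpy arrays.

```python
class Label(object):
    """Hypothetical marker: wraps a label element to request label indexing."""

    def __init__(self, value):
        self.value = value


class Labeled1d(object):
    """Toy 1d container with one label element per data element."""

    def __init__(self, x, label):
        self.x = list(x)
        self.label = list(label)

    def __getitem__(self, index):
        if isinstance(index, Label):
            # Label indexing: compare label elements directly, no str().
            return self.x[self.label.index(index.value)]
        # Anything else is passed through as a regular index.
        return self.x[index]


y = Labeled1d([10.0, 20.0, 30.0], [2, str(2), 1])
print(y[0])              # 10.0, regular indexing
print(y[Label(1)])       # 30.0, label integer 1
print(y[Label(str(2))])  # 20.0, label string '2'
print(y[Label(2)])       # 10.0, label integer 2
```

Because the label elements are compared directly, the integer 2 and the string '2' stay distinct and nothing depends on the string representation. The cost is more typing than y['1'].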


