larry-discuss team mailing list archive

Re: A new proposal for indexing with labels

To: Keith Goodman <kwgoodman@xxxxxxxxx>
From: josef.pktd@xxxxxxxxx
Date: Sun, 7 Feb 2010 22:11:10 -0500
Cc: larry-discuss@xxxxxxxxxxxxxxxxxxx
In-reply-to: <f4f93d421002071858g7f49631ej4f73b5d449d3da3c@mail.gmail.com>

On Sun, Feb 7, 2010 at 9:58 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> On Sun, Feb 7, 2010 at 6:55 PM,  <josef.pktd@xxxxxxxxx> wrote:
>> On Sun, Feb 7, 2010 at 9:46 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Sun, Feb 7, 2010 at 6:43 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>> On Sun, Feb 7, 2010 at 6:39 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>> On Sun, Feb 7, 2010 at 6:35 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>> On Sun, Feb 7, 2010 at 6:28 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>> On Sun, Feb 7, 2010 at 8:46 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>> On Sun, Feb 7, 2010 at 5:26 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>> On Sat, Feb 6, 2010 at 6:53 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>>>>>>>> On Sat, Feb 6, 2010 at 9:48 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>>> On Sat, Feb 6, 2010 at 5:38 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>>>>>>>>>>> In a blueprint titled "index-by-label" I proposed a way to index
>>>>>>>>>>>> larrys by lists of label elements. Here's a simpler, but less
>>>>>>>>>>>> versatile, proposal. On the whole, due to its simplicity, I think it
>>>>>>>>>>>> is more powerful.
>>>>>>>>>>>
>>>>>>>>>>> I committed this proposal in r187. Please give it a try.
>>>>>>>>>>
>>>>>>>>>> I will try it tomorrow and look at the implementation.
>>>>>>>>>> My first reaction: very convenient but potentially fragile for arbitrary labels.
>>>>>>>>>
>>>>>>>>> The rule is simple for indexing with a string S:
>>>>>>>>>
>>>>>>>>> 1. Look for string S in the label. If found you are done. If not found...
>>>>>>>>> 2. Map the labels to strings and look again
>>>>>>>>>
>>>>>>>>> Although the rule is simple, the result can be unexpected in corner
>>>>>>>>> cases. For example, you may try to index with str(1) to access the
>>>>>>>>> label integer 1 but the label could also contain string '1'. So in
>>>>>>>>> that case you'd get an unexpected result even though the rule is
>>>>>>>>> simple.
>>>>>>>>>
>>>>>>>>> I could add a check: len(set(strlabel)) == len(set(label)). And raise
>>>>>>>>> an IndexError (or is that ValueError?) if they are not equal. That
>>>>>>>>> will slow things down but only for indexing by strings.
>>>>>>>>>
>>>>>>>>> Would that address your fragile comment? Or do you have something else in mind?
>>>>>>>>
>>>>>>>> Wait, that's being too restrictive. We don't care if there are
>>>>>>>> duplicates in strlabel. We only care if S appears more than once in
>>>>>>>> strlabel. For example, if we are indexing with str(1) and the label is
>>>>>>>> [2, str(2), 1], then we don't care that strlabel = [str(2), str(2),
>>>>>>>> str(1)] has duplicates; we only care that str(1) only appears once. If
>>>>>>>> we were indexing with str(2), on the other hand, then there would be a
>>>>>>>> problem and we'd raise a ValueError.
>>>>>>>>
>>>>>>>> I can add that check and then you can take a look.
>>>>>>>>
>>>>>>>
>>>>>>> I just started to look at it. I saw that in str2labelindex you use
>>>>>>> str(labelobject) to identify the label.
>>>>>>> I don't think __str__ is very safe to use in general; I don't think
>>>>>>> its result is guaranteed to remain unchanged. E.g. in numpy you can
>>>>>>> affect the str result with the print options for numbers in arrays,
>>>>>>> e.g. np.set_printoptions(precision=2).
>>>>>>>
>>>>>>> Another example: objects that don't define a unique string or that
>>>>>>> use a default string:
>>>>>>>
>>>>>>> >>> class MyA(object): pass
>>>>>>>
>>>>>>> >>> aaa = MyA()
>>>>>>> >>> str(aaa)
>>>>>>> '<__main__.MyA object at 0x01A57DD0>'
>>>>>>>
>>>>>>> I'm not very familiar with datetime. Is the string representation
>>>>>>> locale or timezone dependent?
>>>>>>> The decimal point is locale dependent, judging from some messages on
>>>>>>> the mailing lists; I assume that in some cases the default in German
>>>>>>> is 5,4 instead of 5.4.
>>>>>>>
>>>>>>> So, relying on the string representation imposes quite a lot of
>>>>>>> restrictions on which types of labels this would work for.
>>>>>>>
>>>>>>> I'll look some more.
>>>>>>
>>>>>> Sure, indexing with things like '(3,4)' will be a problem since
>>>>>> str((3,4)) is '(3, 4)' (note the space). So the safe way to index is,
>>>>>> for example, y[str(1)].
>>>>>>
>>>>>> I like the general idea of using __getitem__ to index both the regular
>>>>>> and the label way. One thing I am wondering about is if there is
>>>>>> another way to signify indexing by labels other than with strings. It
>>>>>> would have to be something that numpy arrays can't be indexed by.
>>>>>
>>>>> I suppose dictionaries could be used. It does take quite a bit more
>>>>> typing. For example:
>>>>>
>>>>> >> class eli(object):
>>>>>   ...:     def __init__(self):
>>>>>   ...:         pass
>>>>>   ...:     def __getitem__(self, index):
>>>>>   ...:         print index
>>>>>   ...:
>>>>>   ...:
>>>>>
>>>>> >> e[{'label': 'a'},:]
>>>>> ({'label': 'a'}, slice(None, None, None))
>>>>>
>>>>> On the plus side: no need to map labels to strings.
>>>>
>>>> Or any two-element sequence where the first element is 'label':
>>>>
>>>> >> e = eli()
>>>>
>>>> >> e[('label', 'a'), :]
>>>> (('label', 'a'), slice(None, None, None))
>>>>
>>>> >> e[['label', 'a'], :]
>>>> (['label', 'a'], slice(None, None, None))
>>>
>>> Or:
>>>
>>> >> from la import ix
>>> >> e[ix('a'), :]
>>
>> You got to this while I was writing my reply to an earlier message. If
>> you agree with this version, we can look at it more closely. I think
>> it's the safest bet.
>
> Yes, I like it. I also ended up using the name lix. Now, one problem.
> One of the features I like about string indexing is that you can do
> slices like this:
>
> y['2010-02-01':]
>
> So let's think on that.

If you want to go this way, you could have lixs (lix with strings),
lixr (lix with repr), and so on.
lix would be safe because it uses the labels directly; the other ones
are only recommended in restricted cases where the string (or other)
representation makes sense.
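
A minimal sketch of that split, to make the difference concrete. The
resolve() method, the class bodies, and the sample labels are assumptions
for illustration only; nothing here is code from la. lix compares against
the labels themselves, while lixs compares against str(label) and refuses
to guess when the string matches more than one label.

```python
class lix(object):
    """Hypothetical marker: look up a label by equality with the label itself."""

    def __init__(self, key):
        self.key = key

    def resolve(self, labels):
        # list.index compares with ==, so no string conversion is involved.
        return labels.index(self.key)


class lixs(lix):
    """Hypothetical marker: look up a label by its str() representation."""

    def resolve(self, labels):
        strlabels = [str(x) for x in labels]
        matches = [i for i, s in enumerate(strlabels) if s == self.key]
        if len(matches) != 1:
            # Zero matches or an ambiguous match: do not pick one silently.
            raise ValueError("%r matches %d labels" % (self.key, len(matches)))
        return matches[0]


labels = [2, '2', 1]
pos_direct = lix(1).resolve(labels)     # position of the integer label 1
pos_string = lixs('1').resolve(labels)  # only one label has str() == '1'
try:
    lixs('2').resolve(labels)           # both 2 and '2' map to '2'
    ambiguous = False
except ValueError:
    ambiguous = True
```

With the label [2, '2', 1] from earlier in the thread, the direct lookup
never has to worry about the collision between 2 and '2'; only the
string-based variant does.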

If you want to allow slicing, i.e. `:`, then I think lix would need to
use its __getitem__ instead of __call__ or __init__:
lar1[1:4, lix['msft', 'google'], lixs['2010-02-01':]]   or
lar1[1:4, lix[['msft', 'google']], lixs['2010-02-01':]] ?
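
The reason is that Python only accepts the a:b slice syntax inside square
brackets, so a call such as lixs('2010-02-01':) is a syntax error. A rough
sketch, assuming lix and lixs are instances of a small helper class (the
name _LabelIndexer and the returned tuple are made up for illustration and
are not part of la):

```python
class _LabelIndexer(object):
    """Hypothetical helper: records whatever is placed inside the brackets."""

    def __init__(self, kind):
        self.kind = kind

    def __getitem__(self, key):
        # key is a slice for lixs['a':], a tuple for lix['a', 'b'],
        # and a list for lix[['a', 'b']].
        return (self.kind, key)


# Module-level instances, so users write lix[...] rather than lix(...).
lix = _LabelIndexer('label')
lixs = _LabelIndexer('str')

date_slice = lixs['2010-02-01':]
two_labels = lix['msft', 'google']
label_list = lix[['msft', 'google']]
```

Note that the two spellings above arrive differently: lix['msft', 'google']
is received as a tuple and lix[['msft', 'google']] as a list, so the
consumer would have to decide whether to treat them the same.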

I don't know if lix should hold some code for conversion from label
representation to indices, or just be an identifier for use in
larry.__getitem__.
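
As a sketch of the second option, where lix is only an identifier and the
container does the conversion: MiniLarry below is a toy 1d stand-in
invented for this example, not la.larry, and the attribute names are
assumptions.

```python
class lix(object):
    """Hypothetical marker that only tags a key as a label; no lookup logic."""

    def __init__(self, key):
        self.key = key


class MiniLarry(object):
    """Toy 1d labeled container; a stand-in for illustration, not la.larry."""

    def __init__(self, data, labels):
        self.data = list(data)
        self.labels = list(labels)

    def __getitem__(self, index):
        # The container, not the marker, converts a label to a position.
        if isinstance(index, lix):
            index = self.labels.index(index.key)
        return self.data[index]


y = MiniLarry([10.0, 20.0, 30.0], ['a', 'b', 1])
by_position = y[1]       # ordinary integer indexing
by_label = y[lix(1)]     # the integer label 1, found at position 2
```

Keeping the lookup in the container leaves lix trivial, at the cost of
every new marker type needing its own branch in __getitem__.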

Josef
