
u1db-discuss team mailing list archive

Re: Indexing and lists

 

On Thu, 17 Nov 2011 20:06:29 +0000, Stuart Langridge <stuart.langridge@xxxxxxxxxxxxx> wrote:
> 
> how would I do an index on people who have a work phone number?
> create_index("worknums", [ "phones.name" ]) ? That feels weird; the
> indexer would act differently depending on whether the value of "phones"
> is a dict or a list of dicts. Then again, maybe that's the answer; if a
> part of an index expression resolves to a list, then we do the remainder
> of the index expression for *each item in the list*. This would also
> cope with the above colours example, ignoring my reservations about it
> feeling weird. To me that makes a certain amount of sense. Thoughts?
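Stuart's per-item rule (when part of the expression resolves to a list, evaluate the remainder for each item) can be sketched in Python roughly as follows. This is a hypothetical illustration, not u1db code. `extract_path` and `_walk` are made-up names, and documents are assumed to be plain JSON-like dicts and lists.

```python
from typing import Any, List


def extract_path(doc: Any, path: str) -> List[Any]:
    """Evaluate a dotted path such as "phones.name" against a JSON-like doc.

    Hypothetical helper: whenever a step resolves to a list, the rest of
    the path is evaluated for each item and the results are concatenated.
    """
    return _walk(doc, path.split("."))


def _walk(value: Any, parts: List[str]) -> List[Any]:
    if isinstance(value, list):
        # Fan out: apply the remaining path to every element of the list.
        # (Nested lists are flattened too in this sketch.)
        results: List[Any] = []
        for item in value:
            results.extend(_walk(item, parts))
        return results
    if not parts:
        # Path exhausted: this value is what gets indexed.
        return [value]
    if isinstance(value, dict) and parts[0] in value:
        return _walk(value[parts[0]], parts[1:])
    # Missing field or not an object: nothing to index, and not an error.
    return []


person = {
    "name": "Stuart",
    "phones": [
        {"name": "work", "number": "555-0100"},
        {"name": "home", "number": "555-0199"},
    ],
}
print(extract_path(person, "phones.name"))  # ['work', 'home']
```

A lookup for "work" in such an index would then find everyone with a work phone number. The same path also works unchanged when "phones" is a single dict rather than a list.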

More questions: do you want to be able to create an index on the names
of an object? Do we want partial indexes? If we have an index expression
that transforms a string into a list of strings, do we need to
explicitly say that we want each of those added separately to the index,
rather than the list itself?

I think the answer to those is no, yes, and no. I think the rule for
index expressions should be that they resolve either to a single
"scalar" value (one of string|number|true|false|null), which is added to
the index, or to a list, whose scalar elements are added sequentially to
the index; and that if neither of those happens it's not an error,
nothing is added (I'm on the fence as to whether lists that have list
elements should have the elements of those inner lists added
recursively; having to explain that makes my head hurt a little. man
perllol). I think we should provide no index functions to address
individual items of a list; if you need to treat the second item
differently from the first, then it should be an object, not a
list. Also, "name.split().lower()" (or "lower(split(name))", or
name|split|lower, or whatever) should result in the same values added to
the index as "name.lower().split()". And we should continue to
enforce the semantics (in the same way I said "you shouldn't care about
the nth element of the list") by saying that you shouldn't get into the
situation where you have to create an index on an object's keys.
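One possible reading of that rule, sketched in Python. None of these names are u1db API: `index_values`, `lower` and `split` are hypothetical, and making the transformations work element-wise on lists is an assumed design choice. It is one way to get "name.split().lower()" and "name.lower().split()" to index the same values.

```python
from typing import Any, List


def is_scalar(value: Any) -> bool:
    # JSON scalars: string | number | true | false | null
    return value is None or isinstance(value, (str, int, float, bool))


def index_values(result: Any) -> List[Any]:
    """Turn the result of an index expression into the entries to add.

    Hypothetical helper: a scalar is added as is; a list contributes its
    scalar elements in order; anything else (e.g. a dict) contributes
    nothing, which is not an error.
    """
    if is_scalar(result):
        return [result]
    if isinstance(result, list):
        # Non-scalar elements (nested lists, dicts) are skipped here; the
        # recursive alternative is deliberately left out.
        return [item for item in result if is_scalar(item)]
    return []


# Hypothetical transformations that act element-wise on lists, so the
# order of split and lower does not change the indexed values.
def lower(value: Any) -> Any:
    if isinstance(value, list):
        return [lower(item) for item in value]
    return value.lower() if isinstance(value, str) else value


def split(value: Any) -> Any:
    if isinstance(value, list):
        return [word for item in value for word in split(item)]
    # Non-strings yield no words.
    return value.split() if isinstance(value, str) else []


name = "Stuart Langridge"
print(index_values(lower(split(name))))  # ['stuart', 'langridge']
print(index_values(split(lower(name))))  # ['stuart', 'langridge']
```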

I also think that after describing what we want for the indexing
language, we need to look at what is the minimal thing we can do that is
useful, and do that first. That we shouldn't spend too much time
worrying about how we'd create an index of an object with 3 layers of
nested dicts and lists of lists; we can put hard limits on the
complexity of the expressions we admit, especially at first.
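As a rough sketch of what "hard limits" could look like, the check below admits only a small subset: single-argument calls such as "lower(split(name))" wrapped around one dotted field path. The limit values, the `KNOWN_TRANSFORMS` set and `check_expression` itself are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical limits and transformation names, for illustration only.
MAX_PATH_DEPTH = 3
MAX_TRANSFORM_DEPTH = 2
KNOWN_TRANSFORMS = {"lower", "split"}

_CALL = re.compile(r"^(\w+)\((.*)\)$")
_PATH = re.compile(r"^\w+(\.\w+)*$")


def check_expression(expr: str) -> None:
    """Raise ValueError if expr is outside the admitted subset."""
    expr = expr.strip()
    depth = 0
    # Peel off transformation calls from the outside in.
    while (match := _CALL.match(expr)) is not None:
        name, expr = match.group(1), match.group(2).strip()
        if name not in KNOWN_TRANSFORMS:
            raise ValueError(f"unknown transformation: {name}")
        depth += 1
    if depth > MAX_TRANSFORM_DEPTH:
        raise ValueError(f"too many nested transformations: {depth}")
    if not _PATH.match(expr):
        raise ValueError(f"not a dotted field path: {expr!r}")
    if expr.count(".") + 1 > MAX_PATH_DEPTH:
        raise ValueError(f"field path too deep: {expr!r}")


check_expression("lower(split(name))")  # accepted
check_expression("phones.name")         # accepted
```

Raising such limits later stays backwards compatible, since every expression admitted before is still admitted; lowering them would not be.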

We're going to want to throw away the indexing language in a few years
(WHAT WERE WE THINKING?!? *hair pull*) and rewrite it, and still admit
the old expressions for backwards compatibility, so the smaller it is
(while still being useful) the less we'll have to hack it up later. Yes?
(probably preaching to the choir by now).
