u1db-discuss team mailing list archive
Message #00033
Re: Indexing and lists
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Just so we're all aware, we do have an initial Index implementation
that handles 'lists' from James Westby:
https://code.launchpad.net/~james-w/u1db/index-transformations/+merge/81069
On 11/17/2011 11:01 PM, John Rowland Lenton wrote:
> On Thu, 17 Nov 2011 20:06:29 +0000, Stuart Langridge
> <stuart.langridge@xxxxxxxxxxxxx> wrote:
>>
>> how would I do an index on people who have a work phone number?
>> create_index("worknums", [ "phones.name" ]) ? That feels weird;
>> the indexer would act differently depending on whether the value
>> of "phones" is a dict or a list of dicts. Then again, maybe
>> that's the answer; if a part of an index expression resolves to a
>> list, then we do the remainder of the index expression for *each
>> item in the list*. This would also cope with the above colours
>> example, ignoring my reservations about it feeling weird. To me
>> that makes a certain amount of sense. Thoughts?
>
> More questions: do you want to be able to create an index on the
> names of an object? Do we want partial indexes? If we have an index
> expression that transforms a string into a list of strings, do we
> need to explicitly say that we want each of those added separately
> to the index, rather than the list itself?
1) I think you mean by this something like:
create_index('names', ['names()'])
create_doc('{"a": 1, "b": 2}')
get_from_index('names', ["a"])
Would then return the document.
I can see where that could be useful, though if there are only a
small number of names that you care about, then you can create an
index for each one.
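To make that concrete, here is a toy in-memory sketch of what a names() transformation could do. Everything in it (ToyDB, the names() helper, the generated doc ids) is hypothetical and only mimics the three calls above; it is not the u1db implementation.

```python
import json
from collections import defaultdict


def names(doc):
    """Hypothetical 'names()' transformation: one index key per top-level field name."""
    return list(doc.keys())


class ToyDB:
    """Toy in-memory stand-in for illustration; not the real u1db API."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # field name -> set of doc ids

    def create_doc(self, content):
        doc_id = "doc-%d" % len(self.docs)
        doc = json.loads(content)
        self.docs[doc_id] = doc
        # Add the document under every field name it contains.
        for key in names(doc):
            self.index[key].add(doc_id)
        return doc_id

    def get_from_index(self, key):
        return [self.docs[d] for d in sorted(self.index.get(key, ()))]


db = ToyDB()
db.create_doc('{"a": 1, "b": 2}')
result = db.get_from_index("a")
```

Looking up "a" or "b" finds the document; looking up a name that no document has finds nothing.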
2) I'm not 100% sure what you mean by partial indexes here. If part of
an index evaluates to 'null', then that document is not put into the
index.
Maybe you are taking it a step further and having an equality check?
create_index('john', ['equal(name, "john")'])
or
create_index('john', ['name == "john"'])
The former fits into our current syntax ok, the latter would be a
possible transformation, but I imagine the syntax parser gets crazy
when you start layering them.
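A rough sketch of how the function form could behave, assuming equal() evaluates to null when the field does not match, so that the "null means not indexed" rule does the filtering. Both equal() and build_index() are made-up names for illustration and not part of any existing API.

```python
def equal(field, expected):
    """Hypothetical equal(field, value): yields the value on a match, else None."""
    def extract(doc):
        value = doc.get(field)
        return value if value == expected else None
    return extract


def build_index(docs, extractor):
    """Index only the documents whose expression does not evaluate to None."""
    index = {}
    for doc_id, doc in docs.items():
        key = extractor(doc)
        if key is None:
            continue  # null result: document is left out of the index
        index.setdefault(key, []).append(doc_id)
    return index


docs = {
    "d1": {"name": "john"},
    "d2": {"name": "stuart"},
    "d3": {"phone": "555-0100"},  # no 'name' field at all
}
john_index = build_index(docs, equal("name", "john"))
```

Only d1 ends up in the index; d2 (different name) and d3 (no name) are skipped by the same null rule.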
3) I think here you mean do we want something like:
create_index('favcolor', ["any(colour)"])
rather than just writing it as:
create_index('favcolor', ["colour"])
And if the 'colour' field is a list, we just evaluate each item of
the list.
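A minimal sketch of that evaluation rule, assuming a scalar gives one index entry, a list gives one entry per scalar item, and null (or anything else) gives no entry. Nested lists are simply skipped here, which is my assumption since that point is still open. The index_keys() name and the sample documents are made up for illustration.

```python
SCALARS = (str, int, float, bool)


def index_keys(value):
    """Return the index keys for one field value under the rule sketched above."""
    if isinstance(value, SCALARS):
        return [value]
    if isinstance(value, list):
        # One entry per scalar item; nested lists and dicts are ignored (assumption).
        return [item for item in value if isinstance(item, SCALARS)]
    return []  # None or a dict: the document is not added to the index


docs = {
    "samuel": {"colour": "green"},
    "stuart": {"colour": ["red", "blue"]},
    "anon": {},  # no 'colour' field, so never indexed
}
index = {}
for doc_id, doc in docs.items():
    for key in index_keys(doc.get("colour")):
        index.setdefault(key, []).append(doc_id)
```

With this, the same "colour" expression works whether the field holds a single string or a list, without needing any().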
I think I agree that 'any()' seems superfluous. The question that
remains is if we want an 'all()' function (flatten a list into a
single item).
As an example:
create_index('all_colour', ['all(colours)'])
get_from_index('all_colour', ['green'])
returns Samuel
get_from_index('all_colour', ['red'])
returns [], nobody likes *just* red.
get_from_index('all_colour', ['red|blue'])
returns Stuart
I don't think we want all(), because what it expresses is probably a
set operation: is (red,blue) the same as (blue,red)?
And I think users can approximate it in user-space with:
create_index('colour', ['colours'])
docs = get_from_index('colour', ['red', 'blue'])
for doc in docs:
    if 'red' not in doc.colours or 'blue' not in doc.colours:
        # doesn't like both
        continue
    ...
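A self-contained version of that loop, using a toy lookup in place of the real index. The sample documents are my guess at the colours example (Samuel likes green, Stuart likes red and blue), "Alex" is invented, and the helper names are hypothetical.

```python
docs = [
    {"name": "Samuel", "colours": ["green"]},
    {"name": "Stuart", "colours": ["red", "blue"]},
    {"name": "Alex", "colours": ["red", "green"]},  # invented extra document
]


def get_from_index(docs, wanted):
    """Toy lookup: documents whose 'colours' list contains any wanted value."""
    return [d for d in docs if any(c in d["colours"] for c in wanted)]


def likes_all(docs, wanted):
    """Filter the index hits down to documents containing every wanted colour."""
    candidates = get_from_index(docs, wanted)
    return [d["name"] for d in candidates if set(wanted) <= set(d["colours"])]


both = likes_all(docs, ["red", "blue"])
```

Note this is only an approximation of all(): a document listing red, blue and green would also pass the filter, and the order of the wanted colours does not matter.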
>
> I think the answer to those is no, yes, and no: I think the rule
> for index expressions should be that they either resolve to a
> single "scalar" value (one of string|number|true|false|null), which
> is added to the index, or to a list, whose scalar elements are
> added sequentially to the index, and that if neither of those
> happens it's not an error, it simply isn't added (I'm on the fence
> as to whether lists that have list elements should have the
> elements of the list element added recursively; having to explain
> that makes my head hurt a little. man perllol). That we should
> provide no index functions to address individual items of a list;
> if you need to treat the second item differently from the first,
> then it should be an object, not a list. That
> "name.split().lower()" (or "lower(split(name))", or
> name|split|lower, or whatever) should result in the same values
> added to the index as "name.lower().split()". And that we should
> continue to enforce the semantics (in the same way I said "you
> shouldn't care about the nth element of the list") by saying that
> you shouldn't get into the situation where you have to create an
> index on the keys.
>
> I also think that after describing what we want for the indexing
> language, we need to look at what is the minimal thing we can do
> that is useful, and do that first. That we shouldn't spend too much
> time worrying about how we'd create an index of an object with 3
> layers of nested dicts and lists of lists; we can put hard limits
> to the complexity of the expressions we admit, especially at
> first.
>
> We're going to want to throw away the indexing language in a few
> years (WHAT WERE WE THINKING?!? *hair pull*) and rewrite it, and
> still admit the old expressions for backwards compatibility, so the
> smaller it is (while still being useful) the less we'll have to
> hack it up later. Yes? (probably preaching to the choir by now).
I think you have some good points here. Something simple that is
functional enough to get work done, and then iterate to find a better
solution.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk7GF6oACgkQJdeBCYSNAAMRuwCfdtS2ihPUr0aeYqZWUZZAG9Do
jIsAnjxpAlUei1lyMuglI3CgiMFrC5o7
=Q54u
-----END PGP SIGNATURE-----