
u1db-discuss team mailing list archive

Indexing and lists

 

So, let's talk about indexing.

Imagine a database containing the following documents:

{ "name": "Stuart", "address": {"town": "Birmingham", "country": "UK"},
  "hair": "red", "colours": [ "red", "blue" ], "id": "sil" }
{ "name": "Samuele", "address": {"town": "Sometown", "country": "CH"},
  "hair": "brown", "colours": [ "green" ], "id": "pedronis" }
{ "name": "Lucio", "address": {"town": "Othertown", "country": "AR"},
  "hair": "brown", "colours": [ "pink" ], "id": "lucio" }
{ "name": "Rodney", "address": {"town": "Podunk", "country": "US"},
  "hair": "brown", "colours": [ "brown", "red" ], "id": "dobey" }

If I create an index as create_index("haircolour", ["hair"]), the index
would look (conceptually) like this:
brown: dobey
brown: lucio
brown: pedronis
red: sil
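
To make the conceptual picture concrete, here is a rough sketch in plain Python. It is not u1db code: `build_index` and `DOCS` are names made up for illustration, and it only handles a single top-level field. The index is just a sorted list of (value, doc id) pairs.

```python
# Hypothetical illustration only; this is not u1db's implementation.
DOCS = [
    {"name": "Stuart", "address": {"town": "Birmingham", "country": "UK"},
     "hair": "red", "colours": ["red", "blue"], "id": "sil"},
    {"name": "Samuele", "address": {"town": "Sometown", "country": "CH"},
     "hair": "brown", "colours": ["green"], "id": "pedronis"},
    {"name": "Lucio", "address": {"town": "Othertown", "country": "AR"},
     "hair": "brown", "colours": ["pink"], "id": "lucio"},
    {"name": "Rodney", "address": {"town": "Podunk", "country": "US"},
     "hair": "brown", "colours": ["brown", "red"], "id": "dobey"},
]


def build_index(docs, field):
    """Return sorted (value, doc_id) pairs for one top-level field."""
    # Assumption: documents lacking the field are simply left out.
    return sorted((doc[field], doc["id"]) for doc in docs if field in doc)


for value, doc_id in build_index(DOCS, "hair"):
    print("%s: %s" % (value, doc_id))
```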

All agreed so far. We ought to also be able to create an index on
subfields, as was planned from the start, so create_index("townname",
["address.town"]):
Birmingham: sil
Othertown: lucio
Podunk: dobey
Sometown: pedronis

And indexing on multiple fields should also be doable, and was also
planned from the start, so create_index("hairandtown", ["hair",
"address.town"]):
brown,Othertown: lucio
brown,Podunk: dobey
brown,Sometown: pedronis
red,Birmingham: sil
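
A sketch of how subfield and multi-field expressions could be evaluated, again in plain Python rather than anything taken from u1db. The helper names `resolve` and `build_index` are hypothetical, and skipping documents that lack an indexed field is my assumption, not settled behaviour.

```python
# Hypothetical illustration only; this is not u1db's implementation.
# Documents are trimmed to the fields used here.
DOCS = [
    {"id": "sil", "hair": "red", "address": {"town": "Birmingham"}},
    {"id": "pedronis", "hair": "brown", "address": {"town": "Sometown"}},
    {"id": "lucio", "hair": "brown", "address": {"town": "Othertown"}},
    {"id": "dobey", "hair": "brown", "address": {"town": "Podunk"}},
]


def resolve(doc, path):
    """Follow a dotted path such as "address.town" through nested dicts."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # assumption: a missing field yields no value
        value = value[part]
    return value


def build_index(docs, fields):
    """Return sorted (key, doc_id) pairs; key holds one value per field."""
    entries = []
    for doc in docs:
        key = tuple(resolve(doc, field) for field in fields)
        if None not in key:  # assumption: skip docs missing any field
            entries.append((key, doc["id"]))
    return sorted(entries)


for key, doc_id in build_index(DOCS, ["hair", "address.town"]):
    print("%s: %s" % (",".join(key), doc_id))
```

Sorting on the whole tuple means entries are grouped by hair colour first and ordered by town within each group, so lucio (Othertown) sorts ahead of dobey (Podunk) and pedronis (Sometown).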

However... I also want to create an index on favourite colours.
Importantly, favourite colours is a list in the documents, so an
individual document should show up more than once in the index. So the
index would want to end up looking like this:

blue: sil
brown: dobey
green: pedronis
pink: lucio
red: dobey
red: sil

So my question is... what's the index expression to create that index?
create_index("colour", ["colours"]) seems wrong -- that feels like it
should put the whole list in the index. What should it be?

Extra thing: what if a document also contains
"phones": [ {"name": "home", "num": "123"}, {"name": "work", "num": "456"} ]

how would I do an index on people who have a work phone number?
create_index("worknums", [ "phones.name" ]) ? That feels weird; the
indexer would act differently depending on whether the value of "phones"
is a dict or a list of dicts. Then again, maybe that's the answer; if a
part of an index expression resolves to a list, then we do the remainder
of the index expression for *each item in the list*. This would also
cope with the above colours example, ignoring my reservations about it
feeling weird. To me that makes a certain amount of sense. Thoughts?
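
For what it's worth, here is a minimal sketch of that "fan out over lists" rule in plain Python. It only illustrates the proposal and is not how u1db behaves. The names `expand` and `build_index` are hypothetical, and the "phones" entry on the first document is the made-up example from above.

```python
# Hypothetical illustration of the proposed rule; not u1db's implementation.
DOCS = [
    {"id": "sil", "colours": ["red", "blue"],
     "phones": [{"name": "home", "num": "123"},
                {"name": "work", "num": "456"}]},
    {"id": "pedronis", "colours": ["green"]},
    {"id": "lucio", "colours": ["pink"]},
    {"id": "dobey", "colours": ["brown", "red"]},
]


def expand(value, parts):
    """Yield every value reached by following `parts` from `value`.

    If any step lands on a list, the remainder of the expression is
    applied to each item in that list.
    """
    if isinstance(value, list):
        for item in value:
            yield from expand(item, parts)
    elif not parts:
        yield value
    elif isinstance(value, dict) and parts[0] in value:
        yield from expand(value[parts[0]], parts[1:])
    # Anything else (missing key, scalar with path left over) yields nothing.


def build_index(docs, expression):
    """Return sorted (value, doc_id) pairs; a doc may appear several times."""
    parts = expression.split(".")
    return sorted((value, doc["id"])
                  for doc in docs
                  for value in expand(doc, parts))


for value, doc_id in build_index(DOCS, "colours"):
    print("%s: %s" % (value, doc_id))

# People with a work phone: entries whose key is "work".
work = [doc_id for value, doc_id in build_index(DOCS, "phones.name")
        if value == "work"]
print(work)
```

With this rule the same code path handles "phones" being a dict or a list of dicts, which is the part that feels weird. One thing the sketch does not address is a list that repeats a value, which would put the same document under the same key twice.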

sil



