← Back to team overview

drizzle-discuss team mailing list archive

Re: Stewart Smith: AlsoSQL

 

Hi Stewart!

On Wed, Apr 7, 2010 at 11:00 PM, Planet Drizzle
<emailer@xxxxxxxxxxxxxxxxx> wrote:
> A flexible query language is rather useful though. I think we’ll see the
> rise of AlsoSQL. That is systems that present a fast and simple protocol
> along with a SQL interface.

Aye to query languages :)

> This hybrid system has seen use for many years. MySQL Cluster is one such
> example. SQL through MySQL Server, NoSQL through NDB API.

Does NDB support a query language? Perhaps I am mistaken but I though
it was a kind of API, where you have to pretty much tell the machine
exactly what it should do.

> With Drizzle, I feel we’ll be in a pretty good position to offer non-sql
> based protocols and access methods to existing storage engines.

I have been looking at Freebase lately (http://www.freebase.com/docs/data)
I think it is a pretty interesting project for a number of reasons.
One of them is that it has what seems to me a pretty fully featured
query language.
It's called MQL, and it allows query by example, subqueries (but the
feature could also be seen as a join, depending on how you look at
it), aggregation (although limited), sorting, pattern matching and
more.

The query language is available through a RESTful webservice (not sure
if it's really REST, they claim so) and is a JSON vocabulary.
(see: http://www.freebase.com/docs/web_services/mqlread)

It is a pretty natural fit for (Ajax) web applications. Here's a
simple query to retreive all films directed by Francis Ford Coppola,
along with their genre:

[{
  "type": "/film/film",
  "directed_by": "Francis Ford Coppola",
  "name": null,
}]

Basically, this query asks for all films directed by Francis Ford Coppola

This is the result:

[
  {
    "directed_by": "Francis Ford Coppola",
    "name": "Apocalypse Now",
    "type": "/film/film"
  },
  {
    "directed_by": "Francis Ford Coppola"
    "name": "The Conversation"
    "type": "/film/film"
  },
  ....many more objects...
]

(If you want to try it out for yourself, use the query editor:
http://www.freebase.com/app/queryeditor)


As you can see, whatever we specify (type, directed_by) is matched and
returned, whatever we leave null is filled in.

Filling in the blanks also works for collections (genre in the
following snippet):

[{
  "type": "/film/film",
  "directed_by": "Francis Ford Coppola",
  "name": null,
  "genre": []
}]

Result:

[
  {
    "directed_by": "Francis Ford Coppola",
    "name": "Apocalypse Now",
    "type": "/film/film",
    "genre": ["Cult","War","Existentialism","New Hollywood"]
  },
  ....many more objects...
]

This is also how you do joins. Suppose we want to get the names and
characters of the actors:

[{
  "type": "/film/film",
  "directed_by": "Francis Ford Coppola",
  "name": null,
  "genre": [],
  "starring": [{"actor": null, "character": null}],
}]

Result is:

[
  {
    "directed_by": "Francis Ford Coppola",
    "name": "Apocalypse Now",
    "type": "/film/film",
    "genre": ["Cult","War","Existentialism","New Hollywood"],
    "starring": [
        {
            "actor": "Marlon Brando",
            "character": "Colonel Walter E. Kurtz"
        },
        {
            "actor": "Robert Duvall",
            "character": "Lt. Colonel William Kilgore"
       },
       {
           "actor": "Martin Sheen",
           "character": "Captain Willard"
       }
        ....many more actors...
     ]
   },
   ...many more films...
]

Actually, the difference with the earlier "genre" property is
superficial: "genre" is also a type in its own right, and we could
have gotten objects back for that too. But because we didn't ask for
the genre to be returned as objects (we didn't use the curly braces)
MQL automagically does the right thing and returns it as a list of
scalars.

Another, really, really nice thing IMO is that MQL can do something
SQL can't, or only with difficulty. Genre and Actor are both details
of film, and MQL has no difficulty returning them in the result with a
single query. Try that in SQL - you can't: in for example the sakila
database, a join like:

FROM film
INNER JOIN film_actor USING (film_id)
INNER JOIN film_category USING (category_id)

would explode into a semi-cartesian product in case you have multiple
actors and multiple categories for one film. You can only work around
it by doing two queries, or by doing some nasty multi-level grouping.
(first join film to actors, group by film and serialize actors with
group_concat - then join to categories, group by and group concat to
serialize the categories.)

As you can see, this query language totally obliterates the need for a
object relational mapper - the applications can simply decide it
themselves by writing the query according to the structure they want
at the app level. In my book, that is a big win.

There are a few tricks if the queries get more complex. For example,
it seems natural to query all movies starring both Robert de Niro and
Al Pacino with this query:

[{
  "type": "/film/director",
  "name": null,
  "film": [{
    "name": null,
    "starring": [{"actor": "Robert de Niro"}, {"name": "Al Pacino"}]
  }]
}]

but this doesn't work. There is a reason for that, but this mail is
already too long so you better look that up yourself. This is the
right way to do it though:

[{
  "type": "/film/director",
  "name": null,
  "film": [{
    "name": null,
    "a:starring": {
        "actor": "Robert de Niro",
        "character": null
    },
    "b:starring": {
        "actor": "Al Pacino",
        "character": null
    }
  }]
}]

IMO, still a pretty reasonable syntax for a pretty complex query.

Finally, I'd like to point out that although Freebase (or actually the
underlying Metaweb database) is not relation, it is a graph database.
But that hasn't prevented them from still maintaining pretty well
defined schemas: the "type" attribute is basically a table as it
defines a bag of properties (and relationships). But because one
entitity can have many different types, and you can access properties
of all associated types from within a single object notation, it still
feels just like a so-called "schemaless" database in that respect.

Anyway, I figured I just throw this in - I'd love to see this MQL
language (and the webservice) slapped unto a relational database. I
have some ideas for the mapping but I am sure other people see how
this would work out.
I am currently trying to build a quick and dirty PHP script to do this
as a proof of concept.
If anybody is interested in doing this natively in drizzle, I'd love
to help out. (I fear I don't have the skills to jump in and do it on
my own)

Oh, final thing: there will be a freebase meetup in San Franscisco, April 14.
http://www.meetup.com/sf-freebase/calendar/12845548/
I am currently trying to figure out if it will be worth travelling
from the MySQL uc to SF for that evening, but if other people are
interested in going, perhaps we can join and go together?

Thanks and kind regards,

Roland.

>
> URL: http://www.flamingspork.com/blog/2010/04/08/alsosql/
>
> _______________________________________________
> Mailing list: https://launchpad.net/~drizzle-discuss
> Post to     : drizzle-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~drizzle-discuss
> More help   : https://help.launchpad.net/ListHelp
>
>



-- 
Roland Bouman
http://rpbouman.blogspot.com/

Author of "Pentaho Solutions: Business Intelligence and Data
Warehousing with Pentaho and MySQL",
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470484322.html



Follow ups

References