openstack team mailing list archive

Thread
Date
Database stuff

To: openstack@xxxxxxxxxxxxxxxxxxx
From: Soren Hansen <soren@xxxxxxxxxxx>
Date: Tue, 29 Nov 2011 16:20:33 +0100
Hi, guys.

Gosh, this turned out to be long. Sorry about that.

I'm adding some tests for the DB api, and I've stumbled across something
that we should probably discuss.

First of all, most (if not all) of the various *_create methods we have,
return quite amputated objects. Any attempt to access related objects
will fail with the much too familiar:

   DetachedInstanceError: Parent instance <Whatever at 0x4f5c8d0> is not
   bound to a Session; lazy load operation of attribute 'other_things'
   cannot proceed

Also, with the SQLAlchemy driver, this test would pass:

    network = db.network_create(ctxt, {})
    network = db.network_get(ctxt, network['id'])

    instance = db.instance_create(ctxt, {})
    self.assertEquals(len(network['virtual_interfaces']), 0)
    db.virtual_interface_create(ctxt, {'network_id': network['id'],
                                       'instance_id': instance['id']}

    self.assertEquals(len(network['virtual_interfaces']), 0)
    network = db.network_get(ctxt, network['id'])
    self.assertEquals(len(network['virtual_interfaces']), 1)

I create a network, pull it out again (as per my comment above), verify
that it has no virtual_interfaces related to it, create a virtual
interface in this network, and check the network's virtual_interfaces
key and finds that it still has length 0. Reloading the network now
reveals the new virtual interface.

SQLAlchemy does support looking these things up on the fly. In fact,
AFAIK, this is its default behaviour. We just override it with
joinedload options, because we don't use scoped sessions.

My fake db driver looks stuff like this up on the fly (so the
assertEquals after the virtual_interface_create will fail with that db
driver).

So my question is this: Should this be

a) looked up on the fly,
b) looked up on first key access and then cached,
c) looked up when the parent object is loaded and then never again,
d) or up to the driver author?

Or should we do away with this stuff altogether? I.e. no more looking up
related objects by way of __getitem__ lookups, and instead only allow
lookups through db methods. So, instead of
network['virtual_interfaces'], you'd always do
db.virtual_interfaces_get_by_network(ctxt, network['id']).  Let's call
this option e).

I'm pretty undecided myself. If we go with option e) it becomes clear to
consumers of the DB api when they're pulling out fresh stuff from the DB
and when they're reusing potentially old results.  Explicit is better
than implicit, but it'll take quite a bit of work to change this.

If we go with one of options a) through d), my order of preference is
(from most to least preferred): a), d), c), b).

There's value in having a right way and a wrong way to do this. If it's
either-or, it makes testing (as in validation) more awkward. I'd say
it's always possible to do on-the-fly lookups. Overriding __getitem__
and fetching fresh results is pretty simple, and, as mentioned earlier,
I believe this is SQLAlchemy's default behaviour (somebody please
correct me if I'm wrong). Forcing an arbitrary ORM to replicate the
behaviour of b) and c) could be incredibly awkward, and c) is also
complicated because there might be reference loops involved. Also,
reviewing correct use of something where the need for reloads depends on
previous use of your db objects (which might itself be conditional or
happen in called methods) sounds like no fun at all.  With d) it's
pretty straightforward: "Do you want to be sure to have fresh responses?
Then reload the object.  Otherwise, behaviour is undefined." It's
straightforward to explain and review.  Option e) is also easy to
explain and do reviews for, btw.

DB objects with options a) through d) will have trouble making it
through a serializer. After dict serialisation, the object isn't going
to change, unresolved related objects will not be resolved, and
prepopulating everything prior to serialisation is out of the question
due to the potential for loops and very big sets of related objects (and
their related objects and their related objects's related objects,
etc.). I think it would be valuable to not have to think a whole lot
about whether a particular db-like object is a "real" db object or
whether it came in as a JSON object or over the message bus or whatever.
Only option e) will give us that, AFAICS.

It seems I've talked myself into preferring option e). It's too much
work to do on my own, though, and it's going to be disruptive, so we
need to do it real soon. I think it'll be worth it, though.

-- 
Soren Hansen        | http://linux2go.dk/
Ubuntu Developer    | http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/
Follow ups

Re: Database stuff
From: Vishvananda Ishaya, 2011-11-29
Re: Database stuff
From: Chris Behrens, 2011-11-29
Re: Database stuff
From: Mark McLoughlin, 2011-11-29
Re: Database stuff
From: Devin Carlen, 2011-11-29
Re: Database stuff
From: Jason Kölker, 2011-11-29
Re: Database stuff
From: Lorin Hochstein, 2011-11-29