openstack team mailing list archive

Thread
Date
Re: Database stuff

To: Vishvananda Ishaya <vishvananda@xxxxxxxxx>
From: Jay Pipes <jaypipes@xxxxxxxxx>
Date: Tue, 29 Nov 2011 13:20:09 -0500
Cc: openstack@xxxxxxxxxxxxxxxxxxx
In-reply-to: <D35C83C0-3D10-4324-9F16-7FCD6962853E@gmail.com>
+10

On Tue, Nov 29, 2011 at 12:55 PM, Vishvananda Ishaya
<vishvananda@xxxxxxxxx> wrote:
> e) is the right solution imho.  The only reason joinedloads slipped in is for efficiency reasons.
>
> In an ideal world the solution would be:
>
> 1) (explicitness) Every object or list of related objects is retrieved with an explicit call:
>  instance = db.instance_get(id)
>  ifaces = db.interfaces_get_by_instance(id)
>  for iface in ifaces:
>     ip = db.fixed_ip_get_by_interface(iface['id'])
> 2) (efficiency) Queries are perfectly efficient and all joins that will be used are made at once.
>  So the above would be a single db query that joins all instances ifaces and ips.
>
> Unless we're doing source code analysis to generate our queries, then we're probably going
> to have to make some tradeoffs to get as much efficiency and explicitness as possible.
>
> Brainstorming, perhaps we could use a hinting/caching mechanism that the backend could support.
> something like db.interfaces_get_by_instance(id, hint='fixed_ip'), which states that you are about to make
> another db request to get the fixed ips, so the backend could prejoin and cache the results. Then the next
> request could be: db.fixed_ip_get_by_interface(iface['id'], cached=True) or some such.
>
> I would like to move towards 1) but I think we really have to solve 2) or we will be smashing the database with too many queries.
>
> Vish
>
> On Nov 29, 2011, at 7:20 AM, Soren Hansen wrote:
>
>> Hi, guys.
>>
>> Gosh, this turned out to be long. Sorry about that.
>>
>> I'm adding some tests for the DB api, and I've stumbled across something
>> that we should probably discuss.
>>
>> First of all, most (if not all) of the various *_create methods we have,
>> return quite amputated objects. Any attempt to access related objects
>> will fail with the much too familiar:
>>
>>   DetachedInstanceError: Parent instance <Whatever at 0x4f5c8d0> is not
>>   bound to a Session; lazy load operation of attribute 'other_things'
>>   cannot proceed
>>
>> Also, with the SQLAlchemy driver, this test would pass:
>>
>>    network = db.network_create(ctxt, {})
>>    network = db.network_get(ctxt, network['id'])
>>
>>    instance = db.instance_create(ctxt, {})
>>    self.assertEquals(len(network['virtual_interfaces']), 0)
>>    db.virtual_interface_create(ctxt, {'network_id': network['id'],
>>                                       'instance_id': instance['id']}
>>
>>    self.assertEquals(len(network['virtual_interfaces']), 0)
>>    network = db.network_get(ctxt, network['id'])
>>    self.assertEquals(len(network['virtual_interfaces']), 1)
>>
>> I create a network, pull it out again (as per my comment above), verify
>> that it has no virtual_interfaces related to it, create a virtual
>> interface in this network, and check the network's virtual_interfaces
>> key and finds that it still has length 0. Reloading the network now
>> reveals the new virtual interface.
>>
>> SQLAlchemy does support looking these things up on the fly. In fact,
>> AFAIK, this is its default behaviour. We just override it with
>> joinedload options, because we don't use scoped sessions.
>>
>> My fake db driver looks stuff like this up on the fly (so the
>> assertEquals after the virtual_interface_create will fail with that db
>> driver).
>>
>> So my question is this: Should this be
>>
>> a) looked up on the fly,
>> b) looked up on first key access and then cached,
>> c) looked up when the parent object is loaded and then never again,
>> d) or up to the driver author?
>>
>> Or should we do away with this stuff altogether? I.e. no more looking up
>> related objects by way of __getitem__ lookups, and instead only allow
>> lookups through db methods. So, instead of
>> network['virtual_interfaces'], you'd always do
>> db.virtual_interfaces_get_by_network(ctxt, network['id']).  Let's call
>> this option e).
>>
>> I'm pretty undecided myself. If we go with option e) it becomes clear to
>> consumers of the DB api when they're pulling out fresh stuff from the DB
>> and when they're reusing potentially old results.  Explicit is better
>> than implicit, but it'll take quite a bit of work to change this.
>>
>> If we go with one of options a) through d), my order of preference is
>> (from most to least preferred): a), d), c), b).
>>
>> There's value in having a right way and a wrong way to do this. If it's
>> either-or, it makes testing (as in validation) more awkward. I'd say
>> it's always possible to do on-the-fly lookups. Overriding __getitem__
>> and fetching fresh results is pretty simple, and, as mentioned earlier,
>> I believe this is SQLAlchemy's default behaviour (somebody please
>> correct me if I'm wrong). Forcing an arbitrary ORM to replicate the
>> behaviour of b) and c) could be incredibly awkward, and c) is also
>> complicated because there might be reference loops involved. Also,
>> reviewing correct use of something where the need for reloads depends on
>> previous use of your db objects (which might itself be conditional or
>> happen in called methods) sounds like no fun at all.  With d) it's
>> pretty straightforward: "Do you want to be sure to have fresh responses?
>> Then reload the object.  Otherwise, behaviour is undefined." It's
>> straightforward to explain and review.  Option e) is also easy to
>> explain and do reviews for, btw.
>>
>> DB objects with options a) through d) will have trouble making it
>> through a serializer. After dict serialisation, the object isn't going
>> to change, unresolved related objects will not be resolved, and
>> prepopulating everything prior to serialisation is out of the question
>> due to the potential for loops and very big sets of related objects (and
>> their related objects and their related objects's related objects,
>> etc.). I think it would be valuable to not have to think a whole lot
>> about whether a particular db-like object is a "real" db object or
>> whether it came in as a JSON object or over the message bus or whatever.
>> Only option e) will give us that, AFAICS.
>>
>> It seems I've talked myself into preferring option e). It's too much
>> work to do on my own, though, and it's going to be disruptive, so we
>> need to do it real soon. I think it'll be worth it, though.
>>
>> --
>> Soren Hansen        | http://linux2go.dk/
>> Ubuntu Developer    | http://www.ubuntu.com/
>> OpenStack Developer | http://www.openstack.org/
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>
References

Database stuff
From: Soren Hansen, 2011-11-29
Re: Database stuff
From: Vishvananda Ishaya, 2011-11-29