← Back to team overview

openstack team mailing list archive

Re: Database stuff

 

+1 on the thoughts here.  Exactly what I meant by my reply.  Not sure what the interface should look like for #2, but we must be able to do it somehow.

- Chris

On Nov 29, 2011, at 11:55 AM, Vishvananda Ishaya wrote:

> e) is the right solution imho.  The only reason joinedloads slipped in is for efficiency reasons.
> 
> In an ideal world the solution would be:
> 
> 1) (explicitness) Every object or list of related objects is retrieved with an explicit call:
> instance = db.instance_get(id)
> ifaces = db.interfaces_get_by_instance(id)
> for iface in ifaces:
>    ip = db.fixed_ip_get_by_interface(iface['id'])
> 2) (efficiency) Queries are perfectly efficient and all joins that will be used are made at once.
> So the above would be a single db query that joins all instances ifaces and ips.
> 
> Unless we're doing source code analysis to generate our queries, then we're probably going
> to have to make some tradeoffs to get as much efficiency and explicitness as possible.
> 
> Brainstorming, perhaps we could use a hinting/caching mechanism that the backend could support.
> something like db.interfaces_get_by_instance(id, hint='fixed_ip'), which states that you are about to make
> another db request to get the fixed ips, so the backend could prejoin and cache the results. Then the next
> request could be: db.fixed_ip_get_by_interface(iface['id'], cached=True) or some such.
> 
> I would like to move towards 1) but I think we really have to solve 2) or we will be smashing the database with too many queries.
> 
> Vish
> 
> On Nov 29, 2011, at 7:20 AM, Soren Hansen wrote:
> 
>> Hi, guys.
>> 
>> Gosh, this turned out to be long. Sorry about that.
>> 
>> I'm adding some tests for the DB api, and I've stumbled across something
>> that we should probably discuss.
>> 
>> First of all, most (if not all) of the various *_create methods we have,
>> return quite amputated objects. Any attempt to access related objects
>> will fail with the much too familiar:
>> 
>> DetachedInstanceError: Parent instance <Whatever at 0x4f5c8d0> is not
>> bound to a Session; lazy load operation of attribute 'other_things'
>> cannot proceed
>> 
>> Also, with the SQLAlchemy driver, this test would pass:
>> 
>>  network = db.network_create(ctxt, {})
>>  network = db.network_get(ctxt, network['id'])
>> 
>>  instance = db.instance_create(ctxt, {})
>>  self.assertEquals(len(network['virtual_interfaces']), 0)
>>  db.virtual_interface_create(ctxt, {'network_id': network['id'],
>>                                     'instance_id': instance['id']}
>> 
>>  self.assertEquals(len(network['virtual_interfaces']), 0)
>>  network = db.network_get(ctxt, network['id'])
>>  self.assertEquals(len(network['virtual_interfaces']), 1)
>> 
>> I create a network, pull it out again (as per my comment above), verify
>> that it has no virtual_interfaces related to it, create a virtual
>> interface in this network, and check the network's virtual_interfaces
>> key and finds that it still has length 0. Reloading the network now
>> reveals the new virtual interface.
>> 
>> SQLAlchemy does support looking these things up on the fly. In fact,
>> AFAIK, this is its default behaviour. We just override it with
>> joinedload options, because we don't use scoped sessions.
>> 
>> My fake db driver looks stuff like this up on the fly (so the
>> assertEquals after the virtual_interface_create will fail with that db
>> driver).
>> 
>> So my question is this: Should this be
>> 
>> a) looked up on the fly,
>> b) looked up on first key access and then cached,
>> c) looked up when the parent object is loaded and then never again,
>> d) or up to the driver author?
>> 
>> Or should we do away with this stuff altogether? I.e. no more looking up
>> related objects by way of __getitem__ lookups, and instead only allow
>> lookups through db methods. So, instead of
>> network['virtual_interfaces'], you'd always do
>> db.virtual_interfaces_get_by_network(ctxt, network['id']).  Let's call
>> this option e).
>> 
>> I'm pretty undecided myself. If we go with option e) it becomes clear to
>> consumers of the DB api when they're pulling out fresh stuff from the DB
>> and when they're reusing potentially old results.  Explicit is better
>> than implicit, but it'll take quite a bit of work to change this.
>> 
>> If we go with one of options a) through d), my order of preference is
>> (from most to least preferred): a), d), c), b).
>> 
>> There's value in having a right way and a wrong way to do this. If it's
>> either-or, it makes testing (as in validation) more awkward. I'd say
>> it's always possible to do on-the-fly lookups. Overriding __getitem__
>> and fetching fresh results is pretty simple, and, as mentioned earlier,
>> I believe this is SQLAlchemy's default behaviour (somebody please
>> correct me if I'm wrong). Forcing an arbitrary ORM to replicate the
>> behaviour of b) and c) could be incredibly awkward, and c) is also
>> complicated because there might be reference loops involved. Also,
>> reviewing correct use of something where the need for reloads depends on
>> previous use of your db objects (which might itself be conditional or
>> happen in called methods) sounds like no fun at all.  With d) it's
>> pretty straightforward: "Do you want to be sure to have fresh responses?
>> Then reload the object.  Otherwise, behaviour is undefined." It's
>> straightforward to explain and review.  Option e) is also easy to
>> explain and do reviews for, btw.
>> 
>> DB objects with options a) through d) will have trouble making it
>> through a serializer. After dict serialisation, the object isn't going
>> to change, unresolved related objects will not be resolved, and
>> prepopulating everything prior to serialisation is out of the question
>> due to the potential for loops and very big sets of related objects (and
>> their related objects and their related objects's related objects,
>> etc.). I think it would be valuable to not have to think a whole lot
>> about whether a particular db-like object is a "real" db object or
>> whether it came in as a JSON object or over the message bus or whatever.
>> Only option e) will give us that, AFAICS.
>> 
>> It seems I've talked myself into preferring option e). It's too much
>> work to do on my own, though, and it's going to be disruptive, so we
>> need to do it real soon. I think it'll be worth it, though.
>> 
>> -- 
>> Soren Hansen        | http://linux2go.dk/
>> Ubuntu Developer    | http://www.ubuntu.com/
>> OpenStack Developer | http://www.openstack.org/
>> 
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
> 
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp



References