Thread Previous • Date Previous • Date Next • Thread Next |
+1 on the thoughts here. Exactly what I meant by my reply. Not sure what the interface should look like for #2, but we must be able to do it somehow. - Chris On Nov 29, 2011, at 11:55 AM, Vishvananda Ishaya wrote: > e) is the right solution imho. The only reason joinedloads slipped in is for efficiency reasons. > > In an ideal world the solution would be: > > 1) (explicitness) Every object or list of related objects is retrieved with an explicit call: > instance = db.instance_get(id) > ifaces = db.interfaces_get_by_instance(id) > for iface in ifaces: > ip = db.fixed_ip_get_by_interface(iface['id']) > 2) (efficiency) Queries are perfectly efficient and all joins that will be used are made at once. > So the above would be a single db query that joins all instances ifaces and ips. > > Unless we're doing source code analysis to generate our queries, then we're probably going > to have to make some tradeoffs to get as much efficiency and explicitness as possible. > > Brainstorming, perhaps we could use a hinting/caching mechanism that the backend could support. > something like db.interfaces_get_by_instance(id, hint='fixed_ip'), which states that you are about to make > another db request to get the fixed ips, so the backend could prejoin and cache the results. Then the next > request could be: db.fixed_ip_get_by_interface(iface['id'], cached=True) or some such. > > I would like to move towards 1) but I think we really have to solve 2) or we will be smashing the database with too many queries. > > Vish > > On Nov 29, 2011, at 7:20 AM, Soren Hansen wrote: > >> Hi, guys. >> >> Gosh, this turned out to be long. Sorry about that. >> >> I'm adding some tests for the DB api, and I've stumbled across something >> that we should probably discuss. >> >> First of all, most (if not all) of the various *_create methods we have, >> return quite amputated objects. Any attempt to access related objects >> will fail with the much too familiar: >> >> DetachedInstanceError: Parent instance <Whatever at 0x4f5c8d0> is not >> bound to a Session; lazy load operation of attribute 'other_things' >> cannot proceed >> >> Also, with the SQLAlchemy driver, this test would pass: >> >> network = db.network_create(ctxt, {}) >> network = db.network_get(ctxt, network['id']) >> >> instance = db.instance_create(ctxt, {}) >> self.assertEquals(len(network['virtual_interfaces']), 0) >> db.virtual_interface_create(ctxt, {'network_id': network['id'], >> 'instance_id': instance['id']} >> >> self.assertEquals(len(network['virtual_interfaces']), 0) >> network = db.network_get(ctxt, network['id']) >> self.assertEquals(len(network['virtual_interfaces']), 1) >> >> I create a network, pull it out again (as per my comment above), verify >> that it has no virtual_interfaces related to it, create a virtual >> interface in this network, and check the network's virtual_interfaces >> key and finds that it still has length 0. Reloading the network now >> reveals the new virtual interface. >> >> SQLAlchemy does support looking these things up on the fly. In fact, >> AFAIK, this is its default behaviour. We just override it with >> joinedload options, because we don't use scoped sessions. >> >> My fake db driver looks stuff like this up on the fly (so the >> assertEquals after the virtual_interface_create will fail with that db >> driver). >> >> So my question is this: Should this be >> >> a) looked up on the fly, >> b) looked up on first key access and then cached, >> c) looked up when the parent object is loaded and then never again, >> d) or up to the driver author? >> >> Or should we do away with this stuff altogether? I.e. no more looking up >> related objects by way of __getitem__ lookups, and instead only allow >> lookups through db methods. So, instead of >> network['virtual_interfaces'], you'd always do >> db.virtual_interfaces_get_by_network(ctxt, network['id']). Let's call >> this option e). >> >> I'm pretty undecided myself. If we go with option e) it becomes clear to >> consumers of the DB api when they're pulling out fresh stuff from the DB >> and when they're reusing potentially old results. Explicit is better >> than implicit, but it'll take quite a bit of work to change this. >> >> If we go with one of options a) through d), my order of preference is >> (from most to least preferred): a), d), c), b). >> >> There's value in having a right way and a wrong way to do this. If it's >> either-or, it makes testing (as in validation) more awkward. I'd say >> it's always possible to do on-the-fly lookups. Overriding __getitem__ >> and fetching fresh results is pretty simple, and, as mentioned earlier, >> I believe this is SQLAlchemy's default behaviour (somebody please >> correct me if I'm wrong). Forcing an arbitrary ORM to replicate the >> behaviour of b) and c) could be incredibly awkward, and c) is also >> complicated because there might be reference loops involved. Also, >> reviewing correct use of something where the need for reloads depends on >> previous use of your db objects (which might itself be conditional or >> happen in called methods) sounds like no fun at all. With d) it's >> pretty straightforward: "Do you want to be sure to have fresh responses? >> Then reload the object. Otherwise, behaviour is undefined." It's >> straightforward to explain and review. Option e) is also easy to >> explain and do reviews for, btw. >> >> DB objects with options a) through d) will have trouble making it >> through a serializer. After dict serialisation, the object isn't going >> to change, unresolved related objects will not be resolved, and >> prepopulating everything prior to serialisation is out of the question >> due to the potential for loops and very big sets of related objects (and >> their related objects and their related objects's related objects, >> etc.). I think it would be valuable to not have to think a whole lot >> about whether a particular db-like object is a "real" db object or >> whether it came in as a JSON object or over the message bus or whatever. >> Only option e) will give us that, AFAICS. >> >> It seems I've talked myself into preferring option e). It's too much >> work to do on my own, though, and it's going to be disruptive, so we >> need to do it real soon. I think it'll be worth it, though. >> >> -- >> Soren Hansen | http://linux2go.dk/ >> Ubuntu Developer | http://www.ubuntu.com/ >> OpenStack Developer | http://www.openstack.org/ >> >> _______________________________________________ >> Mailing list: https://launchpad.net/~openstack >> Post to : openstack@xxxxxxxxxxxxxxxxxxx >> Unsubscribe : https://launchpad.net/~openstack >> More help : https://help.launchpad.net/ListHelp > > > _______________________________________________ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@xxxxxxxxxxxxxxxxxxx > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp
Thread Previous • Date Previous • Date Next • Thread Next |