openstack team mailing list archive

Thread
Date
Re: Database stuff

To: Aaron Lee <aaron.lee@xxxxxxxxxxxxx>
From: Duncan McGreggor <duncan@xxxxxxxxxxxxx>
Date: Tue, 29 Nov 2011 10:30:06 -0800
Cc: "openstack@xxxxxxxxxxxxxxxxxxx" <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <73653728-93A6-4860-9EBA-1009C22B0A1F@rackspace.com>
User-agent: Mutt/1.5.21 (2010-09-15)
On 29 Nov 2011 - 17:54, Aaron Lee wrote:
> For this I think we need to separate the models from the queries.

+1

> A query method should be able to populate as many of the models as
> needed to return the data collected. I also think separating the
> queries from the models themselves will help us make the storage
> engine replaceable, and will allow us to extract common behavior into
> the models themselves instead of peppered throughout the codebase.

Couldn't agree more.

d

> Aaron
>
> On Nov 29, 2011, at 10:31 AM, Chris Behrens wrote:
>
> > e) sounds good as long as we don't remove the ability to joinload up
> > front.  Sometimes we need to join.  Sometimes we can be lazy.  With
> > 'list instances with details', we need to get all instances from the
> > DB and join network information in a single DB query.  Doing 1 +
> > (n*x) DB queries to support a 'nova list' will not be acceptable
> > when it can be done in 1 query today.  That means we'll need
> > instance['virtual_interfaces'] (or similar) to work in cases where
> > we 'join up front'.  Most cases when we're pulling instances from
> > the DB, we don't need to join anything.  It'd be more efficient to
> > not join network information for those calls.  This is where e) is
> > fine.
> >
> > - Chris
> >
> > On Nov 29, 2011, at 9:20 AM, Soren Hansen wrote:
> >
> >> Hi, guys.
> >>
> >> Gosh, this turned out to be long. Sorry about that.
> >>
> >> I'm adding some tests for the DB api, and I've stumbled across something
> >> that we should probably discuss.
> >>
> >> First of all, most (if not all) of the various *_create methods we have,
> >> return quite amputated objects. Any attempt to access related objects
> >> will fail with the much too familiar:
> >>
> >>  DetachedInstanceError: Parent instance <Whatever at 0x4f5c8d0> is not
> >>  bound to a Session; lazy load operation of attribute 'other_things'
> >>  cannot proceed
> >>
> >> Also, with the SQLAlchemy driver, this test would pass:
> >>
> >>   network = db.network_create(ctxt, {})
> >>   network = db.network_get(ctxt, network['id'])
> >>
> >>   instance = db.instance_create(ctxt, {})
> >>   self.assertEquals(len(network['virtual_interfaces']), 0)
> >>   db.virtual_interface_create(ctxt, {'network_id': network['id'],
> >>                                      'instance_id': instance['id']}
> >>
> >>   self.assertEquals(len(network['virtual_interfaces']), 0)
> >>   network = db.network_get(ctxt, network['id'])
> >>   self.assertEquals(len(network['virtual_interfaces']), 1)
> >>
> >> I create a network, pull it out again (as per my comment above), verify
> >> that it has no virtual_interfaces related to it, create a virtual
> >> interface in this network, and check the network's virtual_interfaces
> >> key and finds that it still has length 0. Reloading the network now
> >> reveals the new virtual interface.
> >>
> >> SQLAlchemy does support looking these things up on the fly. In fact,
> >> AFAIK, this is its default behaviour. We just override it with
> >> joinedload options, because we don't use scoped sessions.
> >>
> >> My fake db driver looks stuff like this up on the fly (so the
> >> assertEquals after the virtual_interface_create will fail with that db
> >> driver).
> >>
> >> So my question is this: Should this be
> >>
> >> a) looked up on the fly,
> >> b) looked up on first key access and then cached,
> >> c) looked up when the parent object is loaded and then never again,
> >> d) or up to the driver author?
> >>
> >> Or should we do away with this stuff altogether? I.e. no more looking up
> >> related objects by way of __getitem__ lookups, and instead only allow
> >> lookups through db methods. So, instead of
> >> network['virtual_interfaces'], you'd always do
> >> db.virtual_interfaces_get_by_network(ctxt, network['id']).  Let's call
> >> this option e).
> >>
> >> I'm pretty undecided myself. If we go with option e) it becomes clear to
> >> consumers of the DB api when they're pulling out fresh stuff from the DB
> >> and when they're reusing potentially old results.  Explicit is better
> >> than implicit, but it'll take quite a bit of work to change this.
> >>
> >> If we go with one of options a) through d), my order of preference is
> >> (from most to least preferred): a), d), c), b).
> >>
> >> There's value in having a right way and a wrong way to do this. If it's
> >> either-or, it makes testing (as in validation) more awkward. I'd say
> >> it's always possible to do on-the-fly lookups. Overriding __getitem__
> >> and fetching fresh results is pretty simple, and, as mentioned earlier,
> >> I believe this is SQLAlchemy's default behaviour (somebody please
> >> correct me if I'm wrong). Forcing an arbitrary ORM to replicate the
> >> behaviour of b) and c) could be incredibly awkward, and c) is also
> >> complicated because there might be reference loops involved. Also,
> >> reviewing correct use of something where the need for reloads depends on
> >> previous use of your db objects (which might itself be conditional or
> >> happen in called methods) sounds like no fun at all.  With d) it's
> >> pretty straightforward: "Do you want to be sure to have fresh responses?
> >> Then reload the object.  Otherwise, behaviour is undefined." It's
> >> straightforward to explain and review.  Option e) is also easy to
> >> explain and do reviews for, btw.
> >>
> >> DB objects with options a) through d) will have trouble making it
> >> through a serializer. After dict serialisation, the object isn't going
> >> to change, unresolved related objects will not be resolved, and
> >> prepopulating everything prior to serialisation is out of the question
> >> due to the potential for loops and very big sets of related objects (and
> >> their related objects and their related objects's related objects,
> >> etc.). I think it would be valuable to not have to think a whole lot
> >> about whether a particular db-like object is a "real" db object or
> >> whether it came in as a JSON object or over the message bus or whatever.
> >> Only option e) will give us that, AFAICS.
> >>
> >> It seems I've talked myself into preferring option e). It's too much
> >> work to do on my own, though, and it's going to be disruptive, so we
> >> need to do it real soon. I think it'll be worth it, though.
> >>
> >> --
> >> Soren Hansen        | http://linux2go.dk/
> >> Ubuntu Developer    | http://www.ubuntu.com/
> >> OpenStack Developer | http://www.openstack.org/
> >>
> >> _______________________________________________
> >> Mailing list: https://launchpad.net/~openstack
> >> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> >> Unsubscribe : https://launchpad.net/~openstack
> >> More help   : https://help.launchpad.net/ListHelp
> >
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~openstack
> > Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> > Unsubscribe : https://launchpad.net/~openstack
> > More help   : https://help.launchpad.net/ListHelp
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
References

Database stuff
From: Soren Hansen, 2011-11-29
Re: Database stuff
From: Chris Behrens, 2011-11-29
Re: Database stuff
From: Aaron Lee, 2011-11-29