launchpad-dev team mailing list archive
-
launchpad-dev team
-
Mailing list archive
-
Message #05374
Re: Lower query counts via different code structure
Hi Robert,
This sounds like an ambitious project that could positively affect our overall
run-time performance.
You said that our ActiveRecord pattern is at the source of our API (I took it
as REST API) slowness.
I wonder how this new architecture would map externally.
Leonard, maybe you have an idea?
Cheers
--
Francis J. Lacoste
francis.lacoste@xxxxxxxxxxxxx
On October 31, 2010, Robert Collins wrote:
> I've a new structure for our code that I'd like to propose. There's a
> bunch of things that intersect here, and I'm *massively* tired from an
> excellent, but intense UDS + travel... so apologies in advance for any
> incoherence in what follows.
>
> We face many problems in our code with optimising pages. Some specific ones
> are: - impedance mismatch between sql-optimised helpers and code on a
> 'model' object - small changes can cause death-by-queries when a
> previously untuned object is added to a template and triggers per-row
> queries.
>
> I and others have been talking round some ideas about this for a
> while. One idea is a High Level Query Language and introducing an
> explicit mapping layer. Another is pushing much more eager loading
> facilities into and through our ORM - Storm.
>
> Both of those things are good things to do but are either hard,
> incomplete or potentially high risk. That doesn't mean we shouldn't
> work on them!
>
> However, I think a more important and fundamental aspect of the
> problem is that the pattern we use - and which many ORMs use, is
> inclined *by design* to grow complexity and trigger DB queries at just
> the wrong time. That pattern is the "Active Record" pattern, if you
> have the PoEAA book (my copy is in a box...somewhere). Basically it
> says that a single object represents a single row derived from the
> database, and that changes to it are pushed to the database, and
> attribute access on it traverses the object graph from the database.
>
> This pattern affects us in several ways:
> - its the root cause of our death-by-queries behaviour...
> particularly the way that previously tuned pages regress.
> - its the root cause for many performance problems in our APIs
>
> So I have an idea that I think shows some promise. Its at least
> superficially similar to the 'RecordSet' pattern (PoEAA again), but
> I'm going to define it entirely here to avoid confusion, because
> RecordSet may actually be different ( I can't look at the full details
> until I find my PoEAA book), and finally because RecordSet interacts
> with other patterns that we're not really in a place to implement
> without almost-certain significant disruption deep in our stack. Its
> also related to the existing Colllection idioms, but addresses
> multiple types and also our default behaviours.
>
> Lets call this a Group based approach for now (because 'Sets' have an
> over-used history in Launchpad).
>
> Firstly, an example of what code using this might look like (typically
> part of this would happen in url routing). The plurals are deliberate
>
> :) The 'context' object could be obtained via a thread-local or
>
> participation lookup, I've included it here for clarity.
>
> context = Context()
> distros = Launchpad.search(context, distro_name='ubuntu')
> branding = distros.get_branding()
> bugs = distros.hot_bugs(limit=20)
> people = bugs.get_subscribers()
> people.update(bugs.get_assignees())
> people.update(bugs.get_reporters())
> branding.update(people.get_branding())
> context.realise()
> distro = None
> for bug in bugs:
> if bug.distro != distro:
> distro = bug.distro:
> show_branding(distro)
> ...
>
>
> Now, let me annotate this with intent:
> # Setup a context for doing a bunch of related things.
> context = Context()
> # Starting with our root object, get all distros matching the literal
> ubuntu (so the set will be 1)
> # - this doesn't necessarily query at this point. It /may/ but it is
> not required to.
> distros = Launchpad.search(context, distro_name='ubuntu')
> # We're going to want to be able to traverse through to the branding
> for whatever distro(s) we got back, so ask for that.
> # Like before, queries are not necessary at this point.
> branding = distros.get_branding()
> # And so on through some more related things:
> # bugs
> bugs = distros.hot_bugs(limit=20)
> # and their people
> people = bugs.get_subscribers()
> # We want people from multiple sources - this could do an OR or a
> UNION when it actually executes.
> people.update(bugs.get_assignees())
> people.update(bugs.get_reporters())
> # We want the branding for all the people we found.
> branding.update(people.get_branding())
> # We're going to start accessing the individual objects now,
> # so for clarity, trigger all the DB queries needed to satisfy
> # what we've asked for.
> context.realise()
> # And the rest of the code is guaranteed (*) not to do DB queries
> (*) For stuff we've converted. See under 'migration/interoperation'.
>
>
> So the basic idea is to extend the idiom of working on sets of data -
> groups - to the very top of our code stack. Rather than
> Person.is_valid_team, we'd have Persons.are_valid_teams() or
> Persons.filter(valid_team=True).
>
> I've thought up some possible conventions around this, but I'm sure
> there is a lot missing.
> Error reporting:
> - just raise an exception. If we need to dispatch differently, we can
> use an exception that wraps individual exceptions for per-row
> granularity. Note that this will often wait until realise() to be
> raised at all.
>
> Evaluation:
> - lazy as we build out the things we need, then eager on realise().
> We could add a trigger for realize to __iter__ on groups but I think
> it would reduce safety.
> - realise() could potentially perform multiple queries per group - or
> could use DecoratedResultSet (or similar) to perform a single query. A
> primary point of this interface is to let folk working on performance
> change from single query to multiple query *without changing the
> callers*.
>
> Interoperation/Migration:
> - For mutating queries, we need to return existing classes (because
> I'm not proposing a mutating interface [yet]), so our existing Table
> based classes stay, just as they are.
> - we're read heavy: we can start moving methods off of the Table
> classes and onto Groups incrementally. Where a mutating operation
> needs a method, a transient Group can be created containing just
> [self] to permit forwarding the method but not duplicating the code.
> - things which have no mutating operation can be converted to
> non-Table class - to Plain Old Python Objects, which can be unit
> tested to our hearts content.
> - the machinery needed to bootstrap should be minimal, so if we as a
> team like the sound of this, we could sensibly say 'all new code
> should be like this' and start a mass migration.
> - we'd want to convert individual attributes at a time I think, in a
> process like:
> - add a Groups method to specify needing the attribute
> - delegate to the Groups method from the current property/function
> - migrate callers one at a time
> - callers which are using Groups and don't specify needing a
> converted attribute could hit a cross-check which would blow up. This
> is perhaps the wrong approach and ugly though :).
> - once all callers are migrated remove the delegation-to-Groups so
> that we can't ever be bitten on that attribute again. Ever.
>
> Changes to how we do things:
> - methods on 'domain objects' would never perform DB access. They
> must be only using non-lazy attributes - the builder interface would
> populate appropriate attributes (e.g., the columns we use (not all),
> and the related data we obtained (such as self.branding))
> - we'd need a variant of cachedproperty which can answer from cache
> but cannot populate it.
> - we could easily calculate the canonical url of objects as we get
> them from the DB (again, Readonly requests), avoiding non-trivial
> O(N^2) lookup behaviour.
> - We'd probably one one (or more) Group types per db table,
> eventually. These Groups would be very similar to ResultSets except:
> - they are not exposing a SQL interface. They are /typed/.
> - they are specialised for the things being returned
> - where multiple types may be returned, adaption would be a good way
> to talk about one fraction of a group. E.g. search can return
> people/projects etc - we could cast a search result to IPersons to
> talk about things only relevant to Persons.
> - Groups become responsible for injecting attributes we need into the
> individual represents-a-row objects
>
> This can also be looked at as a combination of HQL/query builder - but
> its really much less capable than I'd expect an arbitrary HQL to be.
>
> So - what do you guys think? Does it have legs?
>
> -Rob
>
> _______________________________________________
> Mailing list: https://launchpad.net/~launchpad-dev
> Post to : launchpad-dev@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~launchpad-dev
> More help : https://help.launchpad.net/ListHelp
Attachment:
signature.asc
Description: This is a digitally signed message part.
References