launchpad-dev team mailing list archive

Thread
Date
Re: Lower query counts via different code structure

To: launchpad-dev@xxxxxxxxxxxxxxxxxxx
From: "Francis J. Lacoste" <francis.lacoste@xxxxxxxxxxxxx>
Date: Mon, 1 Nov 2010 09:27:03 -0400
Cc: Robert Collins <robert.collins@xxxxxxxxxxxxx>
In-reply-to: <AANLkTimPs4_0eaO-RNXPbZqFmy3U9YV630HfnaaDsF3S@mail.gmail.com>
Organization: Canonical Ltd.
User-agent: KMail/1.13.5 (Linux/2.6.32-25-generic; KDE/4.5.1; x86_64; ; )
Hi Robert,

This sounds like an ambitious project that could positively affect our overall 
run-time performance.

You said that our ActiveRecord pattern is at the source of our API (I took it 
as REST API) slowness.

I wonder how this new architecture would map externally.

Leonard, maybe you have an idea?

Cheers

-- 
Francis J. Lacoste
francis.lacoste@xxxxxxxxxxxxx

On October 31, 2010, Robert Collins wrote:
> I've a new structure for our code that I'd like to propose. There's a
> bunch of things that intersect here, and I'm *massively* tired from an
> excellent, but intense UDS + travel... so apologies in advance for any
> incoherence in what follows.
> 
> We face many problems in our code with optimising pages. Some specific ones
> are: - impedance mismatch between sql-optimised helpers and code on a
> 'model' object - small changes can cause death-by-queries when a
> previously untuned object is added to a template and triggers per-row
> queries.
> 
> I and others have been talking round some ideas about this for a
> while. One idea is a High Level Query Language and introducing an
> explicit mapping layer. Another is pushing much more eager loading
> facilities into and through our ORM - Storm.
> 
> Both of those things are good things to do but are either hard,
> incomplete or potentially high risk. That doesn't mean we shouldn't
> work on them!
> 
> However, I think a more important and fundamental aspect of the
> problem is that the pattern we use - and which many ORMs use, is
> inclined *by design* to grow complexity and trigger DB queries at just
> the wrong time. That pattern is the "Active Record" pattern, if you
> have the PoEAA book (my copy is in a box...somewhere). Basically it
> says that a single object represents a single row derived from the
> database, and that changes to it are pushed to the database, and
> attribute access on it traverses the object graph from the database.
> 
> This pattern affects us in several ways:
>  - its the root cause of our death-by-queries behaviour...
> particularly the way that previously tuned pages regress.
>  - its the root cause for many performance problems in our APIs
> 
> So I have an idea that I think shows some promise. Its at least
> superficially similar to the 'RecordSet' pattern (PoEAA again), but
> I'm going to define it entirely here to avoid confusion, because
> RecordSet may actually be different ( I can't look at the full details
> until I find my PoEAA book), and finally because RecordSet interacts
> with other patterns that we're not really in a place to implement
> without almost-certain significant disruption deep in our stack. Its
> also related to the existing Colllection idioms, but addresses
> multiple types and also our default behaviours.
> 
> Lets call this a Group based approach for now (because 'Sets' have an
> over-used history in Launchpad).
> 
> Firstly, an example of what code using this might look like (typically
> part of this would happen in url routing). The plurals are deliberate
> 
> :) The 'context' object could be obtained via a thread-local or
> 
> participation lookup, I've included it here for clarity.
> 
> context = Context()
> distros = Launchpad.search(context, distro_name='ubuntu')
> branding = distros.get_branding()
> bugs = distros.hot_bugs(limit=20)
> people = bugs.get_subscribers()
> people.update(bugs.get_assignees())
> people.update(bugs.get_reporters())
> branding.update(people.get_branding())
> context.realise()
> distro = None
> for bug in bugs:
>     if bug.distro != distro:
>         distro = bug.distro:
>             show_branding(distro)
>     ...
> 
> 
> Now, let me annotate this with intent:
> # Setup a context for doing a bunch of related things.
> context = Context()
> # Starting with our root object, get all distros matching the literal
> ubuntu (so the set will be 1)
> # - this doesn't necessarily query at this point. It /may/ but it is
> not required to.
> distros = Launchpad.search(context, distro_name='ubuntu')
> # We're going to want to be able to traverse through to the branding
> for whatever distro(s) we got back, so ask for that.
> # Like before, queries are not necessary at this point.
> branding = distros.get_branding()
> # And so on through some more related things:
> # bugs
> bugs = distros.hot_bugs(limit=20)
> # and their people
> people = bugs.get_subscribers()
> # We want people from multiple sources - this could do an OR or a
> UNION when it actually executes.
> people.update(bugs.get_assignees())
> people.update(bugs.get_reporters())
> # We want the branding for all the people we found.
> branding.update(people.get_branding())
> # We're going to start accessing the individual objects now,
> # so for clarity, trigger all the DB queries needed to satisfy
> # what we've asked for.
> context.realise()
> # And the rest of the code is guaranteed (*) not to do DB queries
> (*) For stuff we've converted.  See under 'migration/interoperation'.
> 
> 
> So the basic idea is to extend the idiom of working on sets of data -
> groups - to the very top of our code stack. Rather than
> Person.is_valid_team, we'd have Persons.are_valid_teams() or
> Persons.filter(valid_team=True).
> 
> I've thought up some possible conventions around this, but I'm sure
> there is a lot missing.
> Error reporting:
>  - just raise an exception. If we need to dispatch differently, we can
> use an exception that wraps individual exceptions for per-row
> granularity. Note that this will often wait until realise() to be
> raised at all.
> 
> Evaluation:
>  - lazy as we build out the things we need, then eager on realise().
> We could add a trigger for realize to __iter__ on groups but I think
> it would reduce safety.
>  - realise() could potentially perform multiple queries per group - or
> could use DecoratedResultSet (or similar) to perform a single query. A
> primary point of this interface is to let folk working on performance
> change from single query to multiple query *without changing the
> callers*.
> 
> Interoperation/Migration:
>  - For mutating queries, we need to return existing classes (because
> I'm not proposing a mutating interface [yet]), so our existing Table
> based classes stay, just as they are.
>  - we're read heavy: we can start moving methods off of the Table
> classes and onto Groups incrementally. Where a mutating operation
> needs a method, a transient Group can be created containing just
> [self] to permit forwarding the method but not duplicating the code.
>  - things which have no mutating operation can be converted to
> non-Table class - to Plain Old Python Objects, which can be unit
> tested to our hearts content.
>  - the machinery needed to bootstrap should be minimal, so if we as a
> team like the sound of this, we could sensibly say 'all new code
> should be like this' and start a mass migration.
>  - we'd want to convert individual attributes at a time I think, in a
> process like:
>   - add a Groups method to specify needing the attribute
>   - delegate to the Groups method from the current property/function
>   - migrate callers one at a time
>   - callers which are using Groups and don't specify needing a
> converted attribute could hit a cross-check which would blow up. This
> is perhaps the wrong approach and ugly though :).
>   - once all callers are migrated remove the delegation-to-Groups so
> that we can't ever be bitten on that attribute again. Ever.
> 
> Changes to how we do things:
>  - methods on 'domain objects' would never perform DB access. They
> must be only using non-lazy attributes - the builder interface would
> populate appropriate attributes (e.g., the columns we use (not all),
> and the related data we obtained (such as self.branding))
>  - we'd need a variant of cachedproperty which can answer from cache
> but cannot populate it.
>  - we could easily calculate the canonical url of objects as we get
> them from the DB (again, Readonly requests), avoiding non-trivial
> O(N^2) lookup behaviour.
>  - We'd probably one one (or more) Group types per db table,
> eventually. These Groups would be very similar to ResultSets except:
>   - they are not exposing a SQL interface. They are /typed/.
>   - they are specialised for the things being returned
>   - where multiple types may be returned, adaption would be a good way
> to talk about one fraction of a group. E.g. search can return
> people/projects etc - we could cast a search result to IPersons to
> talk about things only relevant to Persons.
>  - Groups become responsible for injecting attributes we need into the
> individual represents-a-row objects
> 
> This can also be looked at as a combination of HQL/query builder - but
> its really much less capable than I'd expect an arbitrary HQL to be.
> 
> So - what do you guys think? Does it have legs?
> 
> -Rob
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~launchpad-dev
> Post to     : launchpad-dev@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~launchpad-dev
> More help   : https://help.launchpad.net/ListHelp
Attachment: signature.asc
Description: This is a digitally signed message part.
References

Lower query counts via different code structure
From: Robert Collins, 2010-11-01