launchpad-dev team mailing list archive

Re: Lower query counts via different code structure

To: launchpad-dev@xxxxxxxxxxxxxxxxxxx
From: Aaron Bentley <aaron@xxxxxxxxxxxxx>
Date: Mon, 01 Nov 2010 11:07:49 -0400
In-reply-to: <AANLkTimPs4_0eaO-RNXPbZqFmy3U9YV630HfnaaDsF3S@mail.gmail.com>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/31/2010 08:47 PM, Robert Collins wrote:
> I've a new structure for our code that I'd like to propose.

I'm very pleased to see this proposal.  It fits my own thinking quite well.

> So the basic idea is to extend the idiom of working on sets of data -
> groups - to the very top of our code stack. Rather than
> Person.is_valid_team, we'd have Persons.are_valid_teams() or
> Persons.filter(valid_team=True).

I hope that would include extending security checks to working on sets
of data.
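
As a rough illustration of a set-based check (a minimal sketch, not Launchpad's actual security machinery): the function name `filter_visible`, the dict keys and the `subscriptions` set are all assumptions made up for this example. The point is only that visibility is decided for the whole collection from data fetched in bulk, rather than by asking one object at a time.

```python
def filter_visible(user_id, bugs, subscriptions):
    """Return the bugs that user_id may see, deciding for the whole set.

    bugs is a list of dicts with 'id' and 'private' keys; subscriptions is
    a set of (bug_id, person_id) pairs that was fetched in bulk beforehand.
    Both shapes are hypothetical, chosen only for this sketch.
    """
    return [
        bug for bug in bugs
        if not bug["private"] or (bug["id"], user_id) in subscriptions
    ]


bugs = [
    {"id": 1, "private": False},
    {"id": 2, "private": True},
    {"id": 3, "private": True},
]
# Person 42 is subscribed to private bug 2 only.
subscriptions = {(2, 42)}
visible = filter_visible(42, bugs, subscriptions)
```

Here `visible` holds bugs 1 and 2; no per-bug lookup happens inside the loop.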

>  - we'd want to convert individual attributes at a time I think, in a
> process like:
>   - add a Groups method to specify needing the attribute
>   - delegate to the Groups method from the current property/function
>   - migrate callers one at a time
>   - callers which are using Groups and don't specify needing a
> converted attribute could hit a cross-check which would blow up. This
> is perhaps the wrong approach and ugly though :).
>   - once all callers are migrated remove the delegation-to-Groups so
> that we can't ever be bitten on that attribute again. Ever.

I don't really follow this.  By attribute, you're referring to
attributes that are mapped to table columns?  Are we only retrieving
selected columns to make queries less expensive?  Aren't such queries
already cheap?

I can see a case for not traversing foreign key references except when
explicitly requested.

> Changes to how we do things:
>  - methods on 'domain objects' would never perform DB access.

These are the 'Plain Old Python' objects?  How would we deal with
situations where we want to read and then write?

>  - We'd probably want one (or more) Group types per db table,
> eventually.

Can you give an example where we'd want multiple Groups per db table?

> These Groups would be very similar to ResultSets except:
>   - they are not exposing a SQL interface. They are /typed/.
>   - they are specialised for the things being returned
>   - where multiple types may be returned, adaption would be a good way
> to talk about one fraction of a group. E.g. search can return
> people/projects etc - we could cast a search result to IPersons to
> talk about things only relevant to Persons.
>  - Groups become responsible for injecting attributes we need into the
> individual represents-a-row objects
> 
> This can also be looked at as a combination of HQL/query builder - but
> it's really much less capable than I'd expect an arbitrary HQL to be.
> 
> So - what do you guys think? Does it have legs?

Definitely a direction I think we should pursue.  The impedance mismatch
between efficient SQL and the potato programming that ORMs traditionally
encourage is hurting us.
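
To make the mismatch concrete, here is a small sketch using Python's standard `sqlite3` module. The `person` and `bug` tables and the `CountingConnection` wrapper are assumptions invented for this example, not Launchpad's schema. It contrasts fetching related rows one at a time with fetching them as a set, and counts the queries each style issues.

```python
import sqlite3


class CountingConnection:
    """Hypothetical wrapper that counts the statements it executes."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE bug (id INTEGER PRIMARY KEY, title TEXT,
                          assignee INTEGER REFERENCES person(id));
        INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO bug VALUES (1, 'crash', 1), (2, 'typo', 2), (3, 'leak', 1);
    """)
    return CountingConnection(conn)


def assignees_one_at_a_time(db):
    """Potato style: one query for the bugs, then one more per bug."""
    names = []
    bug_rows = db.execute("SELECT assignee FROM bug ORDER BY id").fetchall()
    for (assignee_id,) in bug_rows:
        row = db.execute(
            "SELECT name FROM person WHERE id = ?", (assignee_id,)).fetchone()
        names.append(row[0])
    return names


def assignees_as_a_set(db):
    """Set style: a single join returns every assignee name."""
    rows = db.execute(
        "SELECT person.name FROM bug "
        "JOIN person ON person.id = bug.assignee ORDER BY bug.id")
    return [name for (name,) in rows]


slow_db, fast_db = make_db(), make_db()
slow_names = assignees_one_at_a_time(slow_db)
fast_names = assignees_as_a_set(fast_db)
```

Both functions return the same names, but with three bugs the first issues four queries and the second issues one; the gap grows linearly with the number of rows.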

I've been listening to Ian Booth talking about the importance of
separating business logic from the ORM objects, and it doesn't entirely
make sense to me.  Our business logic is represented by our ORM objects,
but the fact that they are database-backed is hardly ever important.

Following his advice seems to mean stripping the ORM objects down until
they are just bags of data, and then having a parallel hierarchy of
domain objects that would apply our business logic on such bags of data.
 Presumably, that would include specifying how to look up attributes on
the ORM objects, and so the domain objects would wind up looking pretty
much the same as our current ORM objects.

I don't disagree with the argument that this would permit faster
testing, but I believe we could instead provide an in-memory Store
implementation that would give the same advantage without
restructuring our code.  I have no appetite for maintaining yet another
hierarchy of classes, especially if the ORM objects degenerate into bags
of data.
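
A minimal sketch of what such an in-memory store might look like, assuming tests only need to add objects and find them by attribute equality. `InMemoryStore`, its `add` and `find` methods and the `Person` class are hypothetical names for this example; this is not Storm's Store and it makes no attempt to cover expressions, joins or transactions.

```python
class InMemoryStore:
    """Hypothetical test double: keeps objects in a list, no database."""

    def __init__(self):
        self._objects = []

    def add(self, obj):
        """Remember obj and return it, so calls can be chained."""
        self._objects.append(obj)
        return obj

    def find(self, cls, **criteria):
        """Return stored instances of cls whose attributes equal criteria."""
        return [
            obj for obj in self._objects
            if isinstance(obj, cls)
            and all(getattr(obj, key) == value
                    for key, value in criteria.items())
        ]


class Person:
    """Stand-in domain class; business logic would live here unchanged."""

    def __init__(self, name, is_team=False):
        self.name = name
        self.is_team = is_team


store = InMemoryStore()
store.add(Person("alice"))
store.add(Person("launchpad-devs", is_team=True))
teams = store.find(Person, is_team=True)
```

Code under test would receive `store` in place of the real one; whether that is practical depends on how much of the real store's query interface the code relies on.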

Perhaps we don't need that.  Maybe we can just work smarter.  If the ORM
objects are just bags of data, why can't they be dicts?  Our database
already encodes much of the information we care about: the types of
columns, foreign-key references and such.  Maybe we can design an
interface such that DB access is done in a general way, and there are no
ORM classes.

# We initialize the Context with the particular database it will
# retrieve data from
context = Context(LAUNCHPAD_DB)
# 'Distribution' is a table name and distro_name is a column name.
distros = context.search('Distribution', distro_name='ubuntu')
# require is a generic method to replace get_foo in your example.
# I'm not sure what the underlying representation of 'branding' in your
# example is.
branding = distros.require('branding')
# distros is a generic Result class, so hot_bugs cannot be a method on
# it.  I've made it a function, but it could be a static method or we
# could instantiate a domain object and invoke hot_bugs on it.
bugs = hot_bugs(distros, limit=20)
# We're relying on the foreign key constraints that our DB knows about
# here-- we know how a BugSubscription is related to a Bug through
# BugSubscription.bug, and we know that 'assignee' and 'reporter' refer
# to Person
people = bugs.require('BugSubscription.person', 'assignee', 'reporter')
branding.update(people.require('branding'))
context.realise()
distro = None
for bug in bugs:
    if bug['distro'] != distro:
        distro = bug['distro']
        show_branding(distro)
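
For a runnable flavour of the idea, here is a much smaller sketch built on Python's standard `sqlite3` module. It is an assumption-laden toy, not the interface sketched above: the `Context` methods `search` and `follow`, and the `person` and `bug` tables, are invented for this example. Rows come back as plain dicts, and the foreign-key relationship is read from the database itself rather than from an ORM class.

```python
import sqlite3


class Context:
    """Hypothetical generic data-access object; no per-table classes."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.row_factory = sqlite3.Row
        # Identifiers cannot be bound as parameters, so only known
        # table and column names are ever interpolated into SQL.
        self.tables = {row["name"] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")}

    def _columns(self, table):
        if table not in self.tables:
            raise ValueError("unknown table: %r" % (table,))
        return {row["name"] for row in
                self.conn.execute('PRAGMA table_info("%s")' % table)}

    def search(self, table, **criteria):
        """Return the rows of table matching criteria, as plain dicts."""
        unknown = set(criteria) - self._columns(table)
        if unknown:
            raise ValueError("unknown columns: %r" % (sorted(unknown),))
        sql = 'SELECT * FROM "%s"' % table
        if criteria:
            sql += " WHERE " + " AND ".join(
                '"%s" = ?' % column for column in criteria)
        return [dict(row) for row in
                self.conn.execute(sql, tuple(criteria.values()))]

    def follow(self, table, rows, column):
        """Fetch, in one query, the rows that a foreign-key column refers to.

        Assumes the constraint names its target column explicitly.
        """
        self._columns(table)  # validates the table name
        for fk in self.conn.execute('PRAGMA foreign_key_list("%s")' % table):
            if fk["from"] == column:
                target, target_column = fk["table"], fk["to"]
                break
        else:
            raise ValueError("%s.%s is not a foreign key" % (table, column))
        keys = sorted({row[column] for row in rows
                       if row[column] is not None})
        if not keys:
            return []
        sql = 'SELECT * FROM "%s" WHERE "%s" IN (%s) ORDER BY "%s"' % (
            target, target_column, ", ".join("?" * len(keys)), target_column)
        return [dict(row) for row in self.conn.execute(sql, keys)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bug (id INTEGER PRIMARY KEY, title TEXT,
                      assignee INTEGER REFERENCES person(id));
    INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO bug VALUES (1, 'crash', 1), (2, 'typo', 1);
""")
context = Context(conn)
crashes = context.search("bug", title="crash")
assignees = context.follow("bug", context.search("bug"), "assignee")
```

Two bugs share one assignee here, so `follow` returns a single person row from a single query. Deferring work until something like `realise()` is called, as in the sketch above, is not attempted.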

Anyhow, that's some food for thought.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkzO18IACgkQ0F+nu1YWqI2YugCfSXEcXzHXhN1DusE8O0NXoJsa
WJAAnRgTizUuNoWxUTwam5sRQBpFwTWe
=T9rc
-----END PGP SIGNATURE-----

Follow ups

Re: Lower query counts via different code structure
From: Ian Booth, 2010-11-03
Re: Lower query counts via different code structure
From: Robert Collins, 2010-11-01

References

Lower query counts via different code structure
From: Robert Collins, 2010-11-01