launchpad-dev team mailing list archive

Re: Lower query counts via different code structure

 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/01/2010 11:49 AM, Robert Collins wrote:
>> I don't really follow this.  By attribute, you're referring to
>> attributes that are mapped to table columns?  Are you only retrieving
>> selected columns to make queries less expensive?  Aren't such queries
>> already cheap?
> 
> There are a few costs. One major one is that if a query asks for
> columns outside the used index, the row has to be retrieved. That said,
> it's rare that we'll be able to exploit this - it means that we want
> data that is fully indexed. Less significant but still present
> overheads - columns that aren't retrieved don't have to cross the
> wire, be deserialised, be present in the SQL and thus parsed and
> planned, take up memory in the appserver.

It's hard to know whether these query cost savings will be worth the
finger-typing expense.
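
To make the index point concrete, here is a rough sketch. It uses SQLite from the Python standard library purely as a stand-in (it is not PostgreSQL, and the planners behave differently), and the table and index names are made up. It compares the plan for a query that stays inside the index with the plan for one that asks for a column outside it.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
conn.execute("CREATE INDEX person_name_idx ON person (name)")
conn.executemany(
    "INSERT INTO person (name, bio) VALUES (?, ?)",
    [("alice", "x" * 1000), ("bob", "y" * 1000)],
)


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("alice",)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Only asks for a column that lives in the index.
narrow_plan = plan("SELECT name FROM person WHERE name = ?")
# Also asks for "bio", which forces a visit to the table row.
wide_plan = plan("SELECT name, bio FROM person WHERE name = ?")

print(narrow_plan)
print(wide_plan)
```

The narrow query can be answered from the index alone, while the wide one has to fetch the row as well. Whether that difference matters for our workload is exactly what would need measuring.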

>>> Changes to how we do things:
>>>  - methods on 'domain objects' would never perform DB access.
>>
>> These are the 'Plain Old Python' objects?  How would we deal with
>> situations where we want to read and then write?

> We can either stay as we are for writes - which this
> proposal suggested as a temporary measure, or we can take a stab at
> what will fit best for us for writes now. I'm inclined to wait because
> we're so light on writes - I think we'll get much less benefit from
> improving the write side of our code for now, and we can come back
> when we have restructured. Coming back later will also give us a
> clearer picture of how the write story needs to fit into the read
> story, because the read story will be done.

Perhaps this is another reason that the Plain Old Python objects should
be dicts, so that we don't have a parallel hierarchy of "read" classes
and "write" classes in the interim.

>> Can you give an example where we'd want multiple Groups per db table?
> 
> Persons, People, Teams - we have both Team and Person in one db table.
> We use a [fugly] hack to differentiate them at the moment. Sometimes
> we know we're talking about Teams specifically, and it would be nice
> to not have the non-team methods present.

This is thinly-disguised inheritance, and I think inheritance is a good
reason to have different classes.  But domain-object thinking may help
here.  If you look at the build farm classes, we have three tables
representing a single binary build or recipe build, and this is another
representation of inheritance.  So maybe it's not "multiple groups per
db table" but "a group for each type of domain object", where each
domain object may include multiple tables.
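
A minimal sketch of "a group for each type of domain object", restricted to the single-table Person/Team case. All class names, column names, and the rule for telling a team from a person are assumptions made for illustration, not our actual schema or code.

```python
class PersonOrTeam:
    """Behaviour shared by every row in the (hypothetical) person table."""

    def __init__(self, row):
        self.row = row

    @property
    def display_name(self):
        return self.row["displayname"]


class Person(PersonOrTeam):
    """Individual-only behaviour."""

    @property
    def preferred_email(self):
        return self.row["email"]


class Team(PersonOrTeam):
    """Team-only behaviour; no individual-only methods are present."""

    @property
    def owner_id(self):
        return self.row["teamowner"]


def domain_object_for(row):
    # Assumption: a non-null "teamowner" value marks a team row.
    if row.get("teamowner") is not None:
        return Team(row)
    return Person(row)


mark = domain_object_for(
    {"displayname": "Mark", "email": "mark@example.com", "teamowner": None}
)
devs = domain_object_for({"displayname": "Devs", "email": None, "teamowner": 1})
```

The same factory idea would extend to the build farm case by handing the domain object rows from several tables instead of one.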

>> Following his advice seems to mean stripping the ORM objects down until
>> they are just bags of data, and then having a parallel hierarchy of
>> domain objects that would apply our business logic on such bags of data.
>>  Presumably, that would include specifying how to look up attributes on
>> the ORM objects, and so the domain objects would wind up looking pretty
>> much the same as our current ORM objects.
> 
> This is one pattern for doing things; I don't particularly like it
> because it forces rearrangement when moving things from 'in appserver'
> to 'in-query' : something that I'm proposing to avoid.

Is that a general problem with separating business logic from database
logic?  Could an argument be made that some of our database constraints
are business logic?

>> I don't disagree with the argument that this would permit faster
>> testing, but instead, I believe we could provide an in-memory Store
>> implementation that would provide the same advantage without
>> restructuring our code.  I have no appetite for maintaining yet another
>> hierarchy of classes, especially if the ORM objects degenerate into bags
>> of data.
> 
> Me neither. I think the key thing to address is consistent behaviour:
> whatever layer we replace needs to be replaced with a consistently
> behaving test double. E.g. DB constraints need to be honoured and so
> forth. I can imagine a test db that is pure python and data driven
> from our schema - it would be a lot of work but pretty effective.
> Alternatively phrased: the test layer and real layer must be mutually
> Liskov substitutable.

It's really a shame that PostgreSQL syntax isn't compatible with
something easy to use in memory like SQLite.  I suppose a critique based
on Liskov substitution applies equally to Ian's proposal, though.
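
For what it's worth, here is a toy sketch of the kind of test double being discussed: an in-memory store that honours one kind of constraint (uniqueness). `InMemoryStore` and `ConstraintViolation` are hypothetical names, not Storm's API, and a real double would need to cover far more of the schema to be substitutable for the database.

```python
class ConstraintViolation(Exception):
    """Raised when a row would break a declared constraint."""


class InMemoryStore:
    """Hypothetical pure-Python store that enforces unique columns."""

    def __init__(self, unique_columns):
        # Maps table name -> list of columns that must be unique.
        self._unique_columns = unique_columns
        self._rows = {}

    def add(self, table, row):
        existing_rows = self._rows.setdefault(table, [])
        for column in self._unique_columns.get(table, ()):
            for existing in existing_rows:
                if existing[column] == row[column]:
                    raise ConstraintViolation(
                        "duplicate %s.%s: %r" % (table, column, row[column])
                    )
        # Store a copy so callers cannot mutate stored rows by accident.
        existing_rows.append(dict(row))

    def find(self, table, **criteria):
        return [
            dict(row)
            for row in self._rows.get(table, [])
            if all(row.get(key) == value for key, value in criteria.items())
        ]


store = InMemoryStore({"person": ["name"]})
store.add("person", {"name": "alice", "displayname": "Alice"})
try:
    store.add("person", {"name": "alice", "displayname": "Another Alice"})
    duplicate_rejected = False
except ConstraintViolation:
    duplicate_rejected = True
```

Driving the `unique_columns` mapping from the real schema, rather than writing it by hand, is the part that would be a lot of work.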

> However one of the constraints we have is that our url routing and
> publishing logic are heavily dependent on type inspection (via
> Interfaces).

In this proposal, I imagined that our domain objects would be retained,
just that they would get their data from the dicts.  So presumably we
could still use the same url routing.
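
Roughly what I have in mind, as a sketch only. Plain classes and `functools.singledispatch` stand in here for our Interface-based lookup, and the class names and URL patterns are invented for the example.

```python
from functools import singledispatch


class DictBacked:
    """Domain object that reads its attributes from a plain dict."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "_data":
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)


class Bug(DictBacked):
    pass


class Branch(DictBacked):
    pass


@singledispatch
def canonical_path(obj):
    raise TypeError("no route for %r" % type(obj).__name__)


@canonical_path.register(Bug)
def _(bug):
    return "/bugs/%d" % bug.id


@canonical_path.register(Branch)
def _(branch):
    return "/branches/%s" % branch.name


bug_path = canonical_path(Bug({"id": 42, "title": "Example"}))
branch_path = canonical_path(Branch({"name": "devel"}))
```

The routing still dispatches on the type of the domain object; only the source of the data behind it has changed.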

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkzO6UsACgkQ0F+nu1YWqI3tpACfQWaSjj+FYJq4/JBJLi3R53OV
oM8An3OJMaHg/agvTiaoElDrB+bhmG1c
=Bg0c
-----END PGP SIGNATURE-----


