launchpad-dev team mailing list archive

Re: Lower query counts via different code structure

To: launchpad-dev@xxxxxxxxxxxxxxxxxxx
From: Aaron Bentley <aaron@xxxxxxxxxxxxx>
Date: Wed, 03 Nov 2010 10:57:01 -0400
In-reply-to: <4CD0BF3A.7030105@gmail.com>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/02/2010 09:47 PM, Ian Booth wrote:
>> I've been listening to Ian Booth talking about the importance of
>> separating business logic from the ORM objects, and it doesn't entirely
>> make sense to me.  Our business logic is represented by our ORM objects,
>> but the fact that they are database-backed is hardly ever important.
>>
> 
> I'll try and explain my thoughts a little further. The 10 word summary
> would be that it boils down to separation of concerns and the benefits
> of loosely coupled, cohesive architectural layers.

The degree of coupling between layers should be appropriate.  Neither
loose coupling nor tight coupling is, in itself, a virtue.  Tight coupling
reduces flexibility.  Loose coupling introduces flexibility, but it
comes at the cost of more abstraction, and designing for flexibility
before you need it means you often get flexibility in the wrong places.

If we're talking architectural layers, I find the "shearing layers"
discussion in Big Ball of Mud quite illuminating:
http://www.laputan.org/mud/mud.html#ShearingLayers

I think the takeaway point, "factor your system so that artifacts that
change at similar rates are together," speaks directly to this issue,
because I believe our database representation must change basically in
tandem with the object model of our business logic.

> Perhaps think about
> it this way. Software solutions require a domain model to operate on.
> That domain model encapsulates the state of the system and models the
> real world objects pertaining to the project. There are a number of
> tensions or competing concerns at play: the domain model needs to be
> persisted, most often in a relational database; the model class
> breakdown and object relationships need to accurately reflect the real
> world problem and provide APIs fit for purpose for use by the services
> layer; the model objects should be testable and not tied to, or
> dependent on, any container or other infrastructure. Sadly, there is
> rarely a one size fits all approach to domain modelling where all the
> competing concerns can be adequately addressed. Further, more often than
> not, ORM solutions tend to provide somewhat leaky abstractions and
> unwanted dependencies will creep in to the model objects. Such leakages
> should really be contained where possible.

I don't think that such leakages can be contained at all.  I think that
efficiency demands that a significant portion of our business logic be
incorporated in SQL queries (or Storm expressions) and database constraints.
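
To make the efficiency point concrete, here is a minimal sketch using
Python's sqlite3 module.  The schema and the "private branches are not
visible" rule are assumptions made up for illustration; they are not
Launchpad's actual tables or Storm code.  The same rule is applied once
in Python, after fetching every row, and once inside the query itself.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE branch ("
    " id INTEGER PRIMARY KEY,"
    " owner INTEGER NOT NULL,"
    " private INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO branch (id, owner, private) VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 1, 1), (3, 2, 0)],
)


def visible_branches_python(conn, owner_id):
    """Apply the rule in Python: every row crosses the wire first."""
    rows = conn.execute("SELECT id, owner, private FROM branch").fetchall()
    return sorted(r[0] for r in rows if r[1] == owner_id and not r[2])


def visible_branches_sql(conn, owner_id):
    """Apply the same rule in the query: only matching rows come back."""
    rows = conn.execute(
        "SELECT id FROM branch WHERE owner = ? AND private = 0 ORDER BY id",
        (owner_id,),
    )
    return [r[0] for r in rows]
```

Both functions return the same answer, but only the second lets the
database do the filtering, so the rule ends up living in the SQL.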

> The fact that the ORM objects are database backed is important because
> of the design/modelling compromises required to efficiently map the data
> model. This may result in a data model great for efficiently storing in
> a relational database but not so good for other things.

That's a very big "may".  Can you give some examples of classes whose
design is distorted because they are mapped to the database?

> The above approach provides the necessary separation of concerns to help
> minimise the tensions associated with satisfying different requirements
> of each layer as well as allowing tuning or other implementation changes
> to be made in one area without adversely affecting other parts of the
> system or different use cases. The testability of various parts of the
> system is also enhanced. Why should a database be necessary to test some
> business logic in the services layer when all that is required is some
> data model POPOs?

We must also determine that the transformations performed on our domain
objects do not result in database constraint violations.
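
A minimal sketch of the kind of failure I mean, again with sqlite3 and
an invented schema (the Person class, rename() and save() are all
hypothetical names).  The transformation is perfectly valid on the
plain object, and the problem only shows up when the database applies
its UNIQUE constraint.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    """A plain data object with no knowledge of the database."""
    id: int
    name: str


def rename(person, new_name):
    """Business logic that operates purely on the in-memory object."""
    person.name = new_name
    return person


def save(conn, person):
    """Write the object back; report whether the database accepted it."""
    try:
        conn.execute(
            "UPDATE person SET name = ? WHERE id = ?",
            (person.name, person.id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO person (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO person (id, name) VALUES (2, 'bob')")

# Succeeds in memory: nothing here knows that names must be unique.
bob = rename(Person(2, "bob"), "alice")

# Fails at the database: the UNIQUE constraint rejects the update.
saved = save(conn, bob)
```

A test of rename() that never touches a database would pass here, even
though the resulting object cannot be stored.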

>> Following his advice seems to mean stripping the ORM objects down until
>> they are just bags of data, and then having a parallel hierarchy of
>> domain objects that would apply our business logic on such bags of data.
> 
> That's one extreme but not really how it pans out in practice.

How does it really pan out in practice?

>>  Presumably, that would include specifying how to look up attributes on
>> the ORM objects, and so the domain objects would wind up looking pretty
>> much the same as our current ORM objects.
>>
> 
> There would be similarities - it is after all the same underlying real
> world model. But when you start using projections and other data
> transformation mechanisms to extract and compose the data model relevant
> to a particular use case, the different representations can and do
> diverge.

If we "compose the data model relevant to a particular use case" do we
get an explosion of classes, like providing Person as "CodeReviewer",
"BranchUploader", "BugReporter", etc?  What happens when we also care
about the branches uploaded by a CodeReviewer?

>> I don't disagree with the argument that this would permit faster
>> testing, but instead, I believe we could provide an in-memory Store
>> implementation that would provide the same advantage without
>> restructuring our code.  I have no appetite for maintaining yet another
>> hierarchy of classes, especially if the ORM objects degenerate into bags
>> of data.
> 
> An in memory store is fine but doesn't solve the underlying issue of
> undesired coupling between layers.

An in memory store would mean that "testability" would not be an
advantage of loose coupling.
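
As a rough sketch of the idea, and nothing more: the InMemoryStore
class below is hypothetical, with add() and find() methods invented for
this example.  It is not Storm's Store API.  The point is only that
code written against a store-like interface can be exercised without a
database if a second, in-memory implementation exists.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    id: int
    owner: str
    private: bool = False


class InMemoryStore:
    """Hypothetical stand-in for a database-backed store."""

    def __init__(self):
        self._objects = []

    def add(self, obj):
        self._objects.append(obj)
        return obj

    def find(self, cls, **criteria):
        """Return objects of type cls whose attributes equal the criteria."""
        return [
            obj
            for obj in self._objects
            if isinstance(obj, cls)
            and all(getattr(obj, key) == value for key, value in criteria.items())
        ]


def public_branch_ids(store, owner):
    """Business logic that only depends on the store's find() method."""
    return sorted(b.id for b in store.find(Branch, owner=owner, private=False))


store = InMemoryStore()
store.add(Branch(1, "alice"))
store.add(Branch(2, "alice", private=True))
store.add(Branch(3, "bob"))
result = public_branch_ids(store, "alice")
```

The calling code does not change between the two stores, which is why
no restructuring would be needed to get the faster tests.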

>> # We initialize the Context with the particular database it will
>> # retrieve data from
>> context = Context(LAUNCHPAD_DB)
>> # 'Distribution' is a table name and distro_name is a column name.
>> distros = Context.search('Distribution', distro_name='ubuntu')
> 
> At the object level, IMHO it is a mistake to refer to tables and
> columns. We should be talking about objects and attributes.

IMHO, redundancy is our biggest problem.  If the table names are the
same as the proposed classes, having classes means introducing
redundancy.  If the column names are the same as the proposed
attributes, having attributes means introducing redundancy.

> It's up to
> the ORM implementation to map these through to the correct database
> constructs. Consider the following simple example:
> 
> We have a (simple) data model with the classes: Person, Team
> Team->Person is one-many. Both Person and Team are stored in the same
> table. Using an objects/attributes approach:
> 
> We can query for people or teams - pseudo code only:
> 
> people = Context.search('Person', surname='Smith')
> teams = Context.search('Team', members.contains(Person(id=2)))

This doesn't look like what I was proposing.  I think it would go
something like:

people = Context.search('Person', surname='Smith')
teams = Context.search('Person', id=2).require('TeamMembership.team')

> Now the DBA comes along and wants to map teams to their own table and
> change the name of the column which stores surname.

Did you mean to use 'Person' as the table name in that second example?
Because I can't see how the DBA could "map teams to their own table" if
they're already in the 'Team' table.

> Using the
> objects/attributes approach above, no business logic changes are needed.
> However, for the case where we (incorrectly IMHO) refer to table names
> etc in our business logic, the impact of the change would be quite large
> across the code base. Just a simple contrived example to illustrate the
> point.

No changes are needed *there*, but changes are required in the ORM
layer.  I'm not sure this stands up to cost/benefit analysis.  The
benefit of not having to change the calling code may be small relative
to the cost of implementing and maintaining a second definition of
Person & Team.
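
To show where the cost lands, here is a small sketch of such a mapping
layer.  Everything in it is an assumption for illustration: the MAPPING
dictionary, the search() function and the family_name column are
invented, and this is neither the Context API discussed above nor
anything in Storm.  It assumes the DBA has already renamed the surname
column to family_name.

```python
import sqlite3

# Hypothetical second definition of Person: attribute names on the left,
# table and column names on the right.  A column rename is absorbed here.
MAPPING = {
    "Person": {
        "table": "person",
        "columns": {"id": "id", "surname": "family_name"},
    },
}


def search(conn, kind, **criteria):
    """Look up objects by attribute name, translating to column names."""
    spec = MAPPING[kind]
    cols = spec["columns"]
    select = ", ".join(f"{col} AS {attr}" for attr, col in cols.items())
    sql = f"SELECT {select} FROM {spec['table']}"
    if criteria:
        sql += " WHERE " + " AND ".join(f"{cols[attr]} = ?" for attr in criteria)
    cursor = conn.execute(sql, tuple(criteria.values()))
    names = [description[0] for description in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, family_name TEXT)")
conn.execute("INSERT INTO person (id, family_name) VALUES (1, 'Smith')")
conn.execute("INSERT INTO person (id, family_name) VALUES (2, 'Jones')")

# The caller still says 'surname', even though the column was renamed.
people = search(conn, "Person", surname="Smith")
```

The caller is indeed insulated from the rename, but MAPPING is exactly
the second definition of Person that has to be written and kept in step
with the schema.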

> BTW, I'm not advocating going in and changing our existing implementation
> with this stuff. My thoughts are more about the way forward - where
> should we be aiming for and how should we do new work?

Please expect to change our existing implementation.

If we decide that our current way is the wrong way, then we'll have a
lot of code that does it the wrong way.  It would be confusing to have a
codebase that sometimes did things the wrong way, and sometimes the
right way.  Code that does things the wrong way is technical debt and we
should plan to clean it up.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkzReD0ACgkQ0F+nu1YWqI1jBwCffvp3/S3i4Bv43WcFSrOlN9Nt
J48An0DZ3o+3OWD1yhIFSSWIprJvObPW
=sCYk
-----END PGP SIGNATURE-----