launchpad-dev team mailing list archive

Re: Lower query counts via different code structure

To: launchpad-dev@xxxxxxxxxxxxxxxxxxx
From: Ian Booth <ian.m.booth@xxxxxxxxx>
Date: Thu, 04 Nov 2010 15:44:52 +1000
In-reply-to: <4CD1783D.4050405@canonical.com>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Lightning/1.0b2 Thunderbird/3.1.6
I sure could use that beer :-)

> 
> I think the takeaway point, "factor your system so that artifacts that
> change at similar rates are together," speaks directly to this issue,
> because I believe our database representation must change basically in
> tandem with the object model of our business logic.
> 

Yes, I agree that the database representation and domain model used by
the business logic are coupled. That doesn't equate though to the
business logic having to operate on the exact same representation of the
data model as is managed by the ORM layer. There are often subtly
different requirements in play. The domain model used by the business
logic is often finer grained than the persistent objects, or there may
be transformations of the persistent representation to map to different
"views" on top of a given data set.

>> Perhaps think about
>> it this way. Software solutions require a domain model to operate on.
>> That domain model encapsulates the state of the system and models the
>> real world objects pertaining to the project. There are a number of
>> tensions or competing concerns at play: the domain model needs to be
>> persisted, most often in a relational database; the model class
>> breakdown and object relationships need to accurately reflect the real
>> world problem and provide APIs fit for purpose for use by the services
>> layer; the model objects should be testable and not tied to, or
>> dependent on, any container or other infrastructure. Sadly, there is
>> rarely a one size fits all approach to domain modelling where all the
>> competing concerns can be adequately addressed. Further, more often than
>> not, ORM solutions tend to provide somewhat leaky abstractions and
>> unwanted dependencies will creep in to the model objects. Such leakages
>> should really be contained where possible.
> 
> I don't think that such leakages can be contained at all.  I think that
> efficiency demands that a significant portion of our business logic be
> incorporated in SQL queries (or Storm expressions) and database constraints.
>

They can be contained if they are kept and managed by the ORM layer and
what's presented to the business logic is a "clean" object model. SQL
queries are fine and good ORM implementations allow the mixing of SQL
queries and their higher level object query language.

>> The fact that the ORM objects are database backed is important because
>> of the design/modelling compromises required to efficiently map the data
>> model. This may result in a data model great for efficiently storing in
>> a relational database but not so good for other things.
> 
> That's a very big "may".  Can you give some examples of classes whose
> design is distorted because they are mapped to the database?
> 

Hmmm. Not easily for lp just at the moment. If I were to talk about the
domain model used in my last job, it would be easier. In general terms,
what can happen is that many ORM layers tend to work best with a coarse
grained object model, so the number of classes is kept to a minimum and
more and more attributes are shoved onto a given class instead of being
factored out to related classes. This isn't necessarily the model one
would want to expose to the business logic.
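
As a rough illustration of the coarse-grained versus finer-grained point,
here is a minimal sketch. All names (PersonRow, Person, Address, to_domain)
are hypothetical and are not taken from Launchpad or any ORM; the "row"
class merely stands in for whatever an ORM layer would hand back.

```python
from dataclasses import dataclass


@dataclass
class PersonRow:
    """Coarse-grained persistent object: one class, one table, many columns."""
    id: int
    name: str
    email: str
    street: str
    city: str
    postcode: str


@dataclass(frozen=True)
class Address:
    """Finer-grained domain class factored out of the flat row."""
    street: str
    city: str
    postcode: str


@dataclass(frozen=True)
class Person:
    """Domain object exposed to the business logic (hypothetical)."""
    id: int
    name: str
    email: str
    address: Address


def to_domain(row: PersonRow) -> Person:
    """Transform the persistent representation into the domain model."""
    address = Address(row.street, row.city, row.postcode)
    return Person(row.id, row.name, row.email, address)
```

The business logic would then work with Person and Address only, and the
flat row shape stays a detail of the mapping layer.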

>> The above approach provides the necessary separation of concerns to help
>> minimise the tensions associated with satisfying different requirements
>> of each layer as well as allowing tuning or other implementation changes
>> to be made in one area without adversely affecting other parts of the
>> system or different use cases. The testability of various parts of the
>> system is also enhanced. Why should a database be necessary to test some
>> business logic in the services layer when all that is required is some
>> data model POPOs?
> 
> We must also determine that the transformations performed on our domain
> objects do not result in database constraint violations.
>

Yes. Database constraints are a good safety net. But the rules could or
should also be implemented in the object model.
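
A minimal sketch of that idea, assuming a hypothetical Team class and two
made-up rules (a non-empty name and no duplicate members) that a real
schema might also enforce with NOT NULL and UNIQUE constraints:

```python
class Team:
    """Plain Python domain object; no database is needed to exercise it."""

    def __init__(self, name):
        # Mirrors a hypothetical NOT NULL / non-empty constraint on the name.
        if not name:
            raise ValueError("team name must not be empty")
        self.name = name
        self._member_ids = set()

    @property
    def member_ids(self):
        """Read-only view of the current member ids."""
        return frozenset(self._member_ids)

    def add_member(self, person_id):
        # Mirrors a hypothetical UNIQUE (team, person) constraint.
        if person_id in self._member_ids:
            raise ValueError("person %r is already a member" % (person_id,))
        self._member_ids.add(person_id)
```

With the rule in the object, a violation surfaces in a plain unit test,
and the database constraint remains as the safety net behind it.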

>>> Following his advice seems to mean stripping the ORM objects down until
>>> they are just bags of data, and then having a parallel hierarchy of
>>> domain objects that would apply our business logic on such bags of data.
> 
>> That's one extreme but not really how it pans out in practice.
> 
> How does it really pan out in practice?
>

In general, there's an object model which is constructed to be optimally
mapped to a relational schema representation. This is not necessarily
exactly the same as the object model used by the business logic.

> 
>> There would be similarities - it is after all the same underlying real
>> world model. But when you start using projections and other data
>> transformation mechanisms to extract and compose the data model relevant
>> to a particular use case, the different representations can and do
>> diverge.
> 
> If we "compose the data model relevant to a particular use case" do we
> get an explosion of classes, like providing Person as "CodeReviewer",
> "BranchUploader", "BugReporter", etc?  What happens when we also care
> about the branches uploaded by a CodeReviewer?
>

No, that doesn't happen, at least on the systems I've worked with.


>>> I don't disagree with the argument that this would permit faster
>>> testing, but instead, I believe we could provide an in-memory Store
>>> implementation that would provide the same advantage without
>>> restructuring our code.  I have no appetite for maintaining yet another
>>> hierarchy of classes, especially if the ORM objects degenerate into bags
>>> of data.
> 
>> An in memory store is fine but doesn't solve the underlying issue of
>> undesired coupling between layers.
> 
> An in memory store would mean that "testability" would not be an
> advantage of loose coupling.
>

True. But there would be extra set up etc and "baggage" associated with
running business logic tests.

>>> # We initialize the Context with the particular database it will
>>> # retrieve data from
>>> context = Context(LAUNCHPAD_DB)
>>> # 'Distribution' is a table name and distro_name is a column name.
>>> distros = Context.search('Distribution', distro_name='ubuntu')
> 
>> At the object level, IMHO it is a mistake to refer to tables and
>> columns. We should be talking about objects and attributes.
> 
> IMHO, redundancy is our biggest problem.  If the table names are the
> same as the proposed classes, having classes means introducing
> redundancy.  If the column names are the same as the proposed
> attributes, having attributes means introducing redundancy.
> 

There may be slightly different rules in place for how DBAs require
table and column names to be formed, and certain databases like Oracle
limit column names to 30 characters, so there's not always a guarantee
that they can be the same. In practice, it's not that hard to maintain
the separation.
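
For instance, the separation can be as small as one lookup table per
class. This is only a sketch: the column names and the to_columns helper
are invented for illustration and do not come from any real schema or ORM.

```python
# Hypothetical attribute -> column mapping for a Person class.
PERSON_COLUMNS = {
    "surname": "PRSN_SURNAME",
    # Shortened by an imagined DBA convention to fit a length limit.
    "preferred_email_address": "PRSN_PREF_EMAIL",
}


def to_columns(attrs, column_map):
    """Translate {attribute: value} into {column: value}.

    Attributes without an explicit entry keep their own name, so only
    the names that actually differ need to be listed.
    """
    return {column_map.get(name, name): value for name, value in attrs.items()}
```

Only the attributes whose column names differ need an entry, so the
redundancy stays small when the names happen to match.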

>> It's up to
>> the ORM implementation to map these through to the correct database
>> constructs. Consider the following simple example:
> 
>> We have a (simple) data model with the classes: Person, Team.
>> Team->Person is one-many. Both Person and Team are stored in the same
>> table. Using an objects/attributes approach:
> 
>> We can query for people or teams - pseudo code only:
> 
>> people = Context.search('Person', surname='Smith')
>> teams = Context.search('Team', members.contains(Person(id=2)))
> 
> This doesn't look like what I was proposing.  I think it would go
> something like:
> 
> people = Context.search('Person', surname='Smith')
> teams = Context.search('Person', id=2).require('TeamMembership.team')
> 
>> Now the DBA comes along and wants to map teams to their own table and
>> change the name of the column which stores surname.
> 
> Did you mean to use 'Person' as the table name in that second example?
> Because I can't see how the DBA could "map teams to their own table" if
> they're already in the 'Team' table.
>

My somewhat simplistic example had people and teams initially in the
same table and then I was suggesting what would happen if teams needed
to be factored out to their own table.
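
To make that concrete, here is a small sketch under stated assumptions:
the table names, the is_team discriminator column, the family_name column
and the build_query helper are all hypothetical, and a real ORM would do
far more than this. It only shows that the caller's query, expressed in
classes and attributes, is identical before and after the DBA's change.

```python
# Mapping format (hypothetical):
#   class name -> (table, fixed filter columns, {attribute: column})

# Before: people and teams share one table, told apart by a flag.
MAPPING_V1 = {
    "Person": ("person", {"is_team": 0}, {"surname": "surname"}),
    "Team": ("person", {"is_team": 1}, {"name": "name"}),
}

# After: teams get their own table and the surname column is renamed.
MAPPING_V2 = {
    "Person": ("person", {}, {"surname": "family_name"}),
    "Team": ("team", {}, {"name": "name"}),
}


def build_query(mapping, class_name, **criteria):
    """Turn an object/attribute query into parameterised SQL."""
    table, fixed, columns = mapping[class_name]
    conditions = dict(fixed)
    for attr, value in criteria.items():
        conditions[columns[attr]] = value
    sql = "SELECT * FROM %s" % table
    if conditions:
        sql += " WHERE " + " AND ".join("%s = ?" % col for col in conditions)
    return sql, list(conditions.values())


# The business logic asks the same question under either mapping.
before = build_query(MAPPING_V1, "Person", surname="Smith")
after = build_query(MAPPING_V2, "Person", surname="Smith")
```

Only the mapping tables differ between the two calls; the query written
in terms of 'Person' and 'surname' does not change.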


>> Using the
>> objects/attributes approach above, no business logic changes are needed.
>> However, for the case where we (incorrectly IMHO) refer to table names
>> etc in our business logic, the impact of the change would be quite large
>> across the code base. Just a simple contrived example to illustrate the
>> point.
> 
> No changes are needed *there*, but changes are required in the ORM
> layer.  I'm not sure this stands up to cost/benefit analysis.  The
> benefit of not having to change the calling code may be small relative
> to the cost of implementing and maintaining a second definition of
> Person & Team.
> 

In systems I've worked on in the past, the stability of the domain model
when tweaking the OR mapping rules is a benefit that hasn't cost much to
achieve. But there's no one size fits all approach I guess.

>> BTW, I'm not advocating going in and changing our existing implementation
>> with this stuff. My thoughts are more about the way forward - where
>> should we be aiming for and how should we do new work?
> 
> Please expect to change our existing implementation.
>
> If we decide that our current way is the wrong way, then we'll have a
> lot of code that does it the wrong way.  It would be confusing to have a
> codebase that sometimes did things the wrong way, and sometimes the
> right way.  Code that does things the wrong way is technical debt and we
> should plan to clean it up.
> 

Agreed. I guess I was proposing some thoughts around the "to-be", and
any migration strategy would be discussed in a separate context.

Thanks for the good discussion.