launchpad-dev team mailing list archive

Re: Lower query counts via different code structure

To: launchpad-dev@xxxxxxxxxxxxxxxxxxx
From: Ian Booth <ian.m.booth@xxxxxxxxx>
Date: Wed, 03 Nov 2010 11:47:38 +1000
In-reply-to: <4CCED7C5.2080302@canonical.com>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Lightning/1.0b2 Thunderbird/3.1.6
I much prefer to have these sorts of discussions down at the pub with a
beer in hand rather than on a mailing list :-)

> 
>>  - We'd probably want one (or more) Group types per db table,
>> eventually.
> 
> Can you give an example where we'd want multiple Groups per db table?
> 
>> These Groups would be very similar to ResultSets except:
>>   - they are not exposing a SQL interface. They are /typed/.
>>   - they are specialised for the things being returned
>>   - where multiple types may be returned, adaption would be a good way
>> to talk about one fraction of a group. E.g. search can return
>> people/projects etc - we could cast a search result to IPersons to
>> talk about things only relevant to Persons.
>>  - Groups become responsible for injecting attributes we need into the
>> individual represents-a-row objects
>
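One possible reading of the Group idea quoted above, as a minimal sketch: a typed collection of Person rows that fetches a related attribute for every member in one batched query and injects it into each row object, rather than issuing one query per row. `Person`, `PersonGroup`, `load_karma` and the `karma` table are hypothetical names invented for illustration, and sqlite3 stands in for the real database.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class Person:
    """A represents-a-row object; `karma` is filled in by the group."""
    id: int
    name: str
    karma: Optional[int] = None


class PersonGroup:
    """Hypothetical typed group of Person rows, not a generic SQL result set."""

    def __init__(self, conn: sqlite3.Connection, people: List[Person]) -> None:
        self.conn = conn
        self.people = people

    def load_karma(self) -> "PersonGroup":
        # One batched query for the whole group instead of one query per row.
        ids = [p.id for p in self.people]
        if ids:
            marks = ",".join("?" * len(ids))
            rows = self.conn.execute(
                "SELECT person, SUM(points) FROM karma "
                f"WHERE person IN ({marks}) GROUP BY person",
                ids,
            ).fetchall()
            totals: Dict[int, int] = dict(rows)
            # Inject the attribute into each row object (0 if no karma rows).
            for p in self.people:
                p.karma = totals.get(p.id, 0)
        return self

    def __iter__(self) -> Iterator[Person]:
        return iter(self.people)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE karma (person INTEGER, points INTEGER);
    INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO karma VALUES (1, 40), (1, 2);
""")
people = [Person(*row) for row in conn.execute("SELECT id, name FROM person ORDER BY id")]
group = PersonGroup(conn, people).load_karma()
print([(p.name, p.karma) for p in group])  # [('alice', 42), ('bob', 0)]
```

The group, not the individual row object, decides how data is fetched, which is where the query-count savings would come from.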

To me, the above discussion shows where the lines are blurring between
what IMHO should be separate concerns. At the logical level, we are
dealing with Teams, People, Branches, Products etc. Business logic which
uses these domain objects should not have to be concerned with what
table something maps to or be dealing with result sets etc. These are
separate concerns which should be internal to the ORM implementation. At
the logical level, we should be dealing with classes and object
relationships etc. What mapping strategy is used - table per class,
table per class hierarchy etc - should not really come into it at that
level.


>> This can also be looked at as a combination of HQL/query builder - but
>> it's really much less capable than I'd expect an arbitrary HQL to be.
> 
>> So - what do you guys think? Does it have legs?
> 
> Definitely a direction I think we should pursue.  The impedance mismatch
> between efficient SQL and the potato programming that ORMs traditionally
> encourage is hurting us.
> 
> I've been listening to Ian Booth talking about the importance of
> separating business logic from the ORM objects, and it doesn't entirely
> make sense to me.  Our business logic is represented by our ORM objects,
> but the fact that they are database-backed is hardly ever important.
>

I'll try and explain my thoughts a little further. The 10 word summary
would be that it boils down to separation of concerns and the benefits
of loosely coupled, cohesive architectural layers. Perhaps think about
it this way. Software solutions require a domain model to operate on.
That domain model encapsulates the state of the system and models the
real world objects pertaining to the project. There are a number of
tensions or competing concerns at play: the domain model needs to be
persisted, most often in a relational database; the model class
breakdown and object relationships need to accurately reflect the real
world problem and provide APIs fit for purpose for use by the services
layer; the model objects should be testable and not tied to, or
dependent on, any container or other infrastructure. Sadly, there is
rarely a one size fits all approach to domain modelling where all the
competing concerns can be adequately addressed. Further, more often than
not, ORM solutions tend to provide somewhat leaky abstractions and
unwanted dependencies will creep into the model objects. Such leakages
should really be contained where possible.

The fact that the ORM objects are database backed is important because
of the design/modelling compromises required to efficiently map the data
model. This may result in a data model that is great for efficient
storage in a relational database but not so good for other things. The
way I see it, there are three distinct layers we can talk about in this
context:

1. persistence/ORM
2. services
3. presentation/view

Some level of data transformation between layers is inevitable if one
wants to operate most efficiently in a given context. For example, the
presentation layer almost always works with data aggregated from various
pieces pulled together from parts of the domain model - a
document-centric view. The services layer tends to work best with a
fine-grained O-O model composed of POPOs (plain old Python objects). The
data model used by the persistence
layer often tends to be coarser grained, perhaps constructed with things
like caching in mind, and as I said earlier may contain ORM abstractions
which shouldn't be visible to other layers.
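A minimal sketch of the transformations between the three layers, assuming a coarse persistence row (a plain dict standing in for whatever the ORM returns), fine-grained POPOs for the services layer, and a document-centric view model for presentation. The column names (`displayname`, `is_team`) and the functions `row_to_person` and `team_page` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# 1. Persistence layer: a coarse, storage-shaped row (hypothetical columns).
PersonRow = Dict[str, object]


# 2. Services layer: fine-grained plain old Python objects.
@dataclass
class Person:
    id: int
    display_name: str


@dataclass
class Team(Person):
    members: List[Person] = field(default_factory=list)


def row_to_person(row: PersonRow) -> Person:
    # Only this persistence-to-domain mapping knows the column names and
    # that people and teams share one storage shape.
    cls = Team if row["is_team"] else Person
    return cls(id=row["id"], display_name=row["displayname"])


# 3. Presentation layer: a document aggregated from several domain objects.
def team_page(team: Team) -> Dict[str, object]:
    return {
        "title": team.display_name,
        "member_count": len(team.members),
        "members": sorted(m.display_name for m in team.members),
    }


rows = [
    {"id": 1, "displayname": "Launchpad Devs", "is_team": True},
    {"id": 2, "displayname": "Ian", "is_team": False},
    {"id": 3, "displayname": "Aaron", "is_team": False},
]
team, *people = [row_to_person(r) for r in rows]
team.members.extend(people)
print(team_page(team))
# {'title': 'Launchpad Devs', 'member_count': 2, 'members': ['Aaron', 'Ian']}
```

Each layer can change shape (say, caching-friendly rows or a different page layout) while only the conversion function at its boundary needs to follow.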

The above approach provides the necessary separation of concerns to help
minimise the tensions associated with satisfying different requirements
of each layer as well as allowing tuning or other implementation changes
to be made in one area without adversely affecting other parts of the
system or different use cases. The testability of various parts of the
system is also enhanced. Why should a database be necessary to test some
business logic in the services layer when all that is required is some
data model POPOs?
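To make the testability point concrete, a minimal sketch: a hypothetical services-layer rule (`can_join`, with an invented team-size limit) written purely against plain objects, so its unit tests build POPOs by hand and need no database, Store, or ORM at all.

```python
import unittest
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str


@dataclass
class Team:
    name: str
    max_members: int  # hypothetical business rule: a cap on team size
    members: List[Person] = field(default_factory=list)


def can_join(team: Team, person: Person) -> bool:
    """Business logic that depends only on the domain model, not on storage."""
    return person not in team.members and len(team.members) < team.max_members


class CanJoinTest(unittest.TestCase):
    # No database fixture: every test constructs POPOs directly.
    def test_full_team_rejects(self):
        team = Team("devs", max_members=1, members=[Person("ian")])
        self.assertFalse(can_join(team, Person("aaron")))

    def test_existing_member_rejected(self):
        ian = Person("ian")
        self.assertFalse(can_join(Team("devs", max_members=5, members=[ian]), ian))

    def test_open_team_accepts(self):
        self.assertTrue(can_join(Team("devs", max_members=2), Person("aaron")))


if __name__ == "__main__":
    unittest.main(argv=["sketch"], exit=False)
```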


> Following his advice seems to mean stripping the ORM objects down until
> they are just bags of data, and then having a parallel hierarchy of
> domain objects that would apply our business logic on such bags of data.

That's one extreme but not really how it pans out in practice.

>  Presumably, that would include specifying how to look up attributes on
> the ORM objects, and so the domain objects would wind up looking pretty
> much the same as our current ORM objects.
> 

There would be similarities - it is after all the same underlying real
world model. But when you start using projections and other data
transformation mechanisms to extract and compose the data model relevant
to a particular use case, the different representations can and do
diverge.

> I don't disagree with the argument that this would permit faster
> testing, but instead, I believe we could provide an in-memory Store
> implementation that would provide the same advantage without
> restructuring our code.  I have no appetite for maintaining yet another
> hierarchy of classes, especially if the ORM objects degenerate into bags
> of data.

An in memory store is fine but doesn't solve the underlying issue of
undesired coupling between layers.

> 
> Perhaps we don't need that.  Maybe we can just work smarter.  If the ORM
> objects are just bags of data, why can't they be dicts?  Our database
> already encodes much of the information we care about-- the types of
> columns, foreign-key references and such.  Maybe we can design an
> interface such that DB access is done in a general way, and there are no
> ORM classes.
>

Some ORM solutions do adopt this approach, or support it along with a
more traditional class based one. It tends to work ok for storing object
data consisting of simple attributes but doesn't handle more complex
data models so well.


> # We initialize the Context with the particular database it will
> # retrieve data from
> context = Context(LAUNCHPAD_DB)
> # 'Distribution' is a table name and distro_name is a column name.
> distros = context.search('Distribution', distro_name='ubuntu')
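For comparison, a minimal runnable sketch of the dict-based style quoted above, using sqlite3 as a stand-in database. `Context` and `search` follow the names in the quoted snippet, but their behaviour here (table and column names passed straight through to SQL, rows returned as dicts, `search` called on the instance) is an assumption, not an existing Launchpad API.

```python
import sqlite3
from typing import Any, Dict, List


class Context:
    """Hypothetical generic accessor: no ORM classes, rows come back as dicts."""

    def __init__(self, database: str) -> None:
        self.conn = sqlite3.connect(database)
        self.conn.row_factory = sqlite3.Row

    def search(self, table: str, **criteria: Any) -> List[Dict[str, Any]]:
        # Table and column names are interpolated, so they must be trusted
        # identifiers; only the values are passed as bound parameters.
        where = " AND ".join(f"{column} = ?" for column in criteria) or "1 = 1"
        sql = f"SELECT * FROM {table} WHERE {where}"
        return [dict(row) for row in self.conn.execute(sql, tuple(criteria.values()))]


context = Context(":memory:")  # stand-in for LAUNCHPAD_DB
context.conn.executescript("""
    CREATE TABLE Distribution (id INTEGER PRIMARY KEY, distro_name TEXT);
    INSERT INTO Distribution (distro_name) VALUES ('ubuntu'), ('debian');
""")
distros = context.search('Distribution', distro_name='ubuntu')
print(distros)  # [{'id': 1, 'distro_name': 'ubuntu'}]
```

This is compact, but the calling code now names tables and columns directly, so any schema change reaches every caller.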

At the object level, IMHO it is a mistake to refer to tables and
columns. We should be talking about objects and attributes. It's up to
the ORM implementation to map these through to the correct database
constructs. Consider the following simple example:

We have a (simple) data model with the classes Person and Team.
Team->Person is one-to-many. Both Person and Team are stored in the same
table. Using an objects/attributes approach:

We can query for people or teams - pseudo code only:

people = Context.search('Person', surname='Smith')
teams = Context.search('Team', members.contains(Person(id=2)))

Now the DBA comes along and wants to map teams to their own table and
change the name of the column which stores surname. Using the
objects/attributes approach above, no business logic changes are needed.
However, for the case where we (incorrectly IMHO) refer to table names
etc in our business logic, the impact of the change would be quite large
across the code base. Just a simple contrived example to illustrate the
point.
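The contrived example can be sketched in Python under some loud assumptions: business code queries by class and attribute names only, and a hand-written per-schema mapping (hypothetical `ClassMap` and `Context` classes, not any real ORM's API) translates those names into tables, columns and a discriminator filter. Swapping the single-table mapping for one where teams get their own table and the surname column is renamed (to an invented `family_name`) leaves the business function untouched.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ClassMap:
    """How one domain class is stored (hypothetical, hand-rolled mapping)."""
    table: str
    columns: Dict[str, str]              # attribute name -> column name
    discriminator: Optional[str] = None  # extra SQL filter for shared tables


class Context:
    def __init__(self, conn: sqlite3.Connection, mapping: Dict[str, ClassMap]) -> None:
        self.conn = conn
        self.mapping = mapping

    def search(self, class_name: str, **attrs: Any) -> List[Dict[str, Any]]:
        # Callers speak in classes and attributes; only the mapping knows
        # tables, columns and how teams are told apart from people.
        cmap = self.mapping[class_name]
        clauses = [f"{cmap.columns[a]} = ?" for a in attrs]
        if cmap.discriminator:
            clauses.append(cmap.discriminator)
        where = " AND ".join(clauses) or "1 = 1"
        select = ", ".join(f"{col} AS {attr}" for attr, col in cmap.columns.items())
        cur = self.conn.execute(
            f"SELECT {select} FROM {cmap.table} WHERE {where}", tuple(attrs.values()))
        names = [d[0] for d in cur.description]
        return [dict(zip(names, row)) for row in cur]


def find_smiths(ctx: Context) -> List[str]:
    # Business logic: identical for both schemas below.
    return [p["name"] for p in ctx.search("Person", surname="Smith")]


# Schema 1: people and teams share one table, distinguished by a flag.
old = sqlite3.connect(":memory:")
old.executescript("""
    CREATE TABLE party (id INTEGER PRIMARY KEY, name TEXT, surname TEXT, is_team INTEGER);
    INSERT INTO party VALUES (1, 'Jo', 'Smith', 0), (2, 'Smiths', NULL, 1);
""")
old_map = {
    "Person": ClassMap("party", {"name": "name", "surname": "surname"}, "is_team = 0"),
    "Team": ClassMap("party", {"name": "name"}, "is_team = 1"),
}

# Schema 2: the DBA moves teams to their own table and renames the column.
new = sqlite3.connect(":memory:")
new.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, family_name TEXT);
    CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO person VALUES (1, 'Jo', 'Smith');
    INSERT INTO team VALUES (2, 'Smiths');
""")
new_map = {
    "Person": ClassMap("person", {"name": "name", "surname": "family_name"}),
    "Team": ClassMap("team", {"name": "name"}),
}

print(find_smiths(Context(old, old_map)))  # ['Jo']
print(find_smiths(Context(new, new_map)))  # ['Jo']
```

Only `new_map` had to be written for the schema change; a real ORM would generate this mapping layer, but the division of responsibility is the same.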

More food for thought. This email is way too long already so I'll stop
there. I also have opinions on things like the use of named queries and
other mechanisms to abstract that sort of stuff out of the core business
logic but that can be for a separate email.

BTW, I'm not advocating going in and changing our existing
implementation with this stuff. My thoughts are more about the way
forward - what should we be aiming for and how should we do new work?

Follow ups

Re: Lower query counts via different code structure
From: Aaron Bentley, 2010-11-03

References

Lower query counts via different code structure
From: Robert Collins, 2010-11-01
Re: Lower query counts via different code structure
From: Aaron Bentley, 2010-11-01