
launchpad-dev team mailing list archive

Re: Lower query counts via different code structure

 

Hi Aaron

Sorry about the delay in replying. Somehow I missed following up on this
thread. Too much email :-(

>> Yes, I agree that the database representation and domain model used by
>> the business logic are coupled. That doesn't equate though to the
>> business logic having to operate on the exact same representation of the
>> data model as is managed by the ORM layer. There are often subtly
>> different requirements in play. The domain model used by the business
>> logic is often finer grained than the persistent objects, or there may
>> be transformations of the persistent representation to map to different
>> "views" on top of a given data set.
> 
> You've said this several times, but I can't think of any examples where
> it would apply to us, so it doesn't resonate with me.
>

In general, two of the common kinds of issue are business object models
with lots of nesting, and the flattening of an object graph so that it is
stored in fewer tables, meaning fewer joins when querying. Perhaps a more
concrete example is when you need to use a projection to cherry pick
data from the relational model to populate the data model for a view or
other business operation. Say there's a page which presents recipe build
information - last successful build, source package name, archive,
recipe etc - the business logic operates on a domain object, call it
RecipeBuildRecord, which is populated by querying the relational model
and extracting the required attributes. There is no RecipeBuildRecord
table - the business logic object has no direct correspondence to a
single object from the persistent model. This is a read only case, but
one could also construct a story around a use case whereby the user
modifies the business domain model objects and persistent objects are
extracted from these for writing back to the db. I guess it's about
representing the data model objects to best suit the technology and
requirements of the layer in which they reside.
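To make the projection idea concrete, here's a minimal sketch using Python's sqlite3 module. The table and column names are invented for illustration; the real Launchpad schema differs. The point is that RecipeBuildRecord is populated by cherry-picking columns across tables, with no table of its own:

```python
import sqlite3
from dataclasses import dataclass

# RecipeBuildRecord is a pure domain object - there is no RecipeBuildRecord
# table; it is populated by a projection across several relational tables.
@dataclass(frozen=True)
class RecipeBuildRecord:
    recipe_name: str
    package_name: str
    archive: str
    last_built: str

def recipe_build_records(conn):
    # Cherry-pick just the columns the view needs from the relational model.
    rows = conn.execute(
        """SELECT r.name, r.package, r.archive, MAX(b.finished)
           FROM recipe r JOIN build b ON b.recipe_id = r.id
           WHERE b.status = 'OK'
           GROUP BY r.id"""
    )
    return [RecipeBuildRecord(*row) for row in rows]

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipe (id INTEGER PRIMARY KEY, name TEXT,
                         package TEXT, archive TEXT);
    CREATE TABLE build (id INTEGER PRIMARY KEY, recipe_id INTEGER,
                        status TEXT, finished TEXT);
    INSERT INTO recipe VALUES (1, 'daily-vim', 'vim', 'ppa:vim-daily');
    INSERT INTO build VALUES (1, 1, 'OK', '2011-01-05'),
                             (2, 1, 'OK', '2011-02-01');
""")
records = recipe_build_records(conn)
print(records[0].last_built)  # the most recent successful build date
```

The business logic only ever sees the RecipeBuildRecord instances; whether they came from one table, four tables, or a materialised view is the persistence layer's concern.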

A different example - consider the Person object. The table and current
domain object for this entity have everything including the proverbial
kitchen sink thrown in. To me, in the business layer, there should be a
core Person object with just attributes like name etc. Then there should
be association objects like PersonKarma, PersonAuthorisation etc. Now,
flattening these objects out so they are stored in a single relational
table is fine, but why should the business logic have to deal with a
bloated Person object when all that may be needed for a particular use
case is to display the person's name or subscribe someone to a bug, both
of which would require only a minimal CorePerson object?
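A rough sketch of what I mean, again with invented column names (the real Person table has far more), showing one wide table serving both a minimal CorePerson and a separate PersonKarma association object:

```python
import sqlite3
from dataclasses import dataclass

# The relational model flattens everything into one wide person table,
# but the business layer works with a minimal CorePerson plus separate
# association objects such as PersonKarma.
@dataclass(frozen=True)
class CorePerson:
    id: int
    name: str

@dataclass(frozen=True)
class PersonKarma:
    person_id: int
    karma: int

def load_core_person(conn, person_id):
    # Use cases like "subscribe someone to a bug" need only the core columns.
    row = conn.execute("SELECT id, name FROM person WHERE id = ?",
                       (person_id,)).fetchone()
    return CorePerson(*row)

def load_person_karma(conn, person_id):
    row = conn.execute("SELECT id, karma FROM person WHERE id = ?",
                       (person_id,)).fetchone()
    return PersonKarma(*row)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- one table, kitchen sink included (columns invented for illustration)
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, karma INTEGER,
                         email TEXT, timezone TEXT);
    INSERT INTO person VALUES (1, 'aaron', 42, 'a@example.com', 'UTC');
""")
print(load_core_person(conn, 1))        # only name-level data materialised
print(load_person_karma(conn, 1).karma)
```

Both domain objects map back to the same row, but a use case that only needs a name never pays for loading karma, email preferences and the rest.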

A lot of what I am saying would depend on the capabilities of the
underlying ORM implementation. I have lots of experience with Hibernate
but not so much with Storm.

> Perhaps person vs team is an example of this?  The "coarse-grained"
> persistent objects use the same table, but they ought to be treated as
> different (but related) types in business logic?
>

Yes, I think that's a relevant example. Extending it further to talk
about nesting: the relational model may include a join
table to record team membership, or the business model may have
attributes which represent transitive memberships. For a given
relational model, there's more than one way to construct the business
data model, especially with respect to object relationship navigation,
and perhaps a given application will use more than one way according to
the particular use case.
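For instance, the business model can expose transitive membership as a simple attribute even though the relational model only stores direct links in a join table. A sketch with sqlite3 and an invented schema (one table for persons and teams, as in Launchpad, plus a membership join table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- persons and teams share one table; teammembership records direct links
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, is_team INTEGER);
    CREATE TABLE teammembership (team INTEGER, member INTEGER);
    INSERT INTO person VALUES (1, 'launchpad-dev', 1),
                              (2, 'code-team', 1),
                              (3, 'aaron', 0);
    -- aaron is in code-team, which is itself in launchpad-dev
    INSERT INTO teammembership VALUES (1, 2), (2, 3);
""")

def transitive_members(conn, team_id):
    # The relational model stores only direct membership; the transitive
    # closure is computed on demand with a recursive query.
    rows = conn.execute("""
        WITH RECURSIVE members(id) AS (
            SELECT member FROM teammembership WHERE team = ?
            UNION
            SELECT m.member FROM teammembership m
            JOIN members ON m.team = members.id
        )
        SELECT p.name FROM members JOIN person p ON p.id = members.id
        WHERE p.is_team = 0
    """, (team_id,))
    return sorted(name for (name,) in rows)

print(transitive_members(conn, 1))
```

Whether a given use case navigates direct memberships, transitive ones, or both is a choice made in the business model, not dictated by the join table.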


>>> I don't think that such leakages can be contained at all.  I think that
>>> efficiency demands that a significant portion of our business logic be
>>> incorporated in SQL queries (or Storm expressions) and database constraints.
>>>
> 
>> They can be contained if they are kept and managed by the ORM layer and
>> what's presented to the business logic is a "clean" object model. SQL
>> queries are fine and good ORM implementations allow the mixing of SQL
>> queries and their higher level object query language.
> 
> It still sounds to me like the queries, which are business logic, would
> be implemented at the ORM layer.  Am I understanding correctly?
> 

Yes. Semantically, the business services would advertise a set of
finder or query operations (e.g. find all the recipes which haven't been
built in a while) and these business operations would be mapped through
to an underlying query on the persistent data model. The objects
returned from these queries do not necessarily need to be the persistent
objects directly.
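As a sketch of that shape, using sqlite3 and invented names: the finder is a business-level operation, the query behind it belongs to the persistence layer, and what comes back are plain domain objects rather than ORM instances:

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# StaleRecipe is a plain domain object returned by the finder; it is not
# a persistent class. Schema and names are invented for illustration.
@dataclass(frozen=True)
class StaleRecipe:
    name: str
    last_built: Optional[str]  # None if never built

def find_stale_recipes(conn, cutoff):
    # The business operation "find all the recipes which haven't been
    # built in a while" maps to a single query on the relational model.
    rows = conn.execute(
        """SELECT r.name, MAX(b.finished) AS last
           FROM recipe r LEFT JOIN build b ON b.recipe_id = r.id
           GROUP BY r.id
           HAVING last IS NULL OR last < ?""",
        (cutoff,))
    return [StaleRecipe(*row) for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipe (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE build (recipe_id INTEGER, finished TEXT);
    INSERT INTO recipe VALUES (1, 'fresh'), (2, 'stale'), (3, 'never-built');
    INSERT INTO build VALUES (1, '2011-03-01'), (2, '2010-01-01');
""")
stale = find_stale_recipes(conn, "2011-01-01")
print(sorted(r.name for r in stale))
```

The caller never constructs SQL; it asks the service a business question and gets business objects back.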

>>>> The fact that the ORM objects are database backed is important because
>>>> of the design/modelling compromises required to efficiently map the data
>>>> model. This may result in a data model great for efficiently storing in
>>>> a relational database but not so good for other things.
>>>
>>> That's a very big "may".  Can you give some examples of classes whose
>>> design is distorted because they are mapped to the database?
>>>
> 
>> Hmmm. Not easily for lp just at the moment.
> 
> That's the biggest problem I have with this proposal.  It sounds like it
> *might* be useful if you have certain kinds of problems, but it's not
> clear to me that we do have those problems.
>

Yeah, that's the problem I'm having given my lack of battle scars with
respect to lp development. Perhaps the discussion near the top of this
email helps a little? I guess another general statement is that on
systems with the lack of separation I'm talking about, objects tend to
become bloated since it's easier just to add a new table column to
record a new attribute. If there were no relational database involved,
and the business data model were done purely from an OO modelling
perspective, I bet it would come out a lot different :-)


> Many projects have tight coupling that hurts them, but I think that in
> general, we have the opposite problem: too-loose coupling.  We have all
> these interface definitions that are mostly duplicates of the model
> definitions, and the model definitions in turn are mostly duplicates of
> the SQL table definitions.
> 
>>> We must also determine that the transformations performed on our domain
>>> objects do not result in database constraint violations.
>>>
> 
>> Yes. Database constraints are a good safety net. But the rules could or
>> should also be implemented in the object model.
> 
> I don't think we should duplicate logic if we can help it.  Duplicating
> logic leads to disagreements between the implementations and increases
> the testing burden.
> 

Agreed to a point. In the past, I've used a model driven approach where
the constraints from the business data model are automatically pushed up
into the gui validation logic and down to the database schema generation
engine. That way, the key rules are specified once as part of the
modelling and picked up where required in other layers. But that implies
a greater degree of tooling/automation than we currently use for Launchpad.
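A toy sketch of the model-driven idea: each constraint is declared once on the business model and reused both by form validation and by schema generation. All names here are invented; real model-driven toolchains are far richer:

```python
from dataclasses import dataclass
from typing import Optional

# A field declaration carries the constraints once, at the modelling level.
@dataclass(frozen=True)
class Field:
    name: str
    sql_type: str
    max_length: Optional[int] = None
    required: bool = False

PERSON_FIELDS = [
    Field("name", "TEXT", max_length=64, required=True),
    Field("email", "TEXT", max_length=254),
]

def validate(fields, data):
    """GUI-layer validation derived from the same field declarations."""
    errors = []
    for f in fields:
        value = data.get(f.name)
        if f.required and not value:
            errors.append(f"{f.name} is required")
        elif value and f.max_length and len(value) > f.max_length:
            errors.append(f"{f.name} exceeds {f.max_length} characters")
    return errors

def ddl(table, fields):
    """Schema generation derived from the same declarations."""
    cols = []
    for f in fields:
        col = f"{f.name} {f.sql_type}"
        if f.max_length:
            col += f" CHECK (length({f.name}) <= {f.max_length})"
        if f.required:
            col += " NOT NULL"
        cols.append(col)
    return f"CREATE TABLE {table} ({', '.join(cols)})"

print(validate(PERSON_FIELDS, {"name": ""}))
print(ddl("person", PERSON_FIELDS))
```

The rules are still enforced in two layers, but they are *specified* in one place, so the implementations can't silently disagree.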

>>>>> Following his advice seems to mean stripping the ORM objects down until
>>>>> they are just bags of data, and then having a parallel hierarchy of
>>>>> domain objects that would apply our business logic on such bags of data.
>>>
>>>> That's one extreme but not really how it pans out in practice.
>>>
>>> How does it really pan out in practice?
>>>
> 
>> In general, there's an object model which is constructed to be optimally
>> mapped to a relational schema representation. This is not necessarily
>> exactly the same as the object model used by the business logic.
> 
> Even if they are not identical hierarchies, it sounds like they would be
> substantially similar, at least in magnitude.
>

Similar but different :-)


>>>> There would be similarities - it is after all the same underlying real
>>>> world model. But when you start using projections and other data
>>>> transformation mechanisms to extract and compose the data model relevant
>>>> to a particular use case, the different representations can and do
>>>> diverge.
>>>
>>> If we "compose the data model relevant to a particular use case" do we
>>> get an explosion of classes, like providing Person as "CodeReviewer",
>>> "BranchUploader", "BugReporter", etc?  What happens when we also care
>>> about the branches uploaded by a CodeReviewer?
>>>
> 
>> No, that doesn't happen, at least on the systems I've worked with.
> 
> Okay, clearly I'm misunderstanding what you mean by "compose the data
> model relevant to a particular use case".  What do you mean?
>

See the recipe build record example at the start of this email as one
case perhaps. Or see also the stuff I wrote on modelling of Person. Just
CorePerson would be required for many things, since all we care about is
recording the relationship to a person, and at the schema level that
equates to writing out the person id as a foreign key.

> We are talking about having different representations of a given ORM
> class depending on the use case, right?
>

Not necessarily different in all circumstances. But not constrained to
use a single representation either.

> 
>> There may be slightly different rules in place for how DBAs require
>> table and column names to be formed, or certain databases like Oracle
>> place a limit of 30 characters on column names so there's not always a
>> guarantee that they can be the same. In practice, it's not that hard to
>> maintain the separation.
> 
> I have not experienced any limitations on naming tables or columns.  I
> feel that any duplication that is not necessary wastes effort, both in
> implementation and maintenance.  "Not that hard", but harder than
> necessary is still too hard.
>

You have been lucky then :-) Or perhaps I have been unlucky in having
used Oracle for so long :-)

Thanks again for the interesting points raised and the chance to discuss.




