launchpad-dev team mailing list archive

Thread
Date
Re: Lower query counts via different code structure

To: Ian Booth <ian.m.booth@xxxxxxxxx>
From: Robert Collins <robert.collins@xxxxxxxxxxxxx>
Date: Tue, 2 Nov 2010 05:13:52 +1300
Cc: launchpad-dev@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4CCEE101.7020004@gmail.com>
Sender: robertc@xxxxxxxxxxxxxxxxx
On Tue, Nov 2, 2010 at 4:47 AM, Ian Booth <ian.m.booth@xxxxxxxxx> wrote:
> Hi Robert
>
> Thanks for starting the discussion. I'll add in my 2c worth.

Thank you. I've snipped down to where we disagree for brevity :)

> I'm not sure I agree it is the sole root cause - we also seem to suffer
> from repeated queries and lack of appropriate caching in the right
> places. The former is often caused by plain dumb business logic and
> moving to a new architecture will not be a silver bullet since badly
> written code will do dumb things no matter how good the underlying
> infrastructure is. That's where we need to rely on good coding skills
> and reviews :-)

We certainly have repeated identical queries and I think that it is a
different issue, driven by the use of new views and newly queried
domain objects in reusable page fragements. Its true that that
wouldn't be helped by this proposal - but the efficiency of each
unrelated view would be improved - and we can separately figure out at
what layer we want to cache and build those things. SQL layer caching
is definitely wrong due to the richness of SQL making the cache hit
rate terribly low for anything other than id based lookups.


...
>> people = bugs.get_subscribers()
>> people.update(bugs.get_assignees())

> <snip>
>
> Here's where I have a slight problem, maybe. It depends on if I've
> interpreted the above code snippet properly. Does the people.update()
> call above write to the database?

s/write/read/ ? Its extending an in-memory group of Person objects,
just like 'dict.update(dict)'.

I wouldn't expect so. The point though is that if we decide it should,
it can read immediately - this may be useful for debugging.

> I don't think it should. I think all
> operations on domain model artefacts - single entities or collections -
> should be purely to update the internal state of those objects. The act
> of writing that data to the underlying data store is a separate concern
> and should be handled by the relevant lifecycle service for those
> objects. Such services will use the separately implemented ORM layer to
> do the job. There's a number of orthogonal concerns that should be
> managed and implemented separately. This allows the necessary control
> and optimisations to occur at the right places in the architecture to
> achieve the end performance goals and system robustness in terms of per
> use case tuning etc.

I agree, however...

We currently have 2 ORM styles in the code base; choosing a third
approach at the ORM layer will add a great deal of confusion in the
short term. I think we can do something really nice and high
performance on the current layers in the short term, and that will
naturally:
 - consolidate our ORM usage on one style
 - give us a great basis for evaluating the exact ORM design we want
 - which we can then consider how to get it.


>>  - the machinery needed to bootstrap should be minimal, so if we as a
>> team like the sound of this, we could sensibly say 'all new code
>> should be like this' and start a mass migration.
>
> My own view is that I think this needs to be planned a little more first
> to understand the full impact across the entire product, including
> external APIs, jobs, cron scripts etc. Let's understand the impacts a
> little more before we dive in and find out we have to refactor due to
> some unforeseen problem.

mmm. Given they are all built on the exact same idiom today, and what
we're discussing can deliver that, I really wouldn't expect an issue.
We can do a methodical review, but I don't think the domain of each
set of those things will influence what we need except for APIs - and
for APIs which suffer terribly from being activerecord like, this is a
potentially vast win - though we probably can't represent it in WADL
at all. What we definitely can do is leave APIs the way they are, and
work on them separately: once our appserver performance story is
sorted, I want to start improving our API story.


> +1. Business domain objects should be POPOs, and their methods should be
> about managing their internal runtime state.

I agree; however doing that for existing classes will be very hard
without a very solid write story. And it would be extremely easy to
end up writing an ORM on top of our ORM if we do that hastily. Also
for incremental deployment of this, I think we should have POPO as an
end goal, but not an initial-can-do.

>>  - we'd need a variant of cachedproperty which can answer from cache
>> but cannot populate it.
>
> We should also consider the concept of query caches, whereby whole
> results sets are cached, not just individual properties.

perhaps. I'm very much a don't-cache-unless-you-need-it. Theres a
whole load of complexity turns up as soon as we start doing that.

> And we should
> also at least discuss whether a cache's life cycle should be allowed to
> extend beyond the boundaries of a single web request.

I'm pretty uninterested in this for now. To date, every timeout I've
examined would happen even if we had such a cache. And we'd have a
/huge/ additional problem.
Some quick stats:
Appserver memory - 6G
DB memory - 120G


> <snip>
>
>>  - Groups become responsible for injecting attributes we need into the
>> individual represents-a-row objects
>>
>
> Personally, I'm having trouble with the term "Group" here. I think of it
> more as a Manager eg BranchManager, although maybe that's more for the
> lifecycle (CRUD) type stuff. XXXCollection comes to mind for read only
> query type stuff but we already have some of those already like
> BranchCollection. Perhaps XXXFinder?

Ah, one of the two remaining hard problems in CompSci. So, I'm happy
for us to brainstorm names; I won't critique any for a few days to let
the creative juices flow. The context is that we're moving /all/
methods that may depend on db data or deriving from db data from being
on an instance of the type to being on an instance of an object which
holds many single instances (and which is populated on
context.realise).

-Rob
References

Lower query counts via different code structure
From: Robert Collins, 2010-11-01
Re: Lower query counts via different code structure
From: Ian Booth, 2010-11-01