
launchpad-dev team mailing list archive

Re: Block the use of non-cached References on model objects


On 2011-05-08 04:10, Robert Collins wrote:

> If we generate an OOPS it means that scenario wasn't tested. That's
> suboptimal at best.

Wouldn't that nudge us back towards integration-level unit testing 
though?  I imagine before we could open a can of worms like this we'd 
need comprehensive run-time support for tracking and gathering 
object-graph requirements.  Changes in low-level functions have to be 
able to hand their object-graph requirements up the call chain so that 
we don't need to hard-code them all over the higher layers of the chain.

> In a year of close attention now, I've seen one case where eager
> loading was a pessimisation - with commit() being slowed down by Storm
> pathology when tens of thousands of objects are live - it has O(live)
> overhead rather than O(changed). This is a bug in Storm, though: the
> eager loading would still have made the script faster (and more
> consistent) were it not for this defect.

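
As an aside, that O(live)-versus-O(changed) distinction is easy to see in a toy unit-of-work.  This is only a sketch with hypothetical class names, not Storm's actual implementation: the point is that a commit() which scans every live object pays for all of them, while a store that maintains a dirty set as objects mutate pays only for the changed ones.

```python
# Toy sketch (hypothetical classes, NOT Storm code): commit() cost when
# flushing by scanning all live objects vs. flushing from a dirty set.

class Obj:
    def __init__(self):
        self.dirty = False


class ScanAllStore:
    """Flush by scanning every live object: O(live) per commit."""

    def __init__(self):
        self.live = []

    def add(self, obj):
        self.live.append(obj)

    def commit(self):
        scanned = flushed = 0
        for obj in self.live:          # touches ALL live objects
            scanned += 1
            if obj.dirty:
                obj.dirty = False      # pretend we wrote it to the DB
                flushed += 1
        return scanned, flushed


class DirtySetStore:
    """Flush from a dirty set maintained on mutation: O(changed)."""

    def __init__(self):
        self.live = []
        self._dirty = set()

    def add(self, obj):
        self.live.append(obj)

    def mark_dirty(self, obj):
        obj.dirty = True
        self._dirty.add(obj)

    def commit(self):
        scanned = len(self._dirty)     # touches only CHANGED objects
        for obj in self._dirty:
            obj.dirty = False
        self._dirty.clear()
        return scanned, scanned
```

With 10,000 live objects and 2 changed ones, the first store inspects all 10,000 on every commit; the second inspects 2.
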
But have you been eager-loading based on some notion of where it made 
sense to do so, or doing it arbitrarily for all kinds of references?  
Any common sense you may have exercised would have introduced an 
optimistic selection bias (*).

For example, foo.distribution will trigger few queries right now, 
because distributions are few and hot in our caches.  But we'd have to 
preempt demand-loading of any foo.distribution references that might 
possibly come into play, even though some of them won't.  Resolving 
unneeded references can add up, so we'd need some idea of the costs.  
And of course there's a similar question about unneeded database loads.
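
To make the foo.distribution point concrete, here is a toy sketch (hypothetical names, not Launchpad code) of a demand-loaded reference backed by a shared cache.  Because the distributions are few and hot, a thousand attribute accesses cost only a handful of simulated queries; an eager join would have done the work for every row up front, used or not.

```python
# Toy sketch (hypothetical names, NOT Launchpad/Storm code): a
# demand-loaded reference whose target set is small and hot, so a
# shared cache absorbs nearly all of the lookups.

QUERY_COUNT = 0
_distribution_cache = {}


def load_distribution(dist_id):
    """Demand-load one distribution, memoized across all objects."""
    global QUERY_COUNT
    if dist_id not in _distribution_cache:
        QUERY_COUNT += 1  # simulated database round trip
        _distribution_cache[dist_id] = {"id": dist_id}
    return _distribution_cache[dist_id]


class SourcePackage:
    def __init__(self, dist_id):
        self._dist_id = dist_id

    @property
    def distribution(self):
        # Resolved only when actually used, and usually from cache.
        return load_distribution(self._dist_id)


# 1000 packages spread over 3 hot distributions: 1000 references
# resolved, but only 3 simulated queries issued.
packages = [SourcePackage(i % 3) for i in range(1000)]
touched = [p.distribution for p in packages]
```
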

My vague and unsubstantiated concerns are: (1) that eager-loading the 
colder portions of a reference graph may sometimes be a net loss in the 
grand scheme of things (considering cold-load speed, hot-load speed, 
oopses, engineering time, etc.); and (2) that the brittleness of 
requiring explicit exceptions to break out of eager loading may provide 
false justification for accepting those losses.

I'm sure wholesale eager loading is faster than no eager loading, but do 
we know how much bad we'd be accepting along with the good?  This is 
where policy/mechanism separation comes in.  Wouldn't we end up with the 
hoop-jumping required for optimization, and the pain from the oopses we 
introduced, driving our priorities when they should be driven by our 
actual performance goals?  AFAICS, non-intrusive profiling ahead of 
optimization would show us the same data and support the same changes, 
but without these problems.
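
By non-intrusive profiling I mean something along these lines: a sketch (hypothetical names, `FakeConnection` stands in for a real database connection) that counts queries by temporarily wrapping the execute entry point, so we can see where the query counts actually hurt before restructuring any model code.

```python
# Toy sketch of non-intrusive query profiling: wrap the execute entry
# point for the duration of a block and tally calls, leaving the model
# code untouched.  FakeConnection is a stand-in for a real connection.

from contextlib import contextmanager


class FakeConnection:
    def execute(self, sql):
        return []  # a real connection would return rows


@contextmanager
def count_queries(connection):
    """Temporarily wrap connection.execute to count calls."""
    counter = {"queries": 0}
    original = connection.execute

    def counting_execute(sql):
        counter["queries"] += 1
        return original(sql)

    connection.execute = counting_execute
    try:
        yield counter
    finally:
        connection.execute = original  # always restore the original


conn = FakeConnection()
with count_queries(conn) as stats:
    for i in range(5):
        conn.execute("SELECT 1")
```

After the block, `stats["queries"]` holds the tally and the connection is back to its unwrapped state.
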

Jeroen

(*) This is why I generally advise against exercising common sense.

