
launchpad-dev team mailing list archive

Re: Block the use of non-cached References on model objects


On 2011-05-08 04:10, Robert Collins wrote:

> If we generate an OOPS it means that scenario wasn't tested. That's
> suboptimal at best.

Wouldn't that nudge us back towards integration-level unit testing 
though?  I imagine before we could open a can of worms like this we'd 
need comprehensive run-time support for tracking and gathering 
object-graph requirements.  Changes in low-level functions have to be 
able to hand their object-graph requirements up the call chain so that 
we don't need to hard-code them all over the higher layers of the chain.

> In a year of close attention now, I've seen one case where eager
> loading was a pessimisation - with commit() being slowed down by Storm
> pathology when tens of thousands of objects are live - it has O(live)
> overhead rather than O(changed). This is a bug in Storm, though: the
> eager loading would still have made the script faster (and more
> consistent) were it not for this defect.

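
As an aside, that O(live)-versus-O(changed) distinction is easy to see in a toy unit-of-work.  This is only a sketch with hypothetical class names, not Storm's actual implementation: the point is that a commit() which scans every live object pays for all of them, while a store that maintains a dirty set as objects mutate pays only for the changed ones.

```python
# Toy sketch (hypothetical classes, NOT Storm code): commit() cost when
# flushing by scanning all live objects vs. flushing from a dirty set.

class Obj:
    def __init__(self):
        self.dirty = False


class ScanAllStore:
    """Flush by scanning every live object: O(live) per commit."""

    def __init__(self):
        self.live = []

    def add(self, obj):
        self.live.append(obj)

    def commit(self):
        scanned = flushed = 0
        for obj in self.live:          # touches ALL live objects
            scanned += 1
            if obj.dirty:
                obj.dirty = False      # pretend we wrote it to the DB
                flushed += 1
        return scanned, flushed


class DirtySetStore:
    """Flush from a dirty set maintained on mutation: O(changed)."""

    def __init__(self):
        self.live = []
        self._dirty = set()

    def add(self, obj):
        self.live.append(obj)

    def mark_dirty(self, obj):
        obj.dirty = True
        self._dirty.add(obj)

    def commit(self):
        scanned = len(self._dirty)     # touches only CHANGED objects
        for obj in self._dirty:
            obj.dirty = False
        self._dirty.clear()
        return scanned, scanned
```

With 10,000 live objects and 2 changed ones, the first store inspects all 10,000 on every commit; the second inspects 2.
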
But have you been eager-loading based on some notion of where it made 
sense to do so, or doing it arbitrarily for all kinds of references?  
Any common sense you may have exercised would have introduced an 
optimistic selection bias (*).

For example, foo.distribution will trigger few queries right now, 
because distributions are few and hot in our caches.  But we'd have to 
preempt demand-loading of any foo.distribution references that might 
possibly come into play, even though some of them won't.  Resolving 
unneeded references can add up, so we'd need some idea of the costs.  
And of course there's a similar question about unneeded database loads.
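
To make the foo.distribution point concrete, here is a toy sketch (hypothetical names, not Launchpad code) of a demand-loaded reference backed by a shared cache.  Because the distributions are few and hot, a thousand attribute accesses cost only a handful of simulated queries; an eager join would have done the work for every row up front, used or not.

```python
# Toy sketch (hypothetical names, NOT Launchpad/Storm code): a
# demand-loaded reference whose target set is small and hot, so a
# shared cache absorbs nearly all of the lookups.

QUERY_COUNT = 0
_distribution_cache = {}


def load_distribution(dist_id):
    """Demand-load one distribution, memoized across all objects."""
    global QUERY_COUNT
    if dist_id not in _distribution_cache:
        QUERY_COUNT += 1  # simulated database round trip
        _distribution_cache[dist_id] = {"id": dist_id}
    return _distribution_cache[dist_id]


class SourcePackage:
    def __init__(self, dist_id):
        self._dist_id = dist_id

    @property
    def distribution(self):
        # Resolved only when actually used, and usually from cache.
        return load_distribution(self._dist_id)


# 1000 packages spread over 3 hot distributions: 1000 references
# resolved, but only 3 simulated queries issued.
packages = [SourcePackage(i % 3) for i in range(1000)]
touched = [p.distribution for p in packages]
```
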

My vague and unsubstantiated concerns are: (1) that eager-loading the 
colder portions of a reference graph may sometimes be a net loss in the 
grand scheme of things (considering cold-load speed, hot-load speed, 
oopses, engineering time, etc.); and (2) that the brittleness of 
requiring explicit exceptions to break out of eager loading may provide 
false justification for accepting those losses.

I'm sure wholesale eager loading is faster than no eager loading, but do 
we know how much bad we'd be accepting along with the good?  This is 
where policy/mechanism separation comes in.  Wouldn't we end up with the 
hoop-jumping required for optimization, and the pain from the oopses we 
introduced, driving our priorities when they should be driven by our 
actual performance goals?  AFAICS, non-intrusive profiling ahead of 
optimization would show us the same data and support the same changes, 
but without these problems.
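
By non-intrusive profiling I mean something along these lines: a sketch (hypothetical names, `FakeConnection` stands in for a real database connection) that counts queries by temporarily wrapping the execute entry point, so we can see where the query counts actually hurt before restructuring any model code.

```python
# Toy sketch of non-intrusive query profiling: wrap the execute entry
# point for the duration of a block and tally calls, leaving the model
# code untouched.  FakeConnection is a stand-in for a real connection.

from contextlib import contextmanager


class FakeConnection:
    def execute(self, sql):
        return []  # a real connection would return rows


@contextmanager
def count_queries(connection):
    """Temporarily wrap connection.execute to count calls."""
    counter = {"queries": 0}
    original = connection.execute

    def counting_execute(sql):
        counter["queries"] += 1
        return original(sql)

    connection.execute = counting_execute
    try:
        yield counter
    finally:
        connection.execute = original  # always restore the original


conn = FakeConnection()
with count_queries(conn) as stats:
    for i in range(5):
        conn.execute("SELECT 1")
```

After the block, `stats["queries"]` holds the tally and the connection is back to its unwrapped state.
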

Jeroen

(*) This is why I generally advise against exercising common sense.

