
launchpad-dev team mailing list archive

Re: Block the use of non-cached References on model objects


On 2011-05-08 04:10, Robert Collins wrote:

If we generate an OOPS it means that scenario wasn't tested. That's
suboptimal at best.

Wouldn't that nudge us back towards integration-level unit testing, though? I imagine that before we could open a can of worms like this, we'd need comprehensive run-time support for tracking and gathering object-graph requirements. Changes in low-level functions would have to be able to hand their object-graph requirements up the call chain, so that we don't need to hard-code them all over the higher layers of the chain.
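One way to picture "handing requirements up the call chain" is a minimal sketch along these lines; the decorator name `needs`, the attribute `graph_needs`, and the helper functions are all illustrative, not any real Storm or Launchpad API:

```python
# Hypothetical sketch: low-level code declares which references it will
# traverse, so a caller can aggregate the requirements and eager-load once
# instead of hard-coding load hints at every higher layer.

def needs(*paths):
    """Attach declared object-graph requirements to a function."""
    def decorate(fn):
        fn.graph_needs = set(paths)
        return fn
    return decorate

@needs("bug.owner", "bug.tasks")
def summarize(bug):
    # Only touches the references it declared above.
    return (bug["owner"], len(bug["tasks"]))

def gather_needs(*fns):
    """A caller collects requirements from every helper it will invoke."""
    paths = set()
    for fn in fns:
        paths |= getattr(fn, "graph_needs", set())
    return paths

print(sorted(gather_needs(summarize)))  # ['bug.owner', 'bug.tasks']
```

The caller would then pass the aggregated paths to whatever eager-loading mechanism the ORM provides, so the knowledge of what to preload lives next to the code that actually uses the references.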


In a year of close attention now, I've seen one case where eager
loading was a pessimisation - with commit() being slowed down by Storm
pathology when tens of thousands of objects are live - it has O(live)
overhead rather than O(changed). This is a bug with Storm though: the eager
loading would still have made the script faster (and more consistent)
were it not for this defect.
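The win being claimed for eager loading is the classic N+1 pattern: lazy references degenerate into one query per object. A self-contained sketch using sqlite3 (the schema and data are made up for illustration):

```python
# Demonstrate lazy per-reference loading (N+1 queries) versus one eager join.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bug (id INTEGER PRIMARY KEY, owner_id INTEGER);
""")
db.executemany("INSERT INTO person VALUES (?, ?)",
               [(i, "p%d" % i) for i in range(3)])
db.executemany("INSERT INTO bug VALUES (?, ?)",
               [(i, i % 3) for i in range(6)])

queries = []
db.set_trace_callback(queries.append)  # count every SQL statement issued

# Lazy: one query for the bugs, then one more per bug to resolve bug.owner.
bugs = db.execute("SELECT id, owner_id FROM bug").fetchall()
for _, owner_id in bugs:
    db.execute("SELECT name FROM person WHERE id = ?", (owner_id,)).fetchone()
lazy_count = len(queries)

queries.clear()
# Eager: a single join resolves every owner at once.
db.execute(
    "SELECT bug.id, person.name FROM bug JOIN person ON person.id = bug.owner_id"
).fetchall()
eager_count = len(queries)

print(lazy_count, eager_count)  # 7 queries lazily vs 1 eagerly
```

With six bugs the lazy path issues seven statements where the eager path issues one; the gap grows linearly with the number of live objects.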

But have you been eager-loading based on some notion of where it made sense to do so, or have you been doing it arbitrarily for all kinds of references? Any common sense you may have exercised would have introduced an optimistic selection bias (*).

For example, foo.distribution will trigger few queries right now because distributions are few, and hot in our caches. But we'd have to preempt demand-loading of any foo.distribution references that might possibly come into play, even though some number of them won't. Resolving unneeded references can add up, so we'd need to have some idea of the costs. And of course there's a similar question of unneeded database loads.
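The "hot in our caches" point can be made concrete with a small sketch; `load_distribution` and the cache shape are illustrative stand-ins, not Launchpad's real classes:

```python
# Sketch: demand-loading through a hot cache makes some lazy references
# nearly free, so eagerly fetching all of them up front can be wasted work.
from functools import lru_cache

db_loads = 0

@lru_cache(maxsize=None)
def load_distribution(dist_id):
    global db_loads
    db_loads += 1  # stands in for an actual database round trip
    return {"id": dist_id, "name": "distro-%d" % dist_id}

# Many objects reference the same handful of distributions:
refs = [1, 2, 1, 1, 2, 1]
for dist_id in refs:
    load_distribution(dist_id)

print(db_loads)  # only 2 loads for 6 references; the cache absorbs the rest
```

When the demand-loaded path already costs almost nothing, an eager preload of every `foo.distribution` that might come into play is pure overhead for the ones that never do.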

My vague and unsubstantiated concerns are that: (1) eager-loading the colder portions of a reference graph may sometimes be a net loss in the grand scheme of things (considering cold-load speed, hot-load speed, oopses, engineering time, etc.); and that (2) the brittleness of requiring explicit exceptions to break out of eager-loading may provide false justification for accepting those losses.

I'm sure wholesale eager loading is faster than no eager loading, but do we know how much bad we'd be accepting along with the good? This is where policy/mechanism separation comes in. Wouldn't we end up with the hoop-jumping required for optimization, and the pain from the oopses we introduced, driving our priorities when they should be driven by our actual performance goals? AFAICS, non-intrusive profiling ahead of optimization would show us the same data and support the same changes, but without these problems.
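By "non-intrusive profiling" I mean something like wrapping an operation, recording how many queries it issued, and letting that data decide which references deserve eager loading. A minimal sketch using sqlite3's trace callback (the `query_count` helper and the "list-bugs" label are made up for illustration):

```python
# Count the queries an operation issues without touching its code path.
import sqlite3
from contextlib import contextmanager

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")

@contextmanager
def query_count(connection, counts, label):
    issued = []
    connection.set_trace_callback(issued.append)
    try:
        yield
    finally:
        connection.set_trace_callback(None)
        counts[label] = len(issued)

counts = {}
with query_count(db, counts, "list-bugs"):
    for i in range(4):  # stands in for a lazy reference resolved per object
        db.execute("SELECT id FROM t WHERE id = ?", (i,))

print(counts)  # {'list-bugs': 4}
```

An operation whose count grows with the number of objects rendered is exactly the kind of hot spot the profile would flag, and the fix can then be targeted rather than wholesale.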


Jeroen

(*) This is why I generally advise against exercising common sense.

