launchpad-dev team mailing list archive
-
launchpad-dev team
-
Mailing list archive
-
Message #04509
Re: result sets and iteration... unsafe idioms - seeking thoughts on how we can avoid them
On Fri, Aug 27, 2010 at 11:38 PM, Danilo Šegan <danilo@xxxxxxxxxxxxx> wrote:
> Hi Rob,
>> Even where we have constrained object counts, this is still repetitive
>> work (and I suspect there is a 'if view/THING in the tal too, so we'll
>> also completely iterate these things 4 times.
>
> You avoid it in the TAL if you return a ResultSet and check for
> view/THING/is_empty. I assume DecoratedResultSet would do the same.
Well, is_empty does a *separate query*. Its not avoiding the work, its
doing some of it again.
Better to have a construct which says:
for x in y:
...
elseifywasempty:
...
(Which, maddeningly, the python for : else construct *does not do*).
>> If we had a separate mapping layer, this would be working totally from
>> within python, so even while a little inefficient, it would be
>> tolerable. As we don't though, its probably causing timeouts on bugs
>> with lots of attachments. One thing that makes this tricky is that by
>> returning a resultset attachments allows itself to be filtered later
>> on - and so it can be tricky to change the definition.
>
> If something times out due to this query being repeated 4 times, I'd say
> you likely have bigger problems (like this query is too slow on its
> own). Improving something by 4x is great, but this particular query
> should be <100ms, imho. 300ms extra overhead is bad, but shouldn't make
> a page timeout.
300ms is bad: its 50% more than I think we should be aiming for as the
*entire render time* for a page.
(If we have 99% of requests rendering in < 1 sec, our actual average
performance will be better still - and I think 200ms is about right as
a goal).
As for stats, that query itself is < 100ms, but the object reloading
is significantly more. We don't current *see* that in our OOPS traces.
Its also tricky to get data for because its such a hot code path,
Gustavo has serious concerns that instrumentation in that area will
savagely hurt performance.
> Anyway, I believe we don't have to worry about things like these (unless
> you've got timeouts which prove otherwise, of course :), as long as the
> number of queries is static and small.
https://bugs.edge.launchpad.net/malone/+bug/583553 - I haven't dug
into the detail yet, but it is precisely that: a bug timing out with
lots of attachments.
>> I'd love it if we can come up with some way, some convention that will
>> make it crystal clear when something is:
>> - cheap
>> - not cheap
>
> We kind of assumed that using a property meant that something was cheap,
> and using a method meant that it could be expensive. That's what we
> tried to do in translations.
I think thats what in-python stuff absolutely should do. However
there's a problem - we don't prepopulate/eager load our object graphs,
so broadly speaking 'none' of our attributes which provide data
related to other stored objects meet that criteria.
I hesitate to suggest that we should mass-migrate in that direction,
if we are going to find some way to effectively eager-load. Or perhaps
we should migrate, and when eager loading is working we can migrate
back piecemeal.
-Rob
Follow ups
References