launchpad-dev team mailing list archive

Thread
Date

Re: result sets and iteration... unsafe idioms - seeking thoughts on how we can avoid them

To: Danilo Šegan <danilo@xxxxxxxxxxxxx>
From: Robert Collins <robert.collins@xxxxxxxxxxxxx>
Date: Sat, 28 Aug 2010 06:24:30 +1200
Cc: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <1282909082.1943.85.camel@bljubuntu>
Sender: robertc@xxxxxxxxxxxxxxxxx

On Fri, Aug 27, 2010 at 11:38 PM, Danilo Šegan <danilo@xxxxxxxxxxxxx> wrote:
> Hi Rob,

>> Even where we have constrained object counts, this is still repetitive
>> work (and I suspect there is a 'if view/THING in the tal too, so we'll
>> also completely iterate these things 4 times.
>
> You avoid it in the TAL if you return a ResultSet and check for
> view/THING/is_empty.  I assume DecoratedResultSet would do the same.

Well, is_empty does a *separate query*. Its not avoiding the work, its
doing some of it again.

 Better to have a construct which says:
for x in y:
 ...
elseifywasempty:
 ...

(Which, maddeningly, the python for : else construct *does not do*).

>> If we had a separate mapping layer, this would be working totally from
>> within python, so even while a little inefficient, it would be
>> tolerable. As we don't though, its probably causing timeouts on bugs
>> with lots of attachments. One thing that makes this tricky is that by
>> returning a resultset attachments allows itself to be filtered later
>> on - and so it can be tricky to change the definition.
>
> If something times out due to this query being repeated 4 times, I'd say
> you likely have bigger problems (like this query is too slow on its
> own).  Improving something by 4x is great, but this particular query
> should be <100ms, imho.  300ms extra overhead is bad, but shouldn't make
> a page timeout.

300ms is bad: its 50% more than I think we should be aiming for as the
*entire render time* for a page.
(If we have 99% of requests rendering in < 1 sec, our actual average
performance will be better still - and I think 200ms is about right as
a goal).

As for stats, that query itself is < 100ms, but the object reloading
is significantly more. We don't current *see* that in our OOPS traces.
Its also tricky to get data for because its such a hot code path,
Gustavo has serious concerns that instrumentation in that area will
savagely hurt performance.

> Anyway, I believe we don't have to worry about things like these (unless
> you've got timeouts which prove otherwise, of course :), as long as the
> number of queries is static and small.

https://bugs.edge.launchpad.net/malone/+bug/583553 - I haven't dug
into the detail yet, but it is precisely that: a bug timing out with
lots of attachments.

>> I'd love it if we can come up with some way, some convention that will
>> make it crystal clear when something is:
>>  - cheap
>>  - not cheap
>
> We kind of assumed that using a property meant that something was cheap,
> and using a method meant that it could be expensive.  That's what we
> tried to do in translations.

I think thats what in-python stuff absolutely should do. However
there's a problem - we don't prepopulate/eager load our object graphs,
so broadly speaking 'none' of our attributes which provide data
related to other stored objects meet that criteria.

I hesitate to suggest that we should mass-migrate in that direction,
if we are going to find some way to effectively eager-load. Or perhaps
we should migrate, and when eager loading is working we can migrate
back piecemeal.

-Rob

Follow ups

Re: result sets and iteration... unsafe idioms - seeking thoughts on how we can avoid them
From: Danilo Šegan, 2010-08-28

References

result sets and iteration... unsafe idioms - seeking thoughts on how we can avoid them
From: Robert Collins, 2010-08-27
Re: result sets and iteration... unsafe idioms - seeking thoughts on how we can avoid them
From: Danilo Šegan, 2010-08-27