← Back to team overview

launchpad-dev team mailing list archive

Re: result sets and iteration... unsafe idioms - seeking thoughts on how we can avoid them

 

Hi Rob,

У пет, 27. 08 2010. у 15:01 +1200, Robert Collins пише:
> So, I've been poking at the bug attachment timeout, and as I started
> to learn the code I found this:
> 
> (Whomever wrote this, please don't feel picked-on, this is simply an
> *example* of a repeated bug I've seen in the last few weeks).
> 
> Anyhow, this code:
> ---------------
>     @property
>     def regular_attachments(self):
>         """The list of bug attachments that are not patches."""
>         return [
>             {
>                 'attachment': attachment,
>                 'file': ProxiedLibraryFileAlias(
>                     attachment.libraryfile, attachment),
>                 }
>             for attachment in self.context.attachments
>             if attachment.type != BugAttachmentType.PATCH]
> 
>     @property
>     def patches(self):
>         """The list of bug attachments that are patches."""
>         return [
>             {
>                 'attachment': attachment,
>                 'file': ProxiedLibraryFileAlias(
>                     attachment.libraryfile, attachment),
>                 }
>             for attachment in self.context.attachments
>             if attachment.type == BugAttachmentType.PATCH]
> ---------------------
> is unsafe (performance wise), and its not *obvious* that its unsafe.

Why do you say it's non-obvious? Because they are mutually exclusive
subsets of a bigger set?  These things will happen with many other cases
as well (i.e. you'll commonly filter on entirely different criteria and
then combine stuff), and my opinion is that it's obvious that it won't
perform best when split into separate methods, but code will be slightly
more maintainable.

> Its unsafe because its calling __iter__ on self.context.attachments,
> which is a ResultSet; so a page template that shows
> patches:
> ...
> attachments:
> ...
> 
> Will cause two complete iterations of the resultset - and that means
> two queries to the database, and two updates of all the objects found
> by the result set.

Well, that's what the Collections should help with.  You'd still be
doing two queries, but you could easily restrict to appropriate
attachment type and return a ResultSet directly.

To do only a single query, you'd have to move this to model and fetch
them all together then filter them out into their respective lists.

> Even where we have constrained object counts, this is still repetitive
> work (and I suspect there is a 'if view/THING in the tal too, so we'll
> also completely iterate these things 4 times.

You avoid it in the TAL if you return a ResultSet and check for
view/THING/is_empty.  I assume DecoratedResultSet would do the same.

(SQLObjectResultSet supported bool() directly; with Storm ResultSet you
either use .any() or .is_empty(): that may sometimes be the reason
behind such code being considered "ok" in the past)

> If we had a separate mapping layer, this would be working totally from
> within python, so even while a little inefficient, it would be
> tolerable. As we don't though, its probably causing timeouts on bugs
> with lots of attachments. One thing that makes this tricky is that by
> returning a resultset attachments allows itself to be filtered later
> on - and so it can be tricky to change the definition.

If something times out due to this query being repeated 4 times, I'd say
you likely have bigger problems (like this query is too slow on its
own).  Improving something by 4x is great, but this particular query
should be <100ms, imho.  300ms extra overhead is bad, but shouldn't make
a page timeout.

Anyway, I believe we don't have to worry about things like these (unless
you've got timeouts which prove otherwise, of course :), as long as the
number of queries is static and small.

> I'd love it if we can come up with some way, some convention that will
> make it crystal clear when something is:
>  - cheap
>  - not cheap

We kind of assumed that using a property meant that something was cheap,
and using a method meant that it could be expensive.  That's what we
tried to do in translations.

Cheers,
Danilo





Follow ups

References