
launchpad-dev team mailing list archive

Re: result sets and iteration... unsafe idioms - seeking thoughts on how we can avoid them

 

On 2010-08-29 02:32, Danilo Šegan wrote:
On Sat, 28 Aug 2010 at 06:24 +1200, Robert Collins wrote:

Well, is_empty does a *separate query*. It's not avoiding the work, it's
doing some of it again.

That's true.  But you said "iterate".  is_empty doesn't iterate, it
executes a much faster (on average) "limit 1" query (it's roughly as
slow only when the result is "false", but those are much faster anyhow).

I ain't saying it couldn't be done even better, but is_empty is a worthy
improvement in itself.

Agree with Danilo here: is_empty isn't particularly costly (even if the underlying Storm implementation could be faster--see bug 525825). In Robert's example scenario Storm should easily be able to optimize the is_empty away, but the common case is a very different one.
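
To make the idioms concrete, here is a minimal sketch (not Launchpad code; the BugTask table definition and the queries are invented for illustration) contrasting a check that materializes the whole result set with one that uses Storm's ResultSet.is_empty():

    from storm.locals import Int, Storm

    class BugTask(Storm):
        # Illustrative table definition only, not the real Launchpad schema.
        __storm_table__ = "bugtask"
        id = Int(primary=True)
        bug = Int()

    def has_tasks_unsafe(store, bug_id):
        # Anti-pattern: iterating (via list()) just to test for emptiness
        # fetches and deserializes every matching row.
        return len(list(store.find(BugTask, BugTask.bug == bug_id))) > 0

    def has_tasks_cheaper(store, bug_id):
        # is_empty() issues a separate query, but one the database can
        # answer from the first matching row (roughly a LIMIT 1), which
        # is the behaviour Danilo describes above.
        return not store.find(BugTask, BugTask.bug == bug_id).is_empty()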


I think we should aim to fix those 300ms once we've gotten
everything else sorted out first.

Indeed. Those 300ms are also below my ping time to launchpad.net. As long as we have bigger fish to fry, I'd be happy to eliminate a query like this if possible--but I wouldn't pick it as an SQL optimization target if it runs just once per request on one page.


We don't currently *see* that in our OOPS traces.
It's also tricky to get data for because it's such a hot code path;
Gustavo has serious concerns that instrumentation in that area will
savagely hurt performance.

We can always cowboy stuff on staging and test directly.  A poor man's
instrumentation. ;)

We ran one of our app servers under gdb for some time. It did cause some pain. In retrospect, though, I think the real problem was that we rarely had a clear indication of whether a particular performance incident happened on the instrumented server or not. If the oopses etc. had stood out clearly, it might have been fine.

Perhaps we should dedicate one of the production appservers to performance experiments. We could use that for profiling, but also other controlled experiments such as trying different tradeoffs between processes and threads. For the fine-grained systemic optimizations we may not even care much about timeouts, but more about enabling the experimental server to handle more than its fair share of requests.

If we had something like this to give us authoritative answers to performance questions, it wouldn't take us long to fill a few pages with worthwhile experiments. Just off the top of my head:
 * Does threading help throughput, or harm it because of the GIL?
 * How does threading affect consistency of our performance numbers?
 * Is BranchRevision really faster than talking directly to bzr?
 * What numbers does a particular timeout oops correlate with?
 * Will a particular prejoin improve things overall or make them worse?
 * Should we defer deserialization of multi-line strings in Storm?
 * Are we optimizing a rare pathology at the cost of the common case?
 * Can we win big from some simple caching in canonical_url? (rough sketch below)
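
On that last point, purely as a hypothetical sketch and not a claim about how canonical_url is implemented: a per-request memo in front of the URL computation could look roughly like this, with build_url standing in for the real work and the cache lifetime left as an open question:

    # Hypothetical sketch only; build_url is a placeholder for whatever
    # canonical_url really does, and the cache key and lifetime are
    # stand-ins for proper per-request bookkeeping.

    _url_cache = {}

    def build_url(obj):
        # Placeholder for the (possibly expensive) real URL computation.
        return "https://launchpad.net/%s" % obj

    def cached_canonical_url(obj):
        # Memoize per object for the duration of a request.
        key = id(obj)
        if key not in _url_cache:
            _url_cache[key] = build_url(obj)
        return _url_cache[key]

    def reset_url_cache():
        # Would need to run at end of request to avoid stale entries.
        _url_cache.clear()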

We could get much better answers to these by comparing stub's performance graphs for two alternatives over the same time period than we could by making the change and trying to pick its effect out of the noise next month!

Why is threading at the top of my list? Because some research suggests that it's possible to lose a few seconds (!!!!!) to bungled GIL contention sometimes. Our oops reports would probably show that as variable delays scattered among SQL and non-SQL time. Arbitrary sub-millisecond queries could sometimes be reported as taking half a second, without offering any clear optimization target. We do actually see those symptoms, but we have no clue where they really come from.
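
The measurement behind the threading question is at least easy to sketch in isolation. The following standalone micro-benchmark (nothing Launchpad-specific; the loop sizes are just illustrative) compares CPU-bound work run sequentially and in threads; under the GIL the threaded variant gets no speedup, and on multi-core boxes the contention itself can add wall-clock time:

    import threading
    import time

    def busy(n=2000000):
        # CPU-bound loop; holds the GIL while it runs.
        total = 0
        for i in range(n):
            total += i
        return total

    def run_sequential(jobs=4):
        for _ in range(jobs):
            busy()

    def run_threaded(jobs=4):
        threads = [threading.Thread(target=busy) for _ in range(jobs)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    def timed(label, func):
        start = time.time()
        func()
        print("%s: %.2f seconds" % (label, time.time() - start))

    if __name__ == "__main__":
        timed("sequential", run_sequential)
        timed("threaded", run_threaded)

Whether the same effect shows up under our real request mix is exactly the kind of question a dedicated experimental appserver could answer.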


Jeroen


