
launchpad-dev team mailing list archive

Re: result sets and iteration... unsafe idioms - seeking thoughts on how we can avoid them

 

On 2010-08-29 02:32, Danilo Šegan wrote:
On Sat, 28 Aug 2010 at 06:24 +1200, Robert Collins wrote:

Well, is_empty does a *separate query*. It's not avoiding the work, it's
doing some of it again.

That's true.  But you said "iterate".  is_empty doesn't iterate, it
executes a much faster (on average) "limit 1" query (it's roughly as
slow only when the result is "false", but those are much faster anyhow).

I ain't saying it couldn't be done even better, but is_empty is a worthy
improvement in itself.

Agree with Danilo here: is_empty isn't particularly costly (even if the underlying Storm implementation could be faster--see bug 525825). In Robert's example scenario Storm should easily be able to optimize the is_empty away, but the common case is a very different one.
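
To make the idioms concrete, here is a minimal sketch (not Launchpad code; the BugTask table definition and the queries are invented for illustration) contrasting a check that materializes the whole result set with one that uses Storm's ResultSet.is_empty():

    from storm.locals import Int, Storm

    class BugTask(Storm):
        # Illustrative table definition only, not the real Launchpad schema.
        __storm_table__ = "bugtask"
        id = Int(primary=True)
        bug = Int()

    def has_tasks_unsafe(store, bug_id):
        # Anti-pattern: iterating (via list()) just to test for emptiness
        # fetches and deserializes every matching row.
        return len(list(store.find(BugTask, BugTask.bug == bug_id))) > 0

    def has_tasks_cheaper(store, bug_id):
        # is_empty() issues a separate query, but one the database can
        # answer from the first matching row (roughly a LIMIT 1), which
        # is the behaviour Danilo describes above.
        return not store.find(BugTask, BugTask.bug == bug_id).is_empty()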


I think we should aim to fix those 300ms once we've gotten
everything else sorted out first.

Indeed. Those 300ms are also below my ping time to launchpad.net. As long as we have bigger fish to fry, I'd be happy to eliminate a query like this if possible--but I wouldn't pick it as an SQL optimization target if it runs just once per request on one page.


We don't currently *see* that in our OOPS traces.
It's also tricky to get data for because it's such a hot code path;
Gustavo has serious concerns that instrumentation in that area will
savagely hurt performance.

We can always cowboy stuff on staging and test directly.  A poor man's
instrumentation. ;)

We ran one of our app servers under gdb for some time. It did cause some pain. In retrospect, though, I think the real problem was that we rarely had a clear indication of whether a particular performance incident happened on the instrumented server or not. If the oopses etc. had stood out clearly, it might have been fine.

Perhaps we should dedicate one of the production appservers to performance experiments. We could use that for profiling, but also other controlled experiments such as trying different tradeoffs between processes and threads. For the fine-grained systemic optimizations we may not even care much about timeouts, but more about enabling the experimental server to handle more than its fair share of requests.

If we had something like this to give us authoritative answers to performance questions, it wouldn't take us long to fill a few pages with worthwhile experiments. Just off the top of my head:
 * Does threading help throughput, or harm it because of the GIL?
 * How does threading affect consistency of our performance numbers?
 * Is BranchRevision really faster than talking directly to bzr?
 * What numbers does a particular timeout oops correlate with?
 * Will a particular prejoin improve things overall or make them worse?
 * Should we defer deserialization of multi-line strings in Storm?
 * Are we optimizing a rare pathology at the cost of the common case?
 * Can we win big from some simple caching in canonical_url? (rough sketch below)
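
On that last point, purely as a hypothetical sketch and not a claim about how canonical_url is implemented: a per-request memo in front of the URL computation could look roughly like this, with build_url standing in for the real work and the cache lifetime left as an open question:

    # Hypothetical sketch only; build_url is a placeholder for whatever
    # canonical_url really does, and the cache key and lifetime are
    # stand-ins for proper per-request bookkeeping.

    _url_cache = {}

    def build_url(obj):
        # Placeholder for the (possibly expensive) real URL computation.
        return "https://launchpad.net/%s" % obj

    def cached_canonical_url(obj):
        # Memoize per object for the duration of a request.
        key = id(obj)
        if key not in _url_cache:
            _url_cache[key] = build_url(obj)
        return _url_cache[key]

    def reset_url_cache():
        # Would need to run at end of request to avoid stale entries.
        _url_cache.clear()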

We could get much better answers to these by comparing stub's performance graphs for two alternatives over the same time period than we could by making the change and trying to pick its effect out of the noise next month!

Why is threading at the top of my list? Because some research suggests that it's possible to lose a few seconds (!!!!!) to bungled GIL contention sometimes. Our oops reports would probably show that as variable delays scattered among SQL and non-SQL time. Arbitrary sub-millisecond queries could sometimes be reported as taking half a second, without offering any clear optimization target. We do actually see those symptoms, but we have no clue where they really come from.
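
The measurement behind the threading question is at least easy to sketch in isolation. The following standalone micro-benchmark (nothing Launchpad-specific; the loop sizes are just illustrative) compares CPU-bound work run sequentially and in threads; under the GIL the threaded variant gets no speedup, and on multi-core boxes the contention itself can add wall-clock time:

    import threading
    import time

    def busy(n=2000000):
        # CPU-bound loop; holds the GIL while it runs.
        total = 0
        for i in range(n):
            total += i
        return total

    def run_sequential(jobs=4):
        for _ in range(jobs):
            busy()

    def run_threaded(jobs=4):
        threads = [threading.Thread(target=busy) for _ in range(jobs)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    def timed(label, func):
        start = time.time()
        func()
        print("%s: %.2f seconds" % (label, time.time() - start))

    if __name__ == "__main__":
        timed("sequential", run_sequential)
        timed("threaded", run_threaded)

Whether the same effect shows up under our real request mix is exactly the kind of question a dedicated experimental appserver could answer.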


Jeroen


