← Back to team overview

launchpad-dev team mailing list archive

Re: Performance Tuesday: enhancement; analysis; questions

 

On Tue, 05 Oct 2010 22:52:31 +1000, Ian Booth <ian.booth@xxxxxxxxxxxxx> wrote:
> Hi
> 
> I took the opportunity to look at one of the code team top 10's (both
> for duration and statement count). I thought I'd share what I found - it
> leads to a few questions which I'm hoping some of the smart people here
> can help shed some light on. It's a long email so I'll try and keep it
> brief. I'm happy to share more detail if required. The oops is:
> 
> OOPS-1737M2079 Person:+subscribedbranches - 12.86s duration, 79 statements
> 
> *Enhancement*
> 
> First up though, I made an enhancement to the sql logging statement
> tracer to provide a capability to log parameter values. The branch is
> being landed as I type this (if ec2 is behaving itself).
> 
> Currently, you can specify LP_DEBUG_SQL=1 to enable logging of the SQL
> that is executed, but the log does not include the parameters eg
>     select from foo where something = %s <-- note the param value is not
> logged
> 
> Without the parameters being logged, it makes it harder to understand
> what is happening and to do things like detect duplicate SQL or relate
> the SQL back to the business logic. eg for the above example, there
> could be 2 executions of that statement,
> either:
>     select from foo where something = 'bar'
>     select from foo where something = 'fred'
> or
>     select from foo where something = 'bar'
>     select from foo where something = 'bar'
> 
> Clearly, the latter case represents something that could be optimised as
> part of performance tuning. But is it is difficult to see this without
> the bind parameters being logged.

Well, the first case can also often be optimized into a single query
too, just not quite so trivially.  We've had many problems in Launchpad
where listings of items have ended up with one (or several!) queries per
item and suffered for it.

> To use the new feature, just set LP_SQL_DEBUG_VALUES=1. If this variable
> is true, and LP_SQL_DEBUG is also true, then the SQL bind variables
> are logged. NB the new setting has no effect if LP_SQL_DEBUG is not
> specified.
> 
> *Question:* does the "Repeated SQL Statements" section on the oops page
> collate the statements with or without taking into account the
> parameters being substituted? 

without, I think.

> If without, should it?

Probably not, for the reason I gave above.  Being able to see both would
be neat, I guess.

> *Analysis*
> 
> 1. Duplicate sql in populating view items (which btw are count(*) which
> may be expensive in postgres I've gathered from others' posts).
> 
> Here's one particular issue which consumes, for this particular oops,
> 20% of the sql time. The root cause leads to a design issue which I'll
> pose as a question at the end. Very briefly, the
> PersonSubscribedBranchesView causes the instantiation of 2 copies of a
> PersonBranchesMenu object - one for the menu links under the page title
> and one for the links on the right hand side of the page. Each of these
> uses a property cache, but a new, different cache instance is used for
> each PersonBranchesMenu object, even though the underlying domain object
> whose properties are being cached represents the same real world entity.
> Hence as each PersonBranchesMenu attribute is populated, there is one
> cache miss per attribute per cache instance - it should be only one miss
> per attribute IMHO.
> 
> The other issue with the way this page is populated is that the
> construction of both PersonBranchesMenu instances results in every
> single one of the defined "extra_attributes" being retrieved, even
> though one of those instances is only used to render the "Add Branch"
> link on the right of the page and none of the extra_attributes from that
> instance are actually used.
> 
> So we're executing potentially expensive sql to get attributes which
> have already been fetched in a different instance and which aren't in
> fact required for the instance in question. For this page, one of the
> PersonBranchesMenu instances comes from a reference directly in the page
> template, the other comes from an included macro.
> 
> *Questions*
> 
> a. Should the property caching infrastructure be made smarter so that
> for a given view rendering cycle, each domain object representation only
> uses a single cache instance?
> 
> b. Can anything be done to configure the tales/zope infrastructure so
> that entities involved in rendering the view be accessed as singletons
> where appropriate?

I think Curtis sent some mails about the menu infrastructure and how it
sucks for the way it's used now a while ago... yes, here:

http://www.mail-archive.com/launchpad-dev@xxxxxxxxxxxxxxxxxxx/msg04271.html

I don't know what the current status is here.

> 2. Other duplicate sqls
> 
> According to the oops report, there's a large number of duplicate sqls
> which don't seem to be used to directly populate any view items (but I
> haven't done a complete analysis yet).
> 
> 17 x select ... from Person
> 17 x select ... from ValidPersonCache
> 14 x select .... FROM LibraryFileAlias
> 
> These don't take up too much sql time but to me the high execution
> counts (> 1/2 the total) suggest there's something somewhere that should
> be cached. Besides, these will add additional I/O load to the database
> server which all adds up if the system is under load. Perhaps there will
> be a similar sert of sqls executed for other similar views and a
> common/shared optimisation approach can be used? I don't yet know enough
> about the origin of these to make a call. Anyone else have any ideas?

There are a bunch of queries that are done by very generic machinery.
I don't know what they are for though, really.  ValidPersonCache is an
idea that needs to die, I think.

> If you've managed to read this far without hitting delete, well done and
> go get a life :-)

Heh heh.

Cheers,
mwh



Follow ups

References