← Back to team overview

launchpad-dev team mailing list archive

Performance Tuesday: enhancement; analysis; questions

 

Hi

I took the opportunity to look at one of the code team top 10's (both
for duration and statement count). I thought I'd share what I found - it
leads to a few questions which I'm hoping some of the smart people here
can help shed some light on. It's a long email so I'll try and keep it
brief. I'm happy to share more detail if required. The oops is:

OOPS-1737M2079 Person:+subscribedbranches - 12.86s duration, 79 statements

*Enhancement*

First up though, I made an enhancement to the sql logging statement
tracer to provide a capability to log parameter values. The branch is
being landed as I type this (if ec2 is behaving itself).

Currently, you can specify LP_DEBUG_SQL=1 to enable logging of the SQL
that is executed, but the log does not include the parameters eg
    select from foo where something = %s <-- note the param value is not
logged

Without the parameters being logged, it makes it harder to understand
what is happening and to do things like detect duplicate SQL or relate
the SQL back to the business logic. eg for the above example, there
could be 2 executions of that statement,
either:
    select from foo where something = 'bar'
    select from foo where something = 'fred'
or
    select from foo where something = 'bar'
    select from foo where something = 'bar'

Clearly, the latter case represents something that could be optimised as
part of performance tuning. But is it is difficult to see this without
the bind parameters being logged.

To use the new feature, just set LP_SQL_DEBUG_VALUES=1. If this variable
is true, and LP_SQL_DEBUG is also true, then the SQL bind variables
are logged. NB the new setting has no effect if LP_SQL_DEBUG is not
specified.

*Question:* does the "Repeated SQL Statements" section on the oops page
collate the statements with or without taking into account the
parameters being substituted? If without, should it?


*Analysis*

1. Duplicate sql in populating view items (which btw are count(*) which
may be expensive in postgres I've gathered from others' posts).

Here's one particular issue which consumes, for this particular oops,
20% of the sql time. The root cause leads to a design issue which I'll
pose as a question at the end. Very briefly, the
PersonSubscribedBranchesView causes the instantiation of 2 copies of a
PersonBranchesMenu object - one for the menu links under the page title
and one for the links on the right hand side of the page. Each of these
uses a property cache, but a new, different cache instance is used for
each PersonBranchesMenu object, even though the underlying domain object
whose properties are being cached represents the same real world entity.
Hence as each PersonBranchesMenu attribute is populated, there is one
cache miss per attribute per cache instance - it should be only one miss
per attribute IMHO.

The other issue with the way this page is populated is that the
construction of both PersonBranchesMenu instances results in every
single one of the defined "extra_attributes" being retrieved, even
though one of those instances is only used to render the "Add Branch"
link on the right of the page and none of the extra_attributes from that
instance are actually used.

So we're executing potentially expensive sql to get attributes which
have already been fetched in a different instance and which aren't in
fact required for the instance in question. For this page, one of the
PersonBranchesMenu instances comes from a reference directly in the page
template, the other comes from an included macro.

*Questions*

a. Should the property caching infrastructure be made smarter so that
for a given view rendering cycle, each domain object representation only
uses a single cache instance?

b. Can anything be done to configure the tales/zope infrastructure so
that entities involved in rendering the view be accessed as singletons
where appropriate?

2. Other duplicate sqls

According to the oops report, there's a large number of duplicate sqls
which don't seem to be used to directly populate any view items (but I
haven't done a complete analysis yet).

17 x select ... from Person
17 x select ... from ValidPersonCache
14 x select .... FROM LibraryFileAlias

These don't take up too much sql time but to me the high execution
counts (> 1/2 the total) suggest there's something somewhere that should
be cached. Besides, these will add additional I/O load to the database
server which all adds up if the system is under load. Perhaps there will
be a similar sert of sqls executed for other similar views and a
common/shared optimisation approach can be used? I don't yet know enough
about the origin of these to make a call. Anyone else have any ideas?

If you've managed to read this far without hitting delete, well done and
go get a life :-)




Follow ups