Performance Tuesday - progress!

 

So I had a great Performance Tuesday: two patches up, and only a few
hours fighting Storm :).

I started looking at bug 607935 - BugTask:+index timing out - e.g.
https://bugs.launchpad.net/ubuntu/+bug/1

It's possibly not the cause, but *at least* 1.2 seconds of the page is
spent doing over 300 queries for the logic needed to fill out the
'unsubscribe team x' actions.

I suspect Bug.indexed_messages is also a problem - I filed a bug on
this yesterday.

In doing this patch I extended StormStatementRecorder for use with
HasQueryCount. There are view tests that sit close to this code, and I
wanted the nice failure output of HasQueryCount without driving the
whole template - that way, if other parts of the template scale badly,
I can still focus in on just this piece.
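The end result lets you write a test shaped roughly like this (module
paths are from memory and the view-construction helper is a
placeholder, so treat it as a sketch rather than the real thing):

from testtools import TestCase
from testtools.matchers import LessThan
# Paths from memory - they may not be exact:
from lp.testing import StormStatementRecorder
from lp.testing.matchers import HasQueryCount

class TestUnsubscribeActionsQueryCount(TestCase):

    def test_query_count(self):
        # make_bugtask_view() stands in for however the test builds the
        # bugtask view - it is not a real helper.
        view = make_bugtask_view()
        with StormStatementRecorder() as recorder:
            view.initialize()
        # On failure HasQueryCount prints the recorded statements - the
        # nice output I wanted without rendering the whole template.
        # 15 is just an illustrative ceiling.
        self.assertThat(recorder, HasQueryCount(LessThan(15)))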

To address it, I just changed a Python expression into a SQL one:
for team in self.user.teams_participated_in:
    if bug.isSubscribed(team) or bug.isSubscribedToDupes(team):
        ...

Those two lines were the cause of (for me) 330 queries :)
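The SQL side boils down to asking the database the whole question in
one go - roughly this shape (class and column names are from memory, so
treat them as illustrative rather than the actual patch; imports of the
model classes are omitted because I'd only be guessing the paths):

from storm.expr import Or, Select

# 'Which of the user's teams have a subscription to this bug or to one
# of its duplicates?' - one query instead of two checks per team.
subscribed_teams = store.find(
    Person,
    TeamParticipation.personID == self.user.id,
    TeamParticipation.teamID == Person.id,
    BugSubscription.personID == Person.id,
    Or(BugSubscription.bugID == bug.id,
       BugSubscription.bugID.is_in(
           Select(Bug.id, Bug.duplicateofID == bug.id))))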

In doing it I added a new helper to IBug, along the lines of the
existing ones. It would have been nice to use GenericCollections or
something similar, but the need wasn't quite high enough to justify the
effort. Michael Hudson kindly reviewed this for me at midday.

I then reviewed Tim's Cloud optimisation patch, which should nuke the
timeouts on the code cloud \o/.

I then started poking at bug 618367: Distribution:+assignments
timeouts, which is up near the top of the candidate-timeouts report.

This is e.g. https://launchpad.net/ubuntu/+assignments.

A major problem, according to an OOPS, is ValidPersonCache: 6 seconds
to do 170 lookups there.

This was pretty shallow, except for me making a beginner's mistake in Storm:
 - refactor the preloading code for Person._all_members to let the
valid-person checks be reused.
 - add a test that the page query count doesn't scale up as more
unique assignees etc. are used (there's a sketch of that below).
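That test is the usual 'render, add more data, render again, the query
count must not grow' dance. Roughly (paths and the recorder's count
attribute are from memory, and the two page helpers are placeholders,
not real Launchpad helpers):

from testtools.matchers import Equals
# Paths from memory - they may not be exact:
from lp.testing import StormStatementRecorder, TestCaseWithFactory
from lp.testing.matchers import HasQueryCount

class TestAssignmentsQueryCount(TestCaseWithFactory):

    def test_count_is_constant(self):
        # make_assignments() and render_assignments_page() are
        # placeholders for the real factory/view plumbing.
        distro = self.factory.makeDistribution()
        make_assignments(distro, count=2)
        with StormStatementRecorder() as recorder1:
            render_assignments_page(distro)

        make_assignments(distro, count=4)  # more unique assignees etc.
        with StormStatementRecorder() as recorder2:
            render_assignments_page(distro)

        # The query count must not grow with the number of distinct
        # people on the page.
        self.assertThat(recorder2, HasQueryCount(Equals(recorder1.count)))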

The Storm issues I had today:
Class.some_reference == Class.some_reference
blows up in an inexplicable way. It would be awesome for this to work
better. I filed a bug.

LeftJoin(aliased_table, otheraliased_table.id == aliased_table.reference)
compiles to
LeftJoin ... ON otheraliased_table.id = OriginalTable.foreignkey

(this is a variation on the first issue). What is needed is:

LeftJoin(aliased_table, otheraliased_table.id == aliased_table.referenceID)  (for SQLBase)
or
LeftJoin(aliased_table, otheraliased_table.id == aliased_table.reference_id)  (for native Storm)
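To make that concrete, here's a toy, self-contained version of the trap
(made-up classes, nothing to do with the real schema):

from storm.expr import LeftJoin
from storm.info import ClassAlias
from storm.locals import Int, Reference, Storm

class Person(Storm):
    __storm_table__ = "person"
    id = Int(primary=True)

class EmailAddress(Storm):
    __storm_table__ = "emailaddress"
    id = Int(primary=True)
    person_id = Int(name="person")
    person = Reference(person_id, Person.id)

PersonAlias = ClassAlias(Person, "subject_person")
EmailAlias = ClassAlias(EmailAddress, "subject_email")

# What bit me: the Reference property on the alias compiles against the
# original EmailAddress table, not the alias, i.e.
#   LeftJoin(EmailAlias, PersonAlias.id == EmailAlias.person)
# comes out as "... ON subject_person.id = EmailAddress.person".

# The workaround is to name the foreign-key column on the alias:
join = LeftJoin(EmailAlias, PersonAlias.id == EmailAlias.person_id)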


There was another Storm-ish issue I ran into. The precaching here is an
extension of the pattern I experimented with on Person:+participants.

In this case, we had three separate Persons to preload per Specification.

Now, there are two ways you can do this in SQL (without views - we're
removing a terrible-performance view):
 - add the three (aliased) tables - Person, EmailAddress and Account -
once per person relationship, so we end up with 10 tables in the query.
 - join with an IN clause, so that any of the assignee/drafter/approver
columns can match Person, and use the EmailAddress and Account tables
only once (both shapes are sketched below).
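In toy Storm terms (made-up classes again, just to show the two shapes;
the real page drags EmailAddress and Account through the equivalent
joins):

from storm.expr import LeftJoin, Or
from storm.info import ClassAlias
from storm.locals import Int, Storm

class Person(Storm):
    __storm_table__ = "person"
    id = Int(primary=True)

class Specification(Storm):
    __storm_table__ = "specification"
    id = Int(primary=True)
    assignee_id = Int(name="assignee")
    drafter_id = Int(name="drafter")
    approver_id = Int(name="approver")

# Shape 1: one aliased Person per relationship (plus aliased
# EmailAddress/Account per relationship in the real query).
Assignee = ClassAlias(Person, "assignee_p")
Drafter = ClassAlias(Person, "drafter_p")
Approver = ClassAlias(Person, "approver_p")
wide_origin = [
    Specification,
    LeftJoin(Assignee, Assignee.id == Specification.assignee_id),
    LeftJoin(Drafter, Drafter.id == Specification.drafter_id),
    LeftJoin(Approver, Approver.id == Specification.approver_id),
]

# Shape 2: one Person join matching any of the three columns (spelled
# with Or() here; an IN over the three columns means the same thing).
# Fewer tables, but a Specification can come back as up to three rows.
narrow_origin = [
    Specification,
    LeftJoin(Person, Or(
        Person.id == Specification.assignee_id,
        Person.id == Specification.drafter_id,
        Person.id == Specification.approver_id)),
]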

The latter way is a lot easier to read, and probably simpler for the
query planner to tune, but our mapping logic really doesn't fit well
with it:
- [0] is poorly defined (you need N rows - up to 3 - per index in the result).
- [N:M] is even more poorly defined.
- Iteration needs to be cleverer - you need to buffer the
Specification until all possible rows are seen, then yield it (a
sketch of that is below).
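That buffering looks roughly like this (my sketch, not anything in the
patch):

def iter_specifications(rows):
    # rows: (Specification, Person) tuples ordered by specification id,
    # with up to three rows per specification (assignee/drafter/approver).
    current = None
    people = []
    for spec, person in rows:
        if current is not None and spec.id != current.id:
            yield current, people
            people = []
        current = spec
        if person is not None:
            people.append(person)
    if current is not None:
        yield current, people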

I filed a bug on Storm saying it would be nice to have built-in
support to handle this more gracefully - Storm *will* need it as it
grows eager-loading facilities.

A big thanks to James Henstridge and Stuart for giving me a hand when
I was getting ready to headbutt Storm for miscompiling things.

In other news, the top OOPSer for the last month or so is now fixed on
edge - e.g. https://api.edge.launchpad.net/1.0/bugs/414746/attachments
- I don't think I'm going to worry about a CP for it; Leanne said the
QA scripts handle LP dying :( - but the next rollout should see this
drop right off the radar. It's down to about one second for me.

Now, I'm off to figure out why I got two seemingly unrelated test
failures from my first patch.

-Rob