launchpad-dev team mailing list archive
Message #07091
Performance Tuesday: faster development
Nearly a year ago, when I started working on Launchpad (again :P),
we faced a huge performance problem. We're over halfway there now:
our request backstop is set to 9 seconds (with 3 overrides), down
from 20 seconds. Roughly the same number of requests fail per day as
before - well under a tenth of a percent across the site.
This is pretty damn awesome!
We've found and corrected a huge number of inefficient pages which
simply did too many queries, and others which had mistakes in their
SQL queries. We've also improved a number of query schemas. Needless
to say doing all this work has involved changes in some of our
toolchain (such as allowing model level caching).
And a month or so back, when writing up the infrastructure changes we
made to address the issues driving some aspects of our poor
performance, I noted that we've crossed a significant perceptual
threshold: we're no longer primarily perceived as slow.
This gives us the breathing room to look at the next major performance
issue: our development cycle. A few things feed into this:
- It's getting harder to fix performance bugs simply: accessing 60K
rows of cold data @ 2ms each is always going to be a 2 minute
operation. We need more sophisticated solutions to handle the scale of
some of our problems. Adding such solutions is tricky and often
requires multiple iterations, but we can only iterate once a month due
to downtime constraints.
- We have a code base where we routinely make changes with unexpected
side effects, which hampers development. Sometimes they escape and
become regressions (we added about a week of work in this way over the
last 5 months).
- Running enough tests to be confident that the whole test suite will
pass is really quite hard.
- Making reusable components is very tricky because of the tight
coupling between our domain model and object persistence.
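The arithmetic in the first point can be checked with a minimal
sketch. The function name and the batch size are illustrative
assumptions, not anything from the Launchpad code base, and the
batched case optimistically assumes a batch costs about the same as
one cold lookup:

```python
import math


def total_seconds(rows: int, latency_ms: float, batch_size: int = 1) -> float:
    """Estimate wall-clock time when every round trip costs latency_ms."""
    # Each round trip fetches up to batch_size rows.
    round_trips = math.ceil(rows / batch_size)
    return round_trips * latency_ms / 1000.0


# One row per round trip: 60K rows at 2 ms each.
print(total_seconds(60_000, 2.0))

# Hypothetical batches of 1,000 rows (an assumption for illustration).
print(total_seconds(60_000, 2.0, 1_000))
```

The first call gives 120 seconds, i.e. the 2 minutes quoted above; no
amount of per-row tuning changes that, which is why the shape of the
access has to change instead.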
Many of these things have been discussed before. I have a proposal
which I would like your joint help critically assessing. It is by *no
means* a done deal nor finalised.
The proposal is the first of three documents I intend us to have on
this (large) topic:
* The analysis / overview / business case
* A vision which strips that analysis to its bare bones, establishes a
framework for answering questions like 'should X be a service' and
makes considered but opinionated choices about technology.
* A migration roadmap which identifies ordering, costs and benefits
from the various things that go into a multigeneration massive
migration.
In this proposal I have deliberately not made choices (such as 'rabbit
vs xmlrpc vs restful json vs ...') which do not affect the overall
discussion. I'm positive we'll have a fine old time deciding on
different implementation choices; we should decide on the overall
approach before making such choices though :) [what should we do, when
should we do it and how should we do it... in that order when
possible]
I've spoken to some of you already about this - thank you -very- much
for your feedback on the proposal so far. I owe you all! The list at
the top of the document is probably not complete - some of the ideas
have been around (literally) for years.
With no further ado:
https://dev.launchpad.net/ArchitectureGuide/ServicesAnalysis
Please read this and do one of:
- comment in it
- reply to this thread
- reply to me privately
depending on your personal preference.
If the proposal survives this feedback process then I'll start digging
into the juicy stuff - the other two documents I mention above.
-Rob