launchpad-dev team mailing list archive

Re: Performance tuesday: faster development

 

On Tue, May 17, 2011 at 11:08 PM, Martin Pool <mbp@xxxxxxxxxxxxx> wrote:
> Those are awesome results.  I also really admire the way you keep on
> so consistently celebrating the progress and pointing the way forward.

Thanks! Details interleaved below...

> Some thoughts, not all linked up or fully cooked:
>
> * I'm happy to see this kind of change possibly moving forward.

\o/

> * The lp api is really good in some ways (mostly, coverage) but
> limited in others, and people speak of lazr.restful and co as being
> hard to change or to represent particular things.  So it seems like
> this project could either snag on that or be a powerful motivator to
> improve it.

Indeed. I think we have different constraints and requirements for
internal APIs vs external ones. We don't need long-term support (we
control the clients). We don't want totally different dialects when
things change (we want calls to be isolated from each other so that we
can upgrade individual points rather than everything in lock-step). We
simply can't use launchpadlib (as I describe in the wiki page). We
need creating and destroying services to be as lightweight as
possible. We want concurrent requests, and requests to different
services. And we need our own business logic (so we want data objects,
not mapped objects). In the Template/API service I suggest we could
split the public API out to the front-end servers; what we use
internally is then totally unrelated (though it may still be RESTful).

>  * "low wtf factor on changes": some people with SOA seem to hit
> (even) worse WTF moments; I suppose this is covered by some of your
> list of benefits from the current design, but it may also need to be
> addressed as a specific risk

I've added a nod to this in the relevant section. I think something
like launchpadlib would exacerbate this, but having a clear
remote-call layer would help mitigate it.

>  * lp api clients are hard to test now.  (there are some bugs open.)
> The recommended thing is either: run up your own lp dev; or test
> against production.  (Previously, it was recommended to test against
> staging, but my unscientific impression is it is down or timing out
> often enough to make this difficult.)  You talk about requiring
> services to provide a test fake implementation.  I wonder if that is
> waste: extra code to write, and a chance for them to get out of sync.
> Perhaps it is simpler to just have the actual services be reliable,
> fast, and simple enough to run up that people can actually test
> against them; then only if that cannot be achieved make a fake
> implementation.  This can still make the test layering more clear, and
> it leaves the path open to doing a fake if you want to later.

Transactions are slow. GPG is slow. Consistency checks are slow.
Initialisation of data on disk is slow.

-> fakes from day one

This is something I wish we had done in bzr much earlier - it would
have informed our design for things like tree transform (which doesn't
run on a transport) by making us measure the cost of the abstraction
vs the cost of using the full implementation in all our other tests.
(Tree transform is nice - this isn't a slight, merely an example I
knew of offhand).
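
To make "fakes from day one" concrete, here is a minimal Python sketch.
Everything in it is an assumption for illustration: SlowSigningService,
FakeSigningService and check_contract are hypothetical names, not
existing Launchpad or bzr code, and the sleep merely stands in for
GPG/disk/transaction cost. The point is that a single contract check
runs against both the real service and its fake, which is one way to
limit the out-of-sync risk raised above.

```python
import hashlib
import time


class SlowSigningService:
    """Hypothetical stand-in for a real, expensive service."""

    def sign(self, payload):
        time.sleep(0.05)  # simulates GPG / disk / transaction cost
        return hashlib.sha256(b"real-key" + payload).hexdigest()

    def verify(self, payload, signature):
        time.sleep(0.05)
        expected = hashlib.sha256(b"real-key" + payload).hexdigest()
        return signature == expected


class FakeSigningService:
    """In-memory fake shipped alongside the service (hypothetical)."""

    def __init__(self):
        self._signed = {}

    def sign(self, payload):
        # Hand out an opaque token and remember what it was issued for.
        signature = "fake-sig-%d" % len(self._signed)
        self._signed[signature] = payload
        return signature

    def verify(self, payload, signature):
        return self._signed.get(signature) == payload


def check_contract(service):
    """Behaviour every implementation, real or fake, must satisfy."""
    signature = service.sign(b"hello")
    assert service.verify(b"hello", signature)
    assert not service.verify(b"tampered", signature)


# The same check runs against both, so drift shows up as a test failure.
for implementation in (SlowSigningService(), FakeSigningService()):
    check_contract(implementation)
```

Tests elsewhere in the tree would then use the fake by default, and
only the contract check itself pays the cost of the real service.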

>  * So a specific step towards this could be using loggerhead's json
> api to display some branch content inline in the main application?  By
> talking to it from the client, or by having the webapp templates proxy
> it?

Right. If we want webapp clients to view it we either need XHR
permission glue, or to map loggerhead into the lp namespace using
Apache's URL routing facilities.
If we do it ourselves in the template layer, we don't need either of
those things, but we will be dependent on its responsiveness to meet
our page render times.

We may want to do both depending on the situation. I'd certainly start
by sending clients to it (e.g. map it into the main namespace via
Apache).

>  * As you say, lplib is problematic.  External clients need a better
> solution too.  It does seem to me that xmlrpc is the way of the past
> though, and that the trend is towards things that are json and restful
> but a bit more reality-based than lplib's current approach.  It meshes
> with what other client programs will expect, and it lets us get better
> mileage and help out of http-level services, like caches.  (Not that
> we want to rely too much on caching at that level, but it's nice not
> to rule it out.)  Starting new work based on xmlrpc today would seem
> unfortunate.

External needs are fundamentally different to internal needs, and we
should avoid building something we might need unless we really do need
it.

I don't want to get into technology choices at this stage of the
discussion, and I certainly don't have hidden choices waiting in the
wings (but I am thinking about it, obviously). In the datacentre we're
running with sub-ms latency; we don't want or need HTTP intermediary
services. We should do better for external clients, but that's a
different project IMO and one we're not well equipped to deal with
just now: we have a lot of tech debt we're burning through, a backlog
of features we want to deliver and a high cost of development. Having
got a handle on the major performance issues that were driving our
users batty (so that now they are asking about features more than
speed), I think our next major single-minded theme needs to be
internal development efficiency, which is what this proposal is all
about (even though it seems to be just architecture :)).


>> However, if we treat the templating and api engine as the entry-service rather than as part of the core data access service, we can dramatically simplify the testing story: a clean contract between template rendering/public api and model manipulation/optimisation/refactoring.
>
> Can you explain that more?

Well, consider two black boxes and a contract between them.

One does template rendering, choosing what data to display, PATCH
translation and the like. It has no database access which means that
the performance implications of its data-access calls are simple. It
has a lot of redundancy: there are many places that show the same
template, and many views on the same data. The contract specifies just
enough (and no more) data access and action-taken methods to implement
the first black box. The second box implements just this narrow(er)
contract and does SQL data access. It has no redundancy in its inputs
- there is only one call to get the stats for a bugtracker, so only
one test needed. Its backend calls to SQL may need to be checked for
call scaling, generated queries and so forth.

The dramatic simplification is that we're separating the views of
the data - and we have many different views on the same data - from
the calculation of the data, for which there is generally just one
codepath. So the multiplication effect stops applying to the whole
stack and just applies within the view area.
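
A rough Python sketch of the two boxes, purely as an illustration: the
names BugTrackerStats, BugTrackerData, FakeBugTrackerData and the two
render functions are hypothetical, as are the fields on the stats
object. The contract has one call for bugtracker stats; several views
consume it, and each view can be tested against an in-memory
implementation with no database involved.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class BugTrackerStats:
    """Plain data object crossing the contract (not a mapped object)."""
    name: str
    open_bugs: int
    watches: int


class BugTrackerData(Protocol):
    """The narrow contract: one call to get the stats for a bugtracker."""

    def get_stats(self, name: str) -> BugTrackerStats: ...


class FakeBugTrackerData:
    """In-memory implementation of the contract, for view tests."""

    def __init__(self, stats):
        self._stats = {s.name: s for s in stats}

    def get_stats(self, name):
        return self._stats[name]


# First box: many views over the same data, no database access.
def render_summary_line(data, name):
    stats = data.get_stats(name)
    return "%s: %d open bugs" % (stats.name, stats.open_bugs)


def render_detail_page(data, name):
    stats = data.get_stats(name)
    return "%s\nOpen bugs: %d\nWatches: %d" % (
        stats.name, stats.open_bugs, stats.watches)


data = FakeBugTrackerData([BugTrackerStats("debbugs", 12, 3)])
print(render_summary_line(data, "debbugs"))  # prints "debbugs: 12 open bugs"
```

The second box would be a SQL-backed class with the same get_stats
method; it needs one set of tests for query counts and scaling however
many views sit on top of it.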

>  * I've said separately I think the api needs to move a bit more
> towards thinking about what people are likely to want in a particular
> request, much as we think about what ought to be on a particular
> webapp page, rather than just exposing model objects.  (This is not to
> say every api ought to be hand crafted.)  A headline case would be
> shipping the bug tasks inline with the bug object; in some ways doing
> this specifically for cases where it makes sense is more valuable than
> allowing users to ask for them.

This is about the external API, so while it's something I agree with, I
think it's best to treat it as an orthogonal problem (so that we're not
taking on 3 or 4 massive undertakings at once).

>> One thing that would make this service easier to implement is to stop rendering templates in API calls (at all) - and instead generate those things client side if they are being served out in an API response.
>
> I wasn't aware Launchpad did that at the moment.  What kind of thing
> renders templates from API calls?

Pages that update their content.

>  * I recall you saying previously it would be really infeasible to do
> a stepwise migration from a lazr.restful based api to a new way of
> publishing apis.  I can't recall the reasons, but it might be worth
> discussing in there because it seems like a substantial issue in
> moving to this model: either there needs to be a way to add a
> next-generation api piecewise, or lazr.restful gets improved, or
> something else.

We're not changing our published apis as part of this, so this
challenge does not apply to us.

>  * log management and correlation seems to get more important if there
> are more independently moving parts.

Great point, I'm adding that nuance under the increased moving parts bit.

-Rob

