launchpad-dev team mailing list archive

Re: API issue moving branches

To: Robert Collins <robert.collins@xxxxxxxxxxxxx>
From: Leonard Richardson <leonard.richardson@xxxxxxxxxxxxx>
Date: Thu, 20 May 2010 10:13:13 -0400
Cc: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <AANLkTinypxdqMKjkotmFZkShngjaEcEVeJt96AiSRioV@mail.gmail.com>

> I don't intend to put you on the spot - that is, if you don't have an
> answer, it's fine. But it would comfort me to know that some of the
> following are intended to be addressed:
> - the transaction vs non-transactional model:
> * for instance: in LPAPI's, how do you create a set of objects that
> temporarily violate DB constraints, which appserver code can do
> easily. As a concrete example, rename two objects A and B to be B and
> A, safely
> * or call methods on altered objects without saving them (saving them
> means you can't check you have achieved what you wanted to achieve)
> * LPAPI's cannot do repeated reads, so any appserver code that
> depends on that can't really be ported to LPAPI's.
> - speed

I feel like we're going around in circles with more detail on each
go-round. I'll re-state the problem and the plan as I understand it:

1. There are enormous conceptual and practical obstacles standing in the
way of a fully unified API that is also a web service we're happy using.
Huge obstacles!

2. Nonetheless, I would like to try to get through those obstacles with
a unified API, because the alternative is to design and implement two
parallel APIs.

3. In the cases where we can't get to a unified API, we have a fallback
plan: two parallel APIs. This fallback plan is *annoying* and requires a
lot of duplication of effort, and refactoring that doesn't give you the
satisfying feeling you ought to get when you refactor something. But
that's what it means to have two parallel APIs.

Look at it in terms of manpower. The "web service team" is me. One
person. Not only am I responsible for the lazr.restful and
lazr.restfulclient frameworks, I have de facto responsibility for the
concrete instantiation of those frameworks: the Launchpad web service
itself. The *only way I can survive* is to ensure that the complexity of
the Launchpad web service remains a very small fraction of the
complexity of Launchpad.

If we start writing parallel APIs, the complexity of the Launchpad web
service will start to approach the complexity of Launchpad. This will
not scale. This is why I am so obsessed with finding solutions that work
by reducing the complexity of the Launchpad web service and/or
increasing the complexity of lazr.restful.

> * It takes 13 seconds to log into launchpad, find out that there are
> some pending merges for bzr and quit hydrazine again.
> This may not seem like a long time, but really - it is, for
> something that should be a single HTTP request [I say that because in
> other web services API's *it is* a single HTTP request].

13 seconds is an INCREDIBLY LONG TIME. It is TOTALLY UNACCEPTABLE. Let's
figure out how to fix it.

The first step is to profile. I added timing code to lazr.restfulclient
and wrote some code that retrieves the pending merges for bzr, basically
this:

Launchpad(...).projects['bzr'].getMergeProposals()

The results:

Request for https://api.edge.launchpad.net/1.0/bzr took 0.82 seconds.
Request for https://api.edge.launchpad.net/1.0/bzr?ws.op=getMergeProposals took 3.03 seconds.

OK, there are two HTTP requests, not one. Fortunately, the first request
is unnecessary: it happens because you can't get a reference to
launchpad.projects['bzr'] without making an HTTP request to /bzr.

Now, we've already solved this problem for collections--note that we
were able to get a reference to launchpad.projects without making an
HTTP request to /projects. Similarly, we can have launchpadlib return a
shim object that only requests /bzr when you try to access some data
specific to that project. If you just want to invoke a named operation
on the project, that request will never be made.

The downside of this solution is that if you write
launchpad.projects['nosuchproject'], you won't find out right away the
project doesn't exist. You'll only find out when you try to access
project.owner or invoke project.getMergeProposals(). This seems like an
acceptable price to pay for not making that extra request.

I filed bug 583318 to deal with this. We can implement this solution or
not, but I wouldn't say it has anything to do with the server-side
design of Launchpad. It's a problem with the way the client translates
the programmer's desires into HTTP requests. The getMergeProposals call
itself is pretty slow on the server-side, but optimizing it shouldn't
require changing the API.

I don't know what launchpadlib setup you used to get your "13 seconds"
number, but it was either a new setup or an old setup. Let's say it was
a new setup. You made two HTTP requests and it took 13 seconds (due to
latency, that's longer than it took me to make the same two requests).

With an old setup, you would have made four HTTP requests and with those
latency numbers it would have taken 25 seconds or even longer. I've
spent the past month or so improving performance and this is the result.
You may have already seen
https://dev.launchpad.net/Foundations/Webservice/Performance but it has
detailed information and measurements of the performance improvements
I've put into place. It also lists ideas I haven't tried or finished
implementing yet.

(Similarly, if the "13 seconds" number comes from an old setup, try the
new setup. You should be pleasantly surprised.)

My point is this: a lot of things that ought to only take one HTTP
request _do already_ only take one HTTP request. The problem is that
until recently, the client has been performing additional HTTP requests
for its own purposes or for no reason at all. I'm eliminating the "no
reason" requests and making sure the "own purposes" requests are
executed very rarely. None of this has any effect on the design of the
Launchpad web service in particular.

===

Here's a more complicated example showing how I would like to approach
these problems in general. I wasn't sure what you meant by "log into
launchpad, find out that there are some pending merges for bzr and quit
hydrazine again," so I checked out lp:hydrazine and poked at the
scripts. Here's what I found when I profiled the scan-merge-proposals
script:

A. The current code looks basically like this:

ok_people = {}
for team_name in ['contributor-agreement-canonical', 'canonical']:
    team = launchpad.people[team_name]
    for person in team.members:
        ok_people[person.name] = team_name

for mp in project.getMergeProposals():
    if mp.registrant.name in ok_people:
        ...
    else:
        ...

Here's the profile:

Request for https://api.edge.launchpad.net/1.0/bzr took 1.94 seconds.
Request for https://api.edge.launchpad.net/1.0/~contributor-agreement-canonical took 0.38 seconds.
Request for https://api.edge.launchpad.net/1.0/~contributor-agreement-canonical/members took 5.13 seconds.
Request for https://api.edge.launchpad.net/1.0/~contributor-agreement-canonical/members?ws.start=50&ws.size=50 took 0.70 seconds.
Request for https://api.edge.launchpad.net/1.0/~canonical took 0.29 seconds.
Request for https://api.edge.launchpad.net/1.0/~canonical/members took 10.24 seconds.
Request for https://api.edge.launchpad.net/1.0/~canonical/members?ws.start=50&ws.size=50 took 5.18 seconds.
Request for https://api.edge.launchpad.net/1.0/~canonical/members?ws.start=100&ws.size=50 took 5.46 seconds.
Request for https://api.edge.launchpad.net/1.0/~canonical/members?ws.start=150&ws.size=50 took 7.08 seconds.
Request for https://api.edge.launchpad.net/1.0/~canonical/members?ws.start=200&ws.size=50 took 6.70 seconds.
Request for https://api.edge.launchpad.net/1.0/~canonical/members?ws.start=250&ws.size=50 took 4.49 seconds.
Request for https://api.edge.launchpad.net/1.0/~canonical/members?ws.start=300&ws.size=50 took 4.65 seconds.
Request for https://api.edge.launchpad.net/1.0/~canonical/members?ws.start=350&ws.size=50 took 1.23 seconds.
Request for https://api.edge.launchpad.net/1.0/bzr?ws.op=getMergeProposals took 4.88 seconds.
...

That's 58 seconds just to get to the point where we have all the
information we need to run our algorithm. 2.5 seconds was spent making
requests that won't be made if bug 583318 is fixed. 50 seconds of the
remainder was spent getting all the members of the 'canonical' and
'contributor-agreement-canonical' teams. And it's not a latency problem
(though splitting it up into multiple requests certainly doesn't
help)--it's just really slow to get all 400 members of a team.
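
For anyone who wants to check the arithmetic, here's a quick tally of the
profile above. The timings are copied from the profile; the grouping into
"entry lookups" (plain GETs of /bzr and the two team entries) and "member
pages" is my own labeling for this sketch.

```python
# (path, seconds) pairs copied from the profile above.
timings = [
    ("bzr", 1.94),
    ("~contributor-agreement-canonical", 0.38),
    ("~contributor-agreement-canonical/members", 5.13),
    ("~contributor-agreement-canonical/members?ws.start=50", 0.70),
    ("~canonical", 0.29),
    ("~canonical/members", 10.24),
    ("~canonical/members?ws.start=50", 5.18),
    ("~canonical/members?ws.start=100", 5.46),
    ("~canonical/members?ws.start=150", 7.08),
    ("~canonical/members?ws.start=200", 6.70),
    ("~canonical/members?ws.start=250", 4.49),
    ("~canonical/members?ws.start=300", 4.65),
    ("~canonical/members?ws.start=350", 1.23),
    ("bzr?ws.op=getMergeProposals", 4.88),
]

total = sum(seconds for _, seconds in timings)

# Every page of either team's membership collection.
member_pages = sum(s for path, s in timings if "/members" in path)

# Plain entry GETs: the requests a lazy shim (bug 583318) would avoid.
entry_lookups = sum(
    s for path, s in timings
    if "/members" not in path and "ws.op" not in path
)

print("total: %.2f s" % total)
print("member pages: %.2f s" % member_pages)
print("entry lookups: %.2f s" % entry_lookups)
```

The totals come out to roughly 58 seconds overall, a bit under 51 seconds for
the member pages and about 2.6 seconds for the entry lookups, consistent with
the rounded figures above.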

Now, scan-merge-proposals is not the code you were talking about, and I
suspect it's not very important, but let's suppose that it's incredibly
important. So important that its bad performance is catastrophic. What
can we do?

B. Well, for every merge proposal, we're trying to see whether or not
its registrant is a member of 'canonical' or
'contributor-agreement-canonical'. We could write the code to check the
registrant's list of memberships rather than the membership of the
group:

ok_groups = ['contributor-agreement-canonical', 'canonical']
mps = project.getMergeProposals()
for mp in mps:
    for membership in mp.registrant.memberships_details:
        if (membership.status in ['Approved', 'Admin']
                and membership.team.name in ok_groups):
            ...
        else:
            ...

Of course, that would cause 2 HTTP requests to be made for every merge
proposal. So that's no good on its own. But what if the information
about the registrant's memberships was delivered as part of the payload
from getMergeProposals()? What if you could get rid of those 2N HTTP
requests by writing this?

mps = project.getMergeProposals(
    ws.expand="registrant/memberships_details/team")
for mp in mps:
    for membership in mp.registrant.memberships_details:
        if (membership.status in ['Approved', 'Admin']
                and membership.team.name in ok_groups):
            ...
        else:
            ...

Instead of making 2N+1 HTTP requests, you'd make one monster HTTP
request. This is an application of a performance optimization we've been
thinking about for a while.
https://dev.launchpad.net/Foundations/Webservice/Performance#Expand%20links%20in%20representations

"Expand links in representations" wasn't designed for this situation. It
seems useful here because it's an incredibly powerful
performance-optimization tool in general. I'd like to be able to tell
people: "To a first approximation, you can write your script
idiomatically, without worrying about performance, and then make it more
efficient with judicious use of ws.expand."
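
To make the idea concrete, here's a rough sketch of what server-side expansion
could look like. None of this exists in lazr.restful: the expand function, the
STORE dict and the URLs are hypothetical, and using a plain "_link" suffix for
every link (including the collection) is a simplification for the sketch. The
point is only that the server follows the links along the path and inlines the
results, so the client gets everything in one response.

```python
def expand(representation, path, fetch):
    """Return a copy of `representation` with the links along `path` inlined.

    `path` is a list of link names; `fetch` maps a URL to a representation
    (a dict) or to a collection (a list of dicts).
    """
    if not path:
        return representation
    head, rest = path[0], path[1:]
    expanded = dict(representation)
    # Replace "<head>_link" with the linked data under the key "<head>".
    target = fetch(expanded.pop(head + "_link"))
    if isinstance(target, list):
        expanded[head] = [expand(item, rest, fetch) for item in target]
    else:
        expanded[head] = expand(target, rest, fetch)
    return expanded


# Hypothetical in-memory stand-in for the server's data.
STORE = {
    "/~alice": {
        "name": "alice",
        "memberships_details_link": "/~alice/+memberships",
    },
    "/~alice/+memberships": [
        {"status": "Approved", "team_link": "/~canonical"},
    ],
    "/~canonical": {"name": "canonical"},
}

proposal = {"registrant_link": "/~alice"}
full = expand(
    proposal,
    "registrant/memberships_details/team".split("/"),
    STORE.__getitem__,
)
```

After this, full["registrant"]["memberships_details"][0]["team"]["name"] is
available without any further lookups, which is what would let the loop above
run without per-proposal requests.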

And here's the thing: just like the optimizations I've already made,
"Expand links in representations" has nothing to do with the design of
the Launchpad web service in particular. If/when I implement this, I'll
implement it inside lazr.restful, and every part of the Launchpad web
service will gain this feature at the same time. Our web service will
still have a lot of space for refactoring, but the point of the
refactoring will be to make the object model better for everyone, not to
make the web service more efficient.

C. OK, let's suppose that "Expand links in representations" is a TOTAL
FAILURE. Either it's too hard to implement, or it causes more efficiency
problems than it solves, or people just refuse to use it.

Or, let's suppose that while "Expand links in representations" works in
this situation, there's some other analogous situation where it doesn't
work. Let's stipulate a situation so weird that it falls between the
cracks of every single general non-Launchpad-specific improvement we can
think of.

So, we have no choice. We have to redesign Launchpad itself to
accommodate this important web service use case. What does the redesign
look like? Maybe it looks like this:

for mp in project.getMergeProposals(
        partition_by_registrant_belongs_to=ok_groups):
    if mp.partition_1:
        ...
    else:
        ...

What an awful hack! It's the worst possible hack I could think of, but
we tried every other option and nothing else worked. What's the extent
of this hack? It's one extra web-service-specific argument to one named
operation, and one extra data field in the return value.
Project.getMergeProposals() still exists, it still works pretty much the
same way internally and externally, but there's an extra feature that,
as it happens, is only used externally. If it's really that important,
we can live with this hack.
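
To show how small the hack is, here's a sketch of the server side under some
loud assumptions: get_merge_proposals, teams_of and the dict-based proposals
are all hypothetical stand-ins, not Launchpad code. The only additions are the
one optional argument and the one extra field in each result.

```python
def get_merge_proposals(proposals, teams_of,
                        partition_by_registrant_belongs_to=None):
    """Yield proposals, flagging those whose registrant is in any given team.

    `teams_of` is a hypothetical lookup from a registrant name to the set of
    team names that person belongs to.
    """
    ok_groups = set(partition_by_registrant_belongs_to or ())
    for proposal in proposals:
        result = dict(proposal)
        if ok_groups:
            # The one extra data field, present only when the argument is used.
            result["partition_1"] = bool(
                ok_groups & teams_of(proposal["registrant"]))
        yield result


# Hypothetical data for illustration.
memberships = {"alice": {"canonical"}, "bob": set()}
proposals = [
    {"id": 1, "registrant": "alice"},
    {"id": 2, "registrant": "bob"},
]

flagged = list(get_merge_proposals(
    proposals,
    memberships.__getitem__,
    partition_by_registrant_belongs_to=[
        "contributor-agreement-canonical", "canonical"],
))
plain = list(get_merge_proposals(proposals, memberships.__getitem__))
```

Callers that don't pass the argument get exactly what they got before, which
is why the operation keeps working the same way internally.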

D. OK, let's suppose that the hack is also a TOTAL FAILURE. We have some
situation where there is simply NO WAY to reconcile the external data
model with the internal data model. In this case, we would write two
different data models, optimized for different use cases. We can
certainly do this--the existing system is designed to allow it. But it's
unpleasant and it doesn't scale in terms of manpower. In any given
situation (performance or otherwise) I would like to exhaust the other
possibilities:

* Refactor into a single API that's better for both internal and
external use.
* Come up with a general web service-specific solution that doesn't
depend on the design of Launchpad.
* Publish a single API of which Launchpad and the web service use
slightly different subsets.

Leonard
