

Webservice performance

 

Hi,

First off, let me say thanks for the webservice. Having it there has
enabled a lot of things, and it is very much a positive in my eyes.

However, I have a short story to tell you.

Ubuntu has used the work items tracker for a few releases now. It
scrapes the blueprints tracker, pulling out the work items that are in
the whiteboards, as well as bug links etc., and graphs them over time.

  http://people.canonical.com/~platform/workitems/natty/all-ubuntu-11.04-beta.html

Because blueprints weren't exported on the API, it has always done this
by screen scraping, using a set of regexes to pull out what it wants
from each page.
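
To give an idea, the old approach boils down to something like this
(only a rough sketch: the regexes and the work-item syntax shown here
are simplifications, not the actual tracker code):

  import re
  import urllib.request

  # Simplified work-item pattern: "some task: STATUS" lines in the
  # whiteboard of the rendered blueprint page.
  WORKITEM_RE = re.compile(
      r'^\s*(?P<item>[^:\n]+):\s*(?P<status>TODO|INPROGRESS|DONE|POSTPONED)\s*$',
      re.MULTILINE)

  def scrape_workitems(blueprint_url):
      # One HTTP round trip per blueprint, then purely local regex work.
      html = urllib.request.urlopen(blueprint_url).read().decode('utf-8', 'replace')
      return WORKITEM_RE.findall(html)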

This has made it hard to do certain things (such as looking at
dependencies, which are only represented in graphical form on the
blueprint page), so recently Guilherme and I exported an API for
blueprints.

I then ported the work items tracker to use this API, which was quick to
do. However, when testing it we noticed that it was about 10% slower
than the old code.

This surprised me, as I was expecting a performance increase from the
batching that the webservice gives you when dealing with collections,
plus possible small gains from the appservers having less work to do
(no HTML rendering etc.) and from not needing regexes as much, since
the attributes are already defined.
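
For reference, the ported loop is roughly this shape (a sketch from
memory, not the actual tracker code; in particular the name of the
blueprint collection below is approximate):

  from launchpadlib.launchpad import Launchpad

  # Anonymous access is enough to read public blueprints.
  lp = Launchpad.login_anonymously('work-items-tracker', 'production')
  natty = lp.distributions['ubuntu'].getSeries(name_or_version='natty')

  # The collection comes back in batches, so iterating over all the
  # blueprints should cost roughly one round trip per batch rather than
  # one per blueprint page.
  for bp in natty.valid_specifications:
      print(bp.name, bp.whiteboard)  # the real code parses the whiteboard here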

When looking into why, I turned on tracing of the HTTP requests that
were being made, and it was quickly apparent what the problem was.

The code wants to associate blueprints with people, so for each of
(drafter, assignee, approver) it gets their user id.

  # Each attribute access follows the person link, so each of these
  # lines costs an extra HTTP round trip.
  drafter = bp.drafter.name
  assignee = bp.assignee.name
  approver = bp.approver.name

What happens here is that there is an extra round trip on each of these
lines, to follow the link in the blueprint representation, fetch the
person representation, and pull out their name.

These round trips quickly negate the benefit of the batching: rather
than the 1 round trip per blueprint that the old code does, we now do 3
(plus one for every page of blueprints). Even with caching, the code
still makes the round trips to check that the cache is up to date, when
we don't really care in this case: we're not realtime, and changes
while the code is running can actually be harmful, as it is tricky to
code defensively against them.

I was able to confirm this by writing a hacky function that, given a
link to a person, tells you their user id by parsing the URL; with that
in place, the code showed the expected speedup over the old code.
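
For the curious, the hack is roughly this (a sketch rather than the
exact function; how you get hold of the raw person link from the
blueprint representation is left out here):

  from urllib.parse import urlparse

  def person_name_from_link(person_link):
      # e.g. 'https://api.launchpad.net/1.0/~some-user' -> 'some-user'
      # Hacky: it relies on the URL layout staying stable, but it avoids
      # fetching the person representation at all.
      if person_link is None:
          return None
      path = urlparse(person_link).path
      return path.rstrip('/').rsplit('/', 1)[-1].lstrip('~')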

While this isn't a huge issue, I'm disappointed that the API is slower
than screen scraping, and it isn't much of an advert for the API.

Please do something about this. Having a much better-performing API
would open up the possibility of a whole new swathe of applications
using it, much like adding the API in the first place did.

Thanks,

James


