launchpad-dev team mailing list archive

Re: CodeBrowse: The Path Forward

To: Max Kanat-Alexander <mkanat@xxxxxxxxxxxx>
From: Robert Collins <robertc@xxxxxxxxxxxxxxxxx>
Date: Wed, 26 Jan 2011 15:46:59 +1300
Cc: launchpad-dev@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4D3F7CA9.9030104@bugzilla.org>
On Wed, Jan 26, 2011 at 2:45 PM, Max Kanat-Alexander
<mkanat@xxxxxxxxxxxx> wrote:
>        Hey folks! The next step for improving codebrowse is to get loggerhead
> trunk running on launchpad. This will require the following things:
>
>        1) Improve the test suite coverage for loggerhead so that we have fully
> automated tests that prove that it works, and don't have to worry about
> deploying new technologies into production.
>
>        Currently some of the existing tests fail. Also, I suspect that the
> test suite needs more coverage--particularly for all the various
> combinations of options that loggerhead can take. (For example, tests
> should be run with and without the on-disk cache.)

I will *always* worry about new technologies, and I'm sure that the
sysadmin teams will do so as well. Increasing test coverage is good,
but not enough to eliminate concern :).

>        2) Once this is done and we're sure that it's stable, I'd like to
> release that as loggerhead 1.19. The general bzr community would benefit
> from a faster, better loggerhead as well as Launchpad benefiting from
> it. The NEWS file needs to be updated before this happens, though--it
> wasn't quite kept up to date with the trunk changes as they were checked in.
>
>        3) Develop a solid plan and implementation for history_db to run on
> codebrowse. I don't know much about the IT architecture of codehosting
> and codebrowse, but I did talk to jam about this a bit. The *ideal*
> solution would be that codehosting updates history_db itself whenever
> there is a push to a branch, and that loggerhead uses the history_db
> that is in those branches. Perhaps the updating of history_db could be
> done asynchronously by codehosting after the branch is updated, in order
> to not slow down commits or pushes.
>
>        jam would be the person who would probably know the most about what
> needs to be done here--I didn't do any work on history_db.

Codehosting is as follows:
 - a frontend apache (which does SSL unwrapping when needed) forwards
from bazaar.launchpad.net to an haproxy load-balancing cluster
 - haproxy then forwards to a loggerhead instance on a dedicated server
(currently there are two instances sharing a single server: this is
dictated by load and we can add more very easily)
 - loggerhead accesses the branches it's serving over http from the
codehosting service

We currently share the loggerhead cache and cookies.hmac file between
the two instances that we have; how to share these things when we go
to multiple frontend servers is an open question.

>        4) Comment out and disable the /raw/ controller in loggerhead, just for
> the launchpad instance of it. Don't add an option to disable it, just
> comment out the code. (I'd like to avoid a tremendous proliferation of
> command-line options for loggerhead's serve-branches script, and all
> you'd have to do is comment out one line of code and a few lines of
> template.)

That means we're running a fork; I guarantee someone will forget in
the future and have it enabled without meaning to. Some better
options:
 - mask it out in the apache front end
 - delete it from trunk too if it's not ready
 - add an option (does not imply a command line option) controlling it
to the wsgi app.
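
As a rough illustration of the third option, here is a minimal sketch of
a WSGI wrapper whose constructor argument decides whether raw requests
are served. The class name RawGate, the enable_raw parameter and the
test for a "raw" path segment are all assumptions made for this example;
none of them is taken from loggerhead's code, and loggerhead's real URL
dispatch may look quite different.

```python
class RawGate(object):
    """WSGI wrapper that hides raw-file URLs unless explicitly enabled.

    Hypothetical sketch: the "raw" path-segment check stands in for
    whatever dispatch the real application performs.
    """

    def __init__(self, app, enable_raw=False):
        self.app = app                # the wrapped WSGI application
        self.enable_raw = enable_raw  # off by default, so forgetting is safe

    def __call__(self, environ, start_response):
        segments = environ.get("PATH_INFO", "").split("/")
        if not self.enable_raw and "raw" in segments:
            # Refuse the request before it reaches the wrapped app.
            body = b"raw file access is disabled\n"
            start_response("404 Not Found", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)
```

The option lives in the code that builds the WSGI app, so a deployment
can leave it off without carrying a patched copy of the source and
without a new command-line flag.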

>        5) Deploy loggerhead 1.19 with history_db.

What are the implications of this; how do we turn it on? Can we still
share caches, and how will it interact under high load?

>        6) Make the /raw/ controller safe for codebrowse by allowing codebrowse
> to use launchpad's time-limited token system.
>
>        loggerhead serves raw file content on a separate domain, to prevent XSS
> attacks. This means that for codebrowse's private branches, some system
> is needed to authorize access to raw private branch data. I assume that
> the launchpad team is already familiar with how to do this thanks to
> methods of accessing private attachments in librarian. (Basically, you
> generate a token, redirect to the actual content page with the token as
> a URL parameter, then check the token and delete it from the database
> before displaying the actual content. It's a one-use token that proves
> that you are authorized to access the content.)

So, we use a time-limited token rather than single-use tokens, because
of large files, the internet being unreliable etc, but that's basically
it. The thing is, we don't need *a* separate domain, we need a
*wildcard* domain with each security context getting its own domain.
(Security context could be branch, or user owning the branch, or what
have you: basically the smallest unit of 'this might attack some other
content' that could exist). (Note that even if we used a single-use
token, hostile pages can still attack within the same domain by
accessing the DOM of the returned content to read it from within the
browser).
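
The two ideas above (a token that expires rather than being consumed,
and one hostname per security context) can be sketched roughly as
follows. Everything here is an assumption for illustration: the secret,
the raw.example.invalid domain, the function names, and the choice of a
stateless HMAC signature. It is not a description of how Launchpad or
the librarian actually issues or stores its tokens.

```python
import hashlib
import hmac
import time

SECRET = b"example-secret"               # assumption: key shared by all frontends
RAW_BASE_DOMAIN = "raw.example.invalid"  # hypothetical wildcard domain


def context_host(context_id):
    """Map a security context (here, a branch id) to its own hostname."""
    label = hashlib.sha256(context_id.encode("utf-8")).hexdigest()[:32]
    return "%s.%s" % (label, RAW_BASE_DOMAIN)


def _sign(context_id, path, expires):
    """Sign the context, path and expiry time together."""
    message = "\n".join([context_id, path, str(expires)]).encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_token(context_id, path, lifetime=300, now=None):
    """Return 'expiry:signature', valid for `lifetime` seconds."""
    issued = int(time.time() if now is None else now)
    expires = issued + lifetime
    return "%d:%s" % (expires, _sign(context_id, path, expires))


def check_token(token, context_id, path, now=None):
    """True if the token is unexpired and was issued for this context and path."""
    try:
        expires_text, signature = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        return False  # malformed token
    if (time.time() if now is None else now) > expires:
        return False  # token has timed out
    expected = _sign(context_id, path, expires)
    # Constant-time comparison on bytes avoids leaking signature prefixes.
    return hmac.compare_digest(signature.encode("utf-8"),
                               expected.encode("utf-8"))
```

Because the token stays valid until it expires, a large download that is
retried after a dropped connection still works. The per-context hostname
means content from one branch is not same-origin with content from
another.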

>        The problem is that authentication data other than the token shouldn't
> go to raw files, because that authentication data could be used by XSS
> attackers.
>
>
>        I'm available for discussions and planning of how all of this work is
> going to happen, but the work itself will be done by the Launchpad team,
> as I understand it from my discussions with Martin.

We currently have no specific work planned with Loggerhead; we will be
addressing the massive number of exceptions it's raising, but more
detailed stuff like this is going to need to wait for a timeslot for a
squad to pick it up - it's larger than I'd be comfortable calling
bugfix work, has significantly different friction compared to doing
work on Launchpad (not saying better or worse, just different), and is
unfamiliar to most of the Launchpad developers. I suspect we will be
looking at August or September.

Between now and then our maintenance squads will need to deal with
critical issues (such as the 16000 OOPS we get from loggerhead daily)
- and they will be looking to do so in the least risky manner. Given
that we have many months before we have a concerted timeslice for
codebrowse, and not wanting fallout when we do critical fixes, and
that you're not going to be moving Loggerhead forward yourself - I am
proposing that we consolidate the Loggerhead trunk and launchpad
branches. The alternative that I see is Launchpad devs having a
somewhat harder time working with Loggerhead, which I predict will
show itself as bitrot at the project level.

So this is my proposal - Francis is in favour of this, but it hasn't
been put out for discussion by the whole team yet, and it should be I
think:
 - Launchpad team become maintainers of Loggerhead (which they were
via the code team a few years back, but the resourcing kind of slipped
somewhere ;))
 - as a result we do bug triage and code reviews & landings for loggerhead
 - We move Loggerhead to be part of 'Launchpad-project'
 - Rather than leave trunk in an awkward state for 6-9 months we push
the current known-good Launchpad branch on top of trunk, and preserve
trunk as a new 'future' branch
 - All future trunk landings are done with a commitment to
stabilise-within-days-or-revert (which the feature squad structure in
Launchpad is well suited for)

Then, the work for getting the history db and raw controllers deployed
becomes a small LEP (I will write it up) and is put into the work queue:
given the performance benefits history-db offers I imagine little
trouble scheduling it. If non-Launchpad folk want those things rock
solid in trunk before the Launchpad team has time to do the work, we
can set a small set of constraints - the tests you mention for
instance - to be met to be sure that that code is ready.

-Rob
Follow ups

Re: CodeBrowse: The Path Forward
From: Max Kanat-Alexander, 2011-01-26
Re: CodeBrowse: The Path Forward
From: Aaron Bentley, 2011-01-26
References

CodeBrowse: The Path Forward
From: Max Kanat-Alexander, 2011-01-26