


reducing Launchpad downtimes?



Launchpad was down for a bit under three hours today.  As I recall it
was similar last month.  I realize most of you were asleep, but it was
the middle of the work day in Australia and other places.  (So I was
forced to go and ride my motorcycle, how sad ;-)  A few IRC users
commented on it.

I'm told the downtime really is just the time it takes to apply the
database changes, so there's no easy answer.  But as we want to be a
really great and very reliable collaboration platform, and to still do
updates at frequent intervals, I think this deserves some very hard
thought for later cycles.

I believe the heavy lifting in this upgrade was to improve
translations performance, which I'm sure will be pleasing to many
users.  But it's a bit stiff that this stops people using code
hosting, bugs, or PPAs.

Some (possibly naive) ideas:

 * split things so that you take down just the translations app while
its data is being migrated, leaving other apps running
 * add an abstraction layer so that db changes need not be strictly
synchronized with code changes
 * run in read-only mode against a copy of the database
 * perhaps this is crazy but why not let people just keep trying to
use it, and fail any particular request that can't succeed, with a
clear message?
 * at least, give more warning within the app itself (as we discussed
recently; I really think this should be an urgent priority.)
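
To make a couple of these ideas concrete, here is a rough sketch of a
per-request gate that combines "take down just the app being migrated",
"read-only mode" and "fail with a clear message".  It is purely
illustrative: MIGRATING_APPS, READ_ONLY, Response and gate() are made-up
names, not anything in the Launchpad code base, and a real version would
have to hook into the web framework and know which tables each app
touches.

```python
from dataclasses import dataclass

# Hypothetical: apps whose data is being migrated right now.
MIGRATING_APPS = {"translations"}
# Hypothetical: True while the other apps are served from a read-only copy.
READ_ONLY = True

# HTTP methods that do not modify data.
SAFE_METHODS = {"GET", "HEAD"}


@dataclass
class Response:
    status: int
    body: str


def gate(app: str, method: str) -> Response:
    """Decide whether a request can be served during an upgrade."""
    if app in MIGRATING_APPS:
        # Only the app under migration is taken down completely.
        return Response(503, f"{app} is being upgraded; please try again later.")
    if READ_ONLY and method not in SAFE_METHODS:
        # Writes cannot succeed against a read-only copy, so say so clearly.
        return Response(
            503,
            "Launchpad is read-only during an upgrade; your change was not saved.",
        )
    return Response(200, "ok")
```

With the settings above, gate("bugs", "GET") is served normally, while
gate("bugs", "POST") and anything under "translations" get a 503 with an
explanation rather than a blanket outage page.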

I presume there is some prior art here...

I guess you have talked about these before.  I realize it is not easy,
and there are lots of bugs and feature work to do.  However, for the
kind of promises we're making, or wanting to make, to our users,
Launchpad needs better than 99.5% uptime.
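
For a sense of scale, the arithmetic behind that figure can be sketched
as below.  This assumes a 30-day month and one outage per month, which
is my simplification rather than a measured number; the function names
are just for illustration.  On that basis three hours of downtime works
out to roughly 99.58% uptime, a 99.5% target allows about 3.6 hours a
month, and 99.9% would allow only about 43 minutes.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def uptime_percent(downtime_hours, period_hours=HOURS_PER_MONTH):
    """Uptime as a percentage of the period, given hours of downtime."""
    return 100.0 * (1 - downtime_hours / period_hours)


def allowed_downtime_hours(target_percent, period_hours=HOURS_PER_MONTH):
    """Hours of downtime per period that a given uptime target permits."""
    return period_hours * (1 - target_percent / 100.0)
```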

-- 
Martin




This is the launchpad-users mailing list archive — see also the general help for Launchpad.net mailing lists.
