Launchpad was down for a bit under three hours today. As I recall it was similar last month. I realize most of you were asleep, but it was the middle of the work day in Australia and other places. (So I was forced to go and ride my motorcycle, how sad ;-) A few IRC users commented on it.

I'm told the downtime really is just the time it takes to apply the database changes, so there's no easy answer. But as we want to be a really great and very reliable collaboration platform, and to still do updates at frequent intervals, I think this is something to think very hard about for later cycles.

I believe the heavy lifting in this upgrade was to improve translations performance, which I'm sure will be pleasing to many users. But it's a bit stiff that this stops people using code hosting, bugs, or PPAs.

Some (possibly naive) ideas:

 * split things so that you take down just the translations app while its data is being migrated, leaving other apps running

 * add an abstraction layer so that db changes need not be strictly synchronized with code changes

 * run in read-only mode against a copy of the database

 * perhaps this is crazy, but why not let people just keep trying to use it, and fail any particular request that can't succeed, with a clear message?

 * at least, give more warning within the app itself (as we discussed recently; I really think this should be an urgent priority.)

I presume there is some prior art here... I guess you have talked about these before.

I realize it is not easy, and there are lots of bugs and feature work to do. However, for the kind of promises we're making, or wanting to make, to our users, Launchpad needs better than 99.5% uptime.

--
Martin
This is the launchpad-users mailing list archive — see also the general help for Launchpad.net mailing lists.