launchpad-dev team mailing list archive
Message #04605
Re: The future of downtime for rollouts?
On Tue, Sep 14, 2010 at 9:55 PM, Jonathan Lange <jml@xxxxxxxxxxxxx> wrote:
> Hello,
>
> I've noticed that negotiating the downtime for Launchpad rollouts is
> becoming increasingly tricky.
>
> So I can be clear when asked,
> * what's the downtime for rollout now?
> * are we doing anything to reduce it?
> * when are we expecting to have zero downtime for rollout?
>
> I'll put the answer on a wiki somewhere once the thread winds up.
I have a few thoughts here.
The current process, AIUI, goes like this:
- the RM asks the LOSAs and stub for the needed downtime.
- they estimate it via various arcane methods (*)
- that is then used for the announcement.
Short term:
Perhaps it would be better to say:
'we have a 90-minute downtime window each release. Always 90 minutes,
and never more than that.'
Long term:
ReleaseFeaturesWhenTheyAreDone has, as part of its incremental rollout,
no perceived downtime for *everything* except DB schema patches.
On schema patches we are crippled by the locking needs of Slony.
Once RFWTAD -or- the first of my performance goals (5 seconds) is
reached, I'll have another kanban slot and will be putting database
agility in that slot.
U1 are researching Cassandra in depth - that team may be organising a
meeting to get across it with one of the core devs; if that goes ahead
I hope to attend (and I hope Stuart and perhaps Gary will too) - but
nothing is actually arranged yet - I got this in a brief chat with
Elliot this morning. And to be clear: I've got no particular solution
in mind, other than the criteria: same or better write & read scaling;
same or better high availability; something compelling on data
integrity; same or better performance.
(*): Joking.
-Rob