launchpad-dev team mailing list archive
Message #04606
Re: The future of downtime for rollouts?
On Tue, 2010-09-14 at 22:09 +1200, Robert Collins wrote:
> On Tue, Sep 14, 2010 at 9:55 PM, Jonathan Lange <jml@xxxxxxxxxxxxx> wrote:
> > Hello,
> >
> > I've noticed that negotiating the downtime for Launchpad rollouts is
> > becoming increasingly tricky.
> >
> > So I can be clear when asked,
> > * what's the downtime for rollout now?
> > * are we doing anything to reduce it?
> > * when are we expecting to have zero downtime for rollout?
> >
> > I'll put the answer on a wiki somewhere once the thread winds up.
>
> I have a few thoughts here.
>
> The current process, AIUI goes like this:
> - the RM asks the LOSAs and stub the needed downtime.
> - they estimate it via various arcane methods(*)
> - that is then used for the announcement.
>
> Short term:
> Perhaps it would be better to say:
> 'we have a 90 minute downtime window each release. Always 90 minutes,
> and never more than.'
Might be more reliable but less accurate :) We estimate the downtime
by taking how long the last update took on staging and multiplying it
by a factor that seems to have accurately reflected the difference in
time between staging and production (with a little padding). We could
only commit to 90 mins if we refused to roll out any DB updates that
took longer than a certain period of time on staging.
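
For illustration only, the estimate described above can be sketched as a small calculation. The function names and the example factor and padding values are hypothetical (the thread does not state the real numbers), and treating the padding as a flat number of minutes is an assumption too. The second function shows what committing to a fixed window would imply: a cutoff on staging time beyond which a DB update could not be rolled out.

```python
def estimate_downtime_minutes(staging_minutes, factor, padding_minutes=0.0):
    """Estimate production downtime from the last staging update.

    staging_minutes: how long the last update took on staging.
    factor: assumed staging-to-production multiplier (hypothetical value).
    padding_minutes: assumed flat safety margin added on top.
    """
    if staging_minutes < 0 or padding_minutes < 0 or factor <= 0:
        raise ValueError("times must be non-negative and factor positive")
    return staging_minutes * factor + padding_minutes


def max_staging_minutes(window_minutes, factor, padding_minutes=0.0):
    """Longest staging run that still fits a fixed downtime window."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return max(0.0, (window_minutes - padding_minutes) / factor)


if __name__ == "__main__":
    # Hypothetical numbers: 30 minutes on staging, factor 2, 10 minutes padding.
    print(estimate_downtime_minutes(30, 2.0, 10))
    # Cutoff implied by a fixed 90 minute window with the same assumptions.
    print(max_staging_minutes(90, 2.0, 10))
```

With these made-up numbers, any update that ran longer than the cutoff on staging would have to be refused or split up to keep a fixed 90 minute promise.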