launchpad-dev team mailing list archive

Re: rollout changes to reduce downtime - linked to 'release features when they are ready'

 

On Wed, Jul 21, 2010 at 12:45 PM, Julian Edwards
<julian.edwards@xxxxxxxxxxxxx> wrote:
> On Tuesday 20 July 2010 23:59:47 Michael Hudson wrote:
>> On 21/07/10 01:53, Julian Edwards wrote:
>> > On Tuesday 20 July 2010 06:25:27 Michael Hudson wrote:
>> >> Also, cherry picks (and sometimes post release re-rolls) tend to only be
>> >> to a limited number of machines.  I guess with the new workflow cherry
>> >> picks will be a thing of the past -- this makes me happy :-)
>> >
>> > So, are we going to roll out to *all* machines and restart them for, say,
>> > an urgent fix on cocoplum only?
>>
>> I think the idea was that it should be routine and easy to deploy new
>> code to every machine several times a day if needed.
>
> My concern is needless restarting of services, but if we make it so that's not
> really noticeable then there's no issue.

Exactly. Perceived downtime is all about a noticeable lack of service:
making it possible to do a full upgrade without downtime will have a
bunch of useful knock-on effects:
 - we can run one codebase, using no-downtime staggered deployments to upgrade

One codebase gives us:
 - a more consistent experience (compare the prod and edge timeout graphs -
prod has a higher timeout *and yet more timeouts*)
 - no need for the CP process; we just QA and deploy as normal

Getting rid of the CP process gets rid of waste (in the Lean sense)
and *is* 'release features when they are done'.
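
As a rough illustration of what a no-downtime staggered deployment could
look like, here is a minimal sketch. Everything in it is an assumption
made for the example: the Server class, the staggered_deploy function,
the is_healthy callback and the machine names are hypothetical, and this
is not how the Launchpad rollout is actually scripted. The idea is only
that one machine at a time leaves rotation, gets the new code, is
checked, and returns, so the others keep serving throughout.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical stand-in for one app server behind a load balancer."""
    name: str
    version: str
    in_rotation: bool = True


def staggered_deploy(servers, new_version, is_healthy):
    """Upgrade servers one at a time so the remaining ones keep serving.

    is_healthy is a caller-supplied check (an assumption of this sketch).
    Returns the names of the servers upgraded, in order.
    """
    upgraded = []
    for server in servers:
        # Refuse to proceed if nobody else would be left to serve requests.
        still_serving = [s for s in servers
                         if s is not server and s.in_rotation]
        if not still_serving:
            raise RuntimeError("no other machine in rotation; would cause downtime")

        server.in_rotation = False          # drain this machine only
        previous_version = server.version
        server.version = new_version        # stands in for the real upgrade step

        if not is_healthy(server):
            # Roll this machine back and stop; earlier machines stay upgraded.
            server.version = previous_version
            server.in_rotation = True
            raise RuntimeError("health check failed on " + server.name)

        server.in_rotation = True           # back into rotation
        upgraded.append(server.name)
    return upgraded
```

Called with two or more machines, e.g.
staggered_deploy([Server("app1", "r1"), Server("app2", "r1")], "r2", check),
there is always at least one machine in rotation while another is being
upgraded, which is the property that makes the restart unnoticeable.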

I believe we're currently blocked on this by the pg 8.4 / lucid
upgrade - but I'm going to touch base with the losas on that today.

-Rob


