
launchpad-dev team mailing list archive

rollout changes to reduce downtime - linked to 'release features when they are ready'

 

One of the ramifications of *all* of the proposed 'release features
when they are ready' workflows is more production rollouts. As such I
went over the proposed plan with James Troup looking for holes - we
can't increase downtime - we spend too much time down already :)

As a result I've filed a few RTs to get redundant instances (probably
on the same servers) of things like the xmlrpc server, the codehosting
server, etc., so we can keep one instance live and upgrade the other
in parallel.
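
To make the live/standby idea concrete, here is a minimal sketch in
Python. It is illustrative only: the Instance class, the
rolling_upgrade function and the healthy callback are hypothetical
names, not anything in the Launchpad tree or in the RTs, and the real
switch-over would happen in whatever sits in front of the services.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical stand-in for one copy of a service (e.g. xmlrpc).
    name: str
    version: str
    live: bool = False


def rolling_upgrade(a, b, new_version, healthy=lambda inst: True):
    """Upgrade the idle instance, then swap which one is live.

    Returns the instance that is serving traffic afterwards.
    """
    live, idle = (a, b) if a.live else (b, a)
    # The upgrade happens while `live` keeps serving requests.
    idle.version = new_version
    if not healthy(idle):
        # Upgrade looks bad: keep serving from the old instance.
        return live
    # Switch traffic over to the freshly upgraded instance.
    idle.live, live.live = True, False
    return idle
```

After a successful swap the old instance is idle and can be upgraded
in turn, or left on the old version for a quick rollback.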

There are three key things that are not yet prepped for highly
available rollouts:
 - cronscripts (probably including the job system)
 - buildd master/slaves
 - importds

I've filed a bug for the cronscripts as a whole and one for the
buildds - I had the temerity to mark these as high, since they will
limit our ability to increase our velocity safely until they are
fixed.

I don't know enough about the job system or the importd system to
sensibly talk about highly available upgrades there yet. I'd love it
if someone were to just file bugs / RTs as appropriate to get such a
process in place - but failing that, I hope to discuss them with
whoever knows most in the next day or two.

This effort ties into performance improvements as an enabler: the more
quickly we can deploy improvements, the faster we can react to timeout
issues, and thus the lower we can safely make the timeouts without
causing extended downtime for users. It's all about cycle time :)

-Rob


