launchpad-dev team mailing list archive
Message #03793
Re: rollout changes to reduce downtime - linked to 'release features when they are ready'
On Tue, Jul 20, 2010 at 7:25 AM, Michael Hudson
<michael.hudson@xxxxxxxxxxxxx> wrote:
> On 20/07/10 16:39, Robert Collins wrote:
>>
>> Ok, so the answer may be 'we interrupt those jobs when we're ready'?
>
> Yes, that's probably reasonable for the import case.
If that's fine, then the story for the importds can be:
a) do all the rest of the upgrade
b) nuke em
c) deploy
d) start em
>>> An approach where you installed the new code at a new path and didn't
>>> delete
>>> the old code until all jobs running from that tree finished would work
>>> fine.
>>> I don't know how you tell all jobs running from a particular tree are
>>> finished though.
>>
>> Can we change the code to make that clear somehow?
>
> I can't think of anything tasteful right now. Do you have any ideas?
Put the working dir in the command line? Then ps can tell us the dir it
started from.
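
A minimal sketch of that idea, assuming each job is launched with its
tree path somewhere in its command line. The function names and the
/srv/lp/... paths in the usage note are hypothetical, not anything the
importds do today.

```python
import subprocess


def jobs_from_tree(ps_output, tree):
    """Return the PIDs whose command line mentions a path under `tree`.

    `ps_output` is text with one process per line: "<pid> <args...>".
    """
    prefix = tree.rstrip("/")
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # blank or malformed line
        pid, args = parts
        for token in args.split():
            # Also accept "--option=/path" style arguments.
            path = token.rpartition("=")[2]
            # Match whole path components, so rev-100 does not match rev-1000.
            if path == prefix or path.startswith(prefix + "/"):
                pids.append(int(pid))
                break
    return pids


def running_jobs(tree):
    """Ask ps for every process and filter by tree (header-less output)."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    return jobs_from_tree(out, tree)
```

With something like this, the old tree (say /srv/lp/rev-100) could be
deleted once running_jobs("/srv/lp/rev-100") returns an empty list. It
only sees jobs that really do carry the path in their arguments.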
> It occurs to me that the codehosting server has a slightly similar issue;
> you want to shut the old server down when its last connection closes. This
> is probably a bit easier though (the load balancer might be able to tell you,
> or you can change the state of the ssh server through some control socket).
Yes, the ssh connection should be clear enough.
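
A rough sketch of the "shut down when the last connection closes"
bookkeeping, assuming the server can call a hook when a connection
opens and closes. DrainTracker and its method names are made up for
illustration; this is not the codehosting server's API.

```python
import threading


class DrainTracker:
    """Count live connections; signal once draining and none remain."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self._draining = False
        self.drained = threading.Event()  # set when it is safe to exit

    def connection_opened(self):
        """Return False if the connection should be refused."""
        with self._lock:
            if self._draining:
                return False
            self._active += 1
            return True

    def connection_closed(self):
        with self._lock:
            self._active -= 1
            self._check()

    def start_draining(self):
        """Called from e.g. a control socket when the new server is up."""
        with self._lock:
            self._draining = True
            self._check()

    def _check(self):
        # Caller must hold the lock.
        if self._draining and self._active == 0:
            self.drained.set()
```

The old server's main loop would then wait on tracker.drained before
exiting, so existing sessions finish while new ones go to the new code.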
>> My understanding from James Troup is that the slaves go boom when the
>> tcp socket closes - I've filed a bug about this though.
>
> I find this a bit tricky to believe in general. The manager talks xml-rpc
> to the slaves, so there should be no persistent connection in general (even
> if we're using pipelining by some perverse miracle, it shouldn't matter if
> the socket closes). I can believe that losing the manager at an arbitrary
> time would be bad, but exiting between scans should be fine.
Sure, as I say, it's hearsay. Oh, and it's a feature.
>> Thanks for the feedback, it's excellent to know a bit more about how
>> things are actively deployed. It sounds like there might be a code
>> change needed to make the importds easier to manage across
>> transitions of code; perhaps you could file that?
>
> Let's have one more round of waffle first ;-)
/me waffles.
_Rob