On Mon, 2009-08-24 at 10:59 +0100, James Westby wrote:
> Hi,
>
> I woke on Saturday to discover that the edge rollout had apparently
> failed halfway through and left at least one machine on an old revision.
> This plays havoc with the API, and meant that my scripts could not
> work all weekend, with nothing I could do to fix it (with edge the
> only option for running scripts currently).
>
> Apparently things got worse on Sunday, with the webapp playing up
> as well.
>
> As there is no LOSA available at the weekend (which I have zero
> problem with, they deserve their weekends), I'm not sure that
> there should be automated rollouts on Saturday and Sunday mornings,
> as if they go wrong it makes life hard for those that wish to
> use LP at the weekend.
>
> As it is edge, redirection can be disabled (except for API uses),
> however, if people feel like they have to disable edge redirection
> every two hours every weekend, then they are likely to leave the
> beta test team to avoid the problem altogether.
>
> There is a QA benefit to deploying the latest code every day,
> and that is valuable, however, delaying some of that feedback for
> a day or two by not rolling out at the weekend wouldn't be a big
> loss as there are few LP developers around at the weekend anyway,
> and would increase the productivity of a valuable resource, your
> beta testers.

I think it would be better for you to be using the non-edge APIs - this
would ensure your scripts don't get clobbered. Even if we fix the edge
rollouts to roll back any changes if one fails, there could still be
potential state inconsistencies during the edge rollout itself.

> Alternatively there may be work underway to make the automated
> rollouts more robust, which may make the recent problems with
> the edge servers being inconsistent go away, leaving just the
> occasional show stopper bug introduced by a code change, which
> would make the problem less frequent.

Our current rollout process stops if one host fails to roll out.
Unfortunately, because of
https://bugs.launchpad.net/launchpad-foundations/+bug/307447 the rollout
process is more fragile than we'd like. As I mentioned above, we can
change things to roll back all changes if one host fails, but you would
still encounter inconsistencies during the automated rollout/rollback
process itself. Let me know if you'd still like us to make that change,
or if working with non-edge is an option.

Thanks,

Tom

> Thanks,
>
> James
>
> _______________________________________________
> Mailing list: https://launchpad.net/~launchpad-users
> Post to     : launchpad-users@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~launchpad-users
> More help   : https://help.launchpad.net/ListHelp
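
For readers following the thread, the two behaviours discussed above (stop at the first failing host, versus roll back every host already updated) can be sketched roughly as follows. This is only an illustration and not Launchpad's actual rollout tooling: the `rollout` function, the `deploy` callback, and the host and revision names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def rollout(hosts: List[str], target: str, current: Dict[str, str],
            deploy: Callable[[str, str], bool],
            rollback_on_failure: bool = False) -> bool:
    """Deploy `target` host by host, stopping at the first failure.

    `deploy(host, revision)` is a hypothetical callback that returns True
    on success. `current` maps each host to its revision and is updated
    in place. Returns True only if every host reached `target`.
    """
    updated: List[Tuple[str, str]] = []  # (host, previous revision)
    for host in hosts:
        previous = current[host]
        if deploy(host, target):
            current[host] = target
            updated.append((host, previous))
            continue
        # A host failed: stop here. Without rollback, hosts updated so
        # far stay on `target` and the rest stay on the old revision.
        if rollback_on_failure:
            # Undo in reverse order. Until this loop finishes, the hosts
            # still disagree, which is the window mentioned in the reply.
            for done_host, old_rev in reversed(updated):
                if deploy(done_host, old_rev):
                    current[done_host] = old_rev
        return False
    return True


def flaky_deploy(host: str, revision: str) -> bool:
    """Hypothetical deploy step: 'edge2' cannot take revision 'r2'."""
    return not (host == "edge2" and revision == "r2")


stop_only = {"edge1": "r1", "edge2": "r1", "edge3": "r1"}
rollout(list(stop_only), "r2", stop_only, flaky_deploy)

with_rollback = {"edge1": "r1", "edge2": "r1", "edge3": "r1"}
rollout(list(with_rollback), "r2", with_rollback, flaky_deploy,
        rollback_on_failure=True)
```

In this sketch the first call leaves `edge1` on the new revision and the other hosts on the old one, which resembles the mixed state described in the original report. The second call ends with all hosts back on the old revision, but the hosts are still inconsistent while the rollout and rollback are in progress.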
This is the launchpad-users mailing list archive — see also the general help for Launchpad.net mailing lists.